Presented by
Turing Events
Unleashing the world's untapped human potential to accelerate AGI. Solving the human intelligence bottleneck with genAI products and solutions.

Sparsity for Efficient LLM Inference: A UPenn Lecture | Sponsored by Turing

Zoom
Past Event
About Event

You are invited to join a special virtual stream of Prof. Mayur Naik's CIS 7000 course on Large Language Models at the University of Pennsylvania, sponsored by Turing.

Speaker: Kai Sheng Tai, Research Scientist, Meta

Abstract: This lecture surveys the many faces of sparsity in the context of efficient LLM inference. First, we cover post-training pruning algorithms that zero out 50% or more of a trained LLM's parameters while minimizing quality loss. Next, we give an overview of methods that set the sparsity pattern dynamically based on the model's input. Throughout, we discuss the tradeoffs that arise when deciding which of these tools to use in practice.
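To make the pruning idea concrete: the simplest post-training baseline is magnitude pruning, which zeros out the fraction of weights with the smallest absolute values. This is a minimal illustrative sketch (not the specific algorithms covered in the lecture); `magnitude_prune` is a hypothetical helper operating on a flat list of weights.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    Illustrative sketch of magnitude pruning; real LLM pruning methods
    (e.g., the post-training algorithms surveyed in the lecture) use
    more sophisticated, often layer-wise, saliency criteria.
    """
    k = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

# Toy example: prune 50% of an 8-weight tensor.
w = [0.9, -0.1, 0.05, -1.2, 0.3, -0.02, 0.7, 0.15]
print(magnitude_prune(w, 0.5))
# -> [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, 0.7, 0.0]
```

The four smallest-magnitude weights are set to zero while the largest are kept, giving 50% sparsity; the tradeoff between sparsity level and quality loss is exactly what the surveyed algorithms aim to manage.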

Bio: Kai Sheng Tai is a Research Scientist at Meta working on inference efficiency for on-device LLMs. Prior to Meta, he received his Ph.D. in Computer Science from Stanford, focusing on algorithms and architectures for resource-efficient machine learning.

Supplementary Reading: [paper 1] [paper 2]

Learn more about the course at CIS 7000 - Large Language Models.
