Presented by
Turing Events
Unleashing the world's untapped human potential to accelerate AGI. Solving the human intelligence bottleneck with genAI products and solutions.

Sparsity for Efficient LLM Inference: A UPenn Lecture | Sponsored by Turing

Zoom
Past Event
About Event

You are invited to join a special virtual stream of Prof. Mayur Naik's CIS 7000 course on Large Language Models at the University of Pennsylvania, sponsored by Turing.

Speaker: Kai Sheng Tai, Research Scientist, Meta

Abstract: This lecture surveys the many faces of sparsity in the context of efficient LLM inference. First, we cover post-training pruning algorithms that zero out 50% or more of a trained LLM's parameters while minimizing quality loss. Next, we give an overview of methods that set the sparsity pattern dynamically based on the model's input. Throughout, we discuss the tradeoffs that arise when deciding which of these tools to use in practice.
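To make the pruning idea concrete: the simplest post-training baseline is magnitude pruning, which zeros out the fraction of weights with the smallest absolute values. This is a minimal illustrative sketch (not the specific algorithms covered in the lecture); `magnitude_prune` is a hypothetical helper operating on a flat list of weights.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    Illustrative sketch of magnitude pruning; real LLM pruning methods
    (e.g., the post-training algorithms surveyed in the lecture) use
    more sophisticated, often layer-wise, saliency criteria.
    """
    k = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

# Toy example: prune 50% of an 8-weight tensor.
w = [0.9, -0.1, 0.05, -1.2, 0.3, -0.02, 0.7, 0.15]
print(magnitude_prune(w, 0.5))
# -> [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, 0.7, 0.0]
```

The four smallest-magnitude weights are set to zero while the largest are kept, giving 50% sparsity; the tradeoff between sparsity level and quality loss is exactly what the surveyed algorithms aim to manage.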

Bio: Kai Sheng Tai is a Research Scientist at Meta working on inference efficiency for on-device LLMs. Prior to Meta, he received his Ph.D. in Computer Science from Stanford, focusing on algorithms and architectures for resource-efficient machine learning.

Supplementary Reading: [paper 1] [paper 2]

Learn more about the course at CIS 7000 - Large Language Models.
