Presented by
AIFoundry.org
In-person and virtual community events of AIFoundry.org

Live Podcast: Quantizing LLMs to run on smaller systems with Llama.cpp

Virtual
Past Event
About Event

An essential step in making LLMs more accessible to developers and applications is enabling larger, more accurate models to run efficiently on smaller systems such as laptops. This requires two things: 1) compacting models via quantization so they need less memory and compute, and 2) running the quantized models with hardware-optimized software such as llama.cpp. Together, these make it possible to run capable models on local systems with only minor tradeoffs in performance and accuracy.
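
To give a flavor of what quantization means in practice, here is a minimal Python sketch of per-block integer quantization, loosely in the spirit of llama.cpp's Q8_0 format (blocks of 32 weights sharing one scale). It is a simplified illustration, not llama.cpp's actual implementation; for instance, the real format stores scales in half precision.

```python
import numpy as np

def quantize_q8_0(x: np.ndarray, block_size: int = 32):
    """Quantize a float32 vector into int8 values plus one float scale
    per block of `block_size` weights (simplified Q8_0-style scheme)."""
    assert x.size % block_size == 0
    blocks = x.reshape(-1, block_size)
    # One scale per block: map the largest magnitude onto the int8 range.
    scales = np.abs(blocks).max(axis=1) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales[:, None]), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_q8_0(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 weights from quantized blocks."""
    return (q.astype(np.float32) * scales[:, None]).ravel()

weights = np.random.randn(4096).astype(np.float32)
q, s = quantize_q8_0(weights)
restored = dequantize_q8_0(q, s)
print("max abs error:", np.abs(weights - restored).max())
print("bytes: fp32 =", weights.nbytes, " int8 + scales =", q.nbytes + s.nbytes)
```

Storing one scale per small block rather than per tensor is what keeps the accuracy loss minor: each block's quantization error is bounded by its own local dynamic range.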

Yulia Yakovleva will walk through the theory behind model quantization, discuss how llama.cpp works, and then demonstrate quantizing the OLMo model and comparing the relative performance of the quantized versions.
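
As a rough illustration of the kind of comparison the demo covers, here is a hedged sketch using the third-party llama-cpp-python bindings to time generation across two quantization levels. The model file names are placeholders and assume you have already produced quantized GGUF files of OLMo with llama.cpp's conversion and quantization tools.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder paths: GGUF files of the same model at two quantization levels.
MODELS = {
    "Q8_0": "olmo-7b-q8_0.gguf",
    "Q4_K_M": "olmo-7b-q4_k_m.gguf",
}

PROMPT = "Explain quantization in one sentence."

for name, path in MODELS.items():
    llm = Llama(model_path=path, n_ctx=512, verbose=False)
    start = time.perf_counter()
    out = llm(PROMPT, max_tokens=64)
    elapsed = time.perf_counter() - start
    n_tokens = out["usage"]["completion_tokens"]
    print(f"{name}: {n_tokens / elapsed:.1f} tokens/s")
    print(out["choices"][0]["text"].strip())
```

Comparing tokens per second alongside the generated text gives a quick, informal read on the speed-versus-quality tradeoff between quantization levels.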
