SuperNova: Distillation of Llama 3.1
Small Language Models (SLMs) are gaining popularity as the field pulls in two directions at once: language models are becoming both ever-larger and ever-smaller.
SLMs are particularly powerful in settings where a model can benefit from specialization in a particular domain. One method for making large language models smaller and more efficient is distillation.
Effectively, distillation lets us make big models smaller while retaining much of the performance bought by the massive cost of the initial unsupervised pretraining. In the case of Llama 3.1-405B-Instruct, we are actually distilling an instruct-tuned, aligned version of the model, so the costs associated with both supervised fine-tuning and alignment are baked in as well.
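To make the idea concrete, here is a minimal sketch of logit-based knowledge distillation in PyTorch. This is an illustration under common assumptions, not Arcee's actual pipeline: the `teacher`, `student`, `batch`, and `optimizer` names are hypothetical stand-ins, and the loss shown is the standard temperature-scaled KL divergence between teacher and student token distributions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both distributions with a temperature, then match them via KL divergence."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

def train_step(student, teacher, batch, optimizer):
    # The teacher (e.g., the 405B instruct model) runs in inference mode;
    # only the smaller student's parameters receive gradients.
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"]).logits
    student_logits = student(batch["input_ids"]).logits
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In sequence-level variants, the student is instead trained on teacher-generated outputs, which avoids having to materialize the teacher's full logits at every training step; which approach a given team uses is a design choice we'll dig into at the event.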
According to Arcee.ai, who trained Arcee-SuperNova (Llama-3-SuperNova), it is the most flexible, secure, and cost-effective language model on the market. Really? Color us curious.
In this event, we’ll do a deep dive into how their team created SuperNova (alongside Lucas and Fernando, from our previous event on Spectrum), what exactly distillation means and how it’s accomplished, and what the implications are of being able to distill 405B-parameter (or larger) models down to much more manageable sizes.
Previously, we’ve seen how Arcee has leveraged simple ideas that were well-implemented programmatically to produce great results with Domain Adapted Language Modeling (DALM), mergekit, and Spectrum. This time, we investigate the concepts and code that underlie SuperNova!
You’ll learn:
How SuperNova was trained, starting from off-the-shelf Llama-3.1-405B
How distillation allows us to benefit directly from the significant pretraining done on open-source models
Who should attend the event?
GenAI enthusiasts interested in LLM innovation from startups at the cutting edge of open-source LLMs
Aspiring AI Engineers looking to leverage open-source LLMs in their apps
AI Engineering leaders who want to understand tools for leveraging SLMs in production
Speakers
Lucas Atkins is a research engineer at Arcee.ai, where he specializes in alignment. As the primary implementer of Spectrum, Lucas played a crucial role in integrating this technology into Arcee's training pipeline. He oversees the company's Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training pipelines, constantly pushing the boundaries of open-source post-training techniques. Lucas's work focuses on ensuring that Arcee's methodologies remain closely aligned with cutting-edge closed-source solutions, contributing to the advancement of responsible AI development.
Fernando Fernandes Neto is an AI Research Scientist at Arcee.ai. Blending deep technical expertise with business acumen, he transforms complex data into actionable insights and cutting-edge AI solutions. With a PhD in Complex Systems Engineering and master's degrees in both Industrial Processes and Financial Engineering, he brings a multidisciplinary approach to solving intricate business and technological challenges.
“Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021 he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.