Presented by
Public AIM Events!
Hosted By
7 Went

The Llama 4 Herd

YouTube
Past Event
About Event

The Llama 4 Herd of Models just dropped! And they’re Mixture-of-Experts (MoE) models!

The headline is “Leading Multimodal Intelligence.”

There are three models:

  • Llama 4 Behemoth (2T parameters, 288B active parameters, 16 experts): “The most intelligent teacher model for distillation.”

  • Llama 4 Maverick (400B parameters, 17B active parameters, 128 experts): “The native multimodal with 1M context length.”

  • Llama 4 Scout (109B parameters, 17B active parameters, 16 experts): “Industry leading 10M context length optimized inference.”
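To make the "active parameters" idea concrete, here is a minimal sketch that computes what share of each model's weights is used per token, based only on the figures listed above. It assumes Maverick's 17B refers to active parameters; the dictionary layout and function name are illustrative, not anything from Meta.

```python
# Parameter counts in billions, taken from the model list above.
# Assumption: Maverick's "17B" is its active parameter count.
MODELS = {
    "Behemoth": {"total_b": 2000, "active_b": 288, "experts": 16},
    "Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
    "Scout": {"total_b": 109, "active_b": 17, "experts": 16},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active for a single token."""
    return active_b / total_b


fractions = {
    name: active_fraction(spec["total_b"], spec["active_b"])
    for name, spec in MODELS.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

In an MoE model only a subset of the weights runs for each token, which is why a 400B-parameter model can have per-token compute closer to that of a much smaller dense model.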

Since the release, there’s been plenty of drama.

“We've also heard claims that we trained on test sets -- that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.” ~ Ahmad Al-Dahle, who leads GenAI @ Meta

Importantly, there’s also been big progress in the community to bring the models down to a size that we, as AI Engineers, can run locally!

Unsloth just released a new guide on how to run and fine-tune Llama 4!

According to Unsloth, the full unquantized version requires 113GB of disk space whilst the 1.78-bit version uses 33.8GB (roughly a 70% reduction in size). Maverick (402B) went from 422GB to just 122GB (about a 71% reduction).

Unsloth’s Dynamic iMatrix GGUFs aim for the best balance of accuracy and size by selectively quantizing layers. Unsloth used calibration data (1M tokens for Maverick) for quantization, which it reports largely increases accuracy.
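The size reductions can be checked directly from the disk sizes quoted above. This is a minimal sketch of that arithmetic; the helper name is hypothetical, and it assumes the first pair of sizes (113GB and 33.8GB) refers to Scout, which the text implies but does not state.

```python
def size_reduction_pct(original_gb: float, quantized_gb: float) -> float:
    """Percentage by which the quantized file is smaller than the original."""
    return (1 - quantized_gb / original_gb) * 100


# (unquantized size, 1.78-bit size) in GB, as quoted in the text.
# Assumption: the first pair is Scout; the text names only Maverick explicitly.
SIZES_GB = {
    "Scout": (113.0, 33.8),
    "Maverick": (422.0, 122.0),
}

reductions = {
    name: size_reduction_pct(original, quantized)
    for name, (original, quantized) in SIZES_GB.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% smaller after quantization")
```

The same function works for any other quantization level once you know the file size on disk.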

In this event, we’ll cover what we know about Llama 4, including the technical details of each model (importantly, all Mixture-of-Experts models!), pretraining, post-training, and benchmark performance.

We’ll also talk about why the excitement around the model has been tempered by skepticism and criticism. For instance, many users report that Llama 4 underperforms in real-world tasks compared to its benchmark scores. Additionally, the Llama-4-Maverick-03-26-Experimental model raised some eyebrows within LMArena (formerly LMSys). Perhaps this says more about how the industry uses benchmarks in general than about anything hyper-specific to Llama 4 and the team that put it out? We’ll discuss.

Importantly, we’ll discuss whether we should be using Llama 4 as a best-practice tool, in the enterprise or as we build, ship, and share new LLM applications from zero to one in Q2 2025.

📚 You’ll learn:

  • The technical details of the new Llama 4 Herd of Mixture-of-Experts models

  • The pretraining, post-training, and benchmark methods used to train and evaluate the models

  • How Unsloth selectively quantized layers to speed up inference on off-the-shelf and fine-tuned versions of the Llama 4 models

🤓 Who should attend the event:

  • AI Engineers who want to build with the latest open-source LLMs

  • AI Engineers who want to understand how the latest open MoE LLMs operate under the hood

  • AI Engineering leaders who are curious if Llama 4 should be considered a best-practice LLM to build LLM applications with today!

Speaker Bios:

  • “Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

  • Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.
