

Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference (with Zach Mueller)
Training big models used to be reserved for labs like OpenAI and DeepMind. But these days? Builders everywhere have access to clusters of 4090s, Modal credits, and open-weight models like Llama 3 and Qwen.
Zach Mueller, Technical Lead for Accelerate at Hugging Face and creator of a new course on distributed ML, joins us to talk about what scaling actually looks like in 2025 for individual devs and small teams.
We’ll break down the messy middle between “just use Colab” and “spin up 128 H100s,” and explore how scaling, training, and inference are becoming skills that every ML builder needs.
We’ll cover:
⚙️ When (and why) you actually need scale
🧠 How distributed training works under the hood (see the quick sketch after this list)
💸 Avoiding wasted compute and long runtimes
📦 How to serve models that don’t fit on one GPU
📈 Why this skillset is becoming essential—even for inference
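For a taste of what "under the hood" means, here is a minimal sketch of the PyTorch DistributedDataParallel (DDP) pattern: one process per GPU, with gradients all-reduced during the backward pass. This is illustrative only; the model, data, and hyperparameters are placeholders, and it assumes launching with torchrun rather than anything specific from the session.

```python
# Minimal DDP sketch -- illustrative only.
# Assumes launch via: torchrun --nproc_per_node=NUM_GPUS train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda(local_rank)   # stand-in model
    model = DDP(model, device_ids=[local_rank])           # wrap: syncs gradients across GPUs
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                                # stand-in training loop
        x = torch.randn(32, 512, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()                                    # gradient all-reduce happens here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In practice, tools like Accelerate or torchrun handle most of this boilerplate for you; the interesting questions are when plain DDP stops being enough and you reach for FSDP or ZeRO-style sharding instead, which is exactly the territory the session (and the course below) covers.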
Whether you’re fine-tuning a model at work, experimenting with open weights at home, or just wondering how the big models get trained, this session will help you navigate the stack—without drowning in systems details.
🚀 Want to go deeper?
Zach is also teaching a full 4-week course on distributed training: From Scratch to Scale (Sept 1–Oct 3). It’s hands-on, async-friendly, and packed with practical content — covering DDP, FSDP, ZeRO, DeepSpeed, and more. The course includes:
$500 in compute credits from Modal
6 months of Hugging Face Pro
Guest speakers from Hugging Face, Meta, and TorchTitan
Zach has kindly offered $450 off for friends of Vanishing Gradients — grab your spot here: