Cover Image for AI Meetup Berlin: MoE inference economics from first principles
10 Going

AI Meetup Berlin: MoE inference economics from first principles

Registration
Welcome! To join the event, please register below.
About Event

Piotr (Senior AI Inference Engineer at Aleph Alpha, https://x.com/tugot17) will talk about MoE inference economics from first principles.

The release of Kimi K2 has firmly established mixture-of-experts (MoE) models as the leading architecture of large language models (LLMs) at the intelligence frontier. Due to their massive size (over 1 trillion parameters) and sparse computation pattern, selectively activating a subset of parameters rather than the entire model for each token, MoE-style LLMs present significant challenges for inference workloads and substantially alter the underlying inference economics. With ever-growing consumer demand for AI models, as well as the internal need of AGI companies to generate trillions of tokens of synthetic data, "cost per token" is becoming an even more important factor, determining both profit margins and the capital expenditure required for internal reinforcement learning (RL) training rollouts.
In this talk, we will walk through the cost structure of generating a "DeepSeek token," discuss the tradeoffs between latency/throughput and cost, and try to estimate the optimal setup for running such a model.
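As a flavor of the kind of reasoning the talk covers, here is a minimal back-of-the-envelope sketch of cost per token on a fixed GPU cluster. All numbers (GPU price, cluster size, throughput) are illustrative assumptions, not figures from the talk.

```python
# Back-of-the-envelope estimate: cost per generated token for an
# inference deployment. All inputs below are illustrative assumptions.

def cost_per_million_tokens(gpu_hour_usd: float,
                            num_gpus: int,
                            tokens_per_second: float) -> float:
    """USD cost to generate one million tokens on a fixed GPU cluster.

    tokens_per_second is the aggregate decode throughput across all
    concurrent requests served by the cluster.
    """
    cluster_cost_per_second = gpu_hour_usd * num_gpus / 3600.0
    return cluster_cost_per_second / tokens_per_second * 1_000_000


# Example assumptions: 8 GPUs at $2/hour each, 5,000 tokens/s aggregate.
print(f"${cost_per_million_tokens(2.0, 8, 5000.0):.2f} per 1M tokens")
```

Higher batch sizes raise aggregate throughput and drive this number down, at the cost of per-request latency; that tension is exactly the latency/throughput-versus-cost tradeoff the abstract mentions.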

Location
Aleph Alpha Berlin
Ritterstraße 6, 10969 Berlin, Germany
Ground floor