Training DBRX (Databricks LLM): Model Design and Challenges
In this talk, Shashank Rajput, a research scientist at Databricks Mosaic Research, will cover three aspects of training DBRX (Databricks' LLM):
The Mixture of Experts architecture.
The process for determining various model components and hyperparameters.
The challenges of large-scale training.
He will begin with a brief introduction to Mixture of Experts (MoE) models and explain why his team chose that architecture.
He will then discuss the other model components and hyperparameter choices his team faced, and how they made those decisions.
Shashank will also address the challenges they encountered during large-scale training and how those problems were mitigated.
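For readers unfamiliar with the architecture the talk introduces, the sketch below is a minimal, illustrative mixture-of-experts layer in PyTorch: a learned gating network routes each token to its top-k experts, and the expert outputs are combined using the renormalized gate weights. The hyperparameters and the per-expert routing loop are illustrative assumptions for clarity, not DBRX's actual configuration or implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """A minimal token-level mixture-of-experts layer (illustrative only).

    Each token is routed to the top_k experts chosen by a learned gating
    network; expert outputs are combined with the renormalized gate weights.
    Sizes below are toy values, not DBRX's configuration.
    """

    def __init__(self, d_model=64, d_hidden=256, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.gate(x)                                # (B, S, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)                 # renormalize over the k chosen
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SimpleMoE()
tokens = torch.randn(2, 8, 64)
print(moe(tokens).shape)  # torch.Size([2, 8, 64])

Production MoE implementations replace this per-expert Python loop with batched expert dispatch for efficiency; the sketch only shows the routing idea.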
Speaker's bio:
Shashank Rajput is a research scientist in the pre-training team at Databricks, where he was part of the team that trained DBRX.
He received his PhD from the University of Wisconsin–Madison, where his dissertation was recognized as the runner-up for the Best PhD Thesis Award by the Department of Computer Sciences.
He is also a recipient of the Google PhD Fellowship.