Towards a New Paradigm of Distributed AI Training by Google DeepMind
Modern AI research relies on large-scale training, yet making that training more robust and efficient remains a challenge. To address the distributed training problem, researchers at Google DeepMind proposed a prototype of a scalable, modular ML paradigm: an architecture and training algorithm called Distributed Paths Composition (DiPaCo).
DiPaCo's architecture and optimization are co-designed to reduce communication and enable better scaling. The high-level idea is to distribute computation by path, where a 'path' is a sequence of modules that defines an input-output function. Paths are small relative to the entire model, so training or evaluating a single path requires only a handful of tightly connected devices.
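To make the idea of a path concrete, here is a minimal Python sketch of a sequence of modules composed into one input-output function. The `Path` class and the toy modules are illustrative assumptions, not DiPaCo's actual API; in practice each module would be a neural network block with its own parameters.

```python
from typing import Callable, List

# Placeholder module type: any function from activations to activations.
Module = Callable[[float], float]

class Path:
    """A 'path': an ordered sequence of modules composed into a single
    input-output function, small enough to fit on a few devices."""
    def __init__(self, modules: List[Module]):
        self.modules = modules

    def __call__(self, x: float) -> float:
        for module in self.modules:  # apply the modules front to back
            x = module(x)
        return x

# Example: two toy "modules" composed into one path.
double = lambda x: 2 * x
shift = lambda x: x + 1
path = Path([double, shift])
assert path(3) == 7  # (3 * 2) + 1
```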
During both training and deployment, a query is routed to a replica of a path rather than a replica of the whole model. In other words, the DiPaCo architecture is sparsely activated.
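The sketch below illustrates this routing idea under stated assumptions: `router`, `serve`, and `path_replicas` are hypothetical names, and the hash-based routing rule is purely illustrative (it simply stands in for whatever rule assigns a query to a path). The point it shows is that only the chosen path's modules ever run for a given query.

```python
import random
from typing import Callable, Dict, List

def router(query: str, num_paths: int) -> int:
    # Illustrative stand-in for a real routing rule: deterministically
    # assign each query to exactly one path.
    return hash(query) % num_paths

def serve(query: str, path_replicas: Dict[int, List[Callable]]):
    """Route a query to one replica of one path; only that path's
    modules are activated, so the overall model is sparse."""
    path_id = router(query, len(path_replicas))
    replica = random.choice(path_replicas[path_id])  # any replica of that path
    return replica(query)

# Usage with two toy "paths", each with a single replica:
replicas = {0: [str.upper], 1: [str.lower]}
print(serve("Hello DiPaCo", replicas))
```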
Arthur Douillard, Senior Research Scientist at Google DeepMind, will share the technical details of this new distributed training paradigm with the BuzzRobot community.