Google DeepMind’s Griffin architecture: A challenger to the Transformer?

Hosted by BuzzRobot
Zoom
Past Event
About Event

In the last few years, Transformers have become the default architecture for sequence modeling tasks such as language modeling. Recently, however, new families of models, most notably state space models, have begun to challenge the status quo.

In this talk, Aleksandar Botev, a research scientist at Google DeepMind, will survey recent progress on these classes of models and offer context and perspectives on them from both theoretical and practical points of view.

He will argue that not only does the choice of recurrent layer matter, but that the overall block design and architecture also play a major role in these models' success.

Building on this, Aleksandar will present Griffin to the BuzzRobot community: a model that combines gated linear recurrent units with local attention, achieving performance competitive with state-of-the-art Transformers while being significantly faster at inference, in both latency and throughput.
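For readers who want a concrete picture of what a gated linear recurrence looks like before the talk, here is a minimal, illustrative sketch in Python. It is a toy written for this announcement, not DeepMind's code; the function name gated_linear_recurrence and the gate weights w_a and w_x are our own placeholders, and Griffin's actual RG-LRU layer parameterizes its gates differently.

import numpy as np

def gated_linear_recurrence(x, w_a, w_x):
    # x:   (seq_len, dim) input sequence
    # w_a: (dim,) weights for the per-channel forget gate
    # w_x: (dim,) weights for the input gate
    h = np.zeros(x.shape[1])
    outputs = []
    for x_t in x:
        # Forget gate in (0, 1), computed from the current input.
        a_t = 1.0 / (1.0 + np.exp(-(w_a * x_t)))
        # Linear state update: no nonlinearity is applied to h itself.
        h = a_t * h + (1.0 - a_t) * (w_x * x_t)
        outputs.append(h)
    return np.stack(outputs)

Because the hidden state enters the update only linearly and the model carries a fixed-size state rather than a growing key-value cache, the per-token cost at inference stays constant in sequence length, in contrast to attention. This is the intuition behind the latency and throughput claims above.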

He will also show that these models can leverage contexts much longer than those they were trained on, and will discuss the implications of this ability.
