
FA2: Next-Level Attention

About Event

“Attention Is All You Need,” goes the wisdom of the OG transformer paper.

Although it’s been nearly seven years since the paper came out in June of 2017, most aspiring AI Engineers and AI Engineering leaders still don’t have a solid grasp of the attention mechanism itself!

If we want to understand the state-of-the-art methods used to calculate attention in LLMs, we must first understand the basics. We’ve covered the attention mechanism elsewhere, and will review it during this event!

The problem with attention calculations is that they are quite compute-intensive: standard attention builds a score matrix that grows quadratically with sequence length, so the attention layer is typically the main bottleneck when feeding longer sequences of text through the Transformer.
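To make that concrete, here’s a minimal sketch of standard scaled dot-product attention in PyTorch (the shapes and sizes are illustrative). Note the full (seq_len × seq_len) score matrix that gets materialized; that’s the quadratic cost:

```python
# A minimal sketch of standard scaled dot-product attention in PyTorch.
# The (seq_len x seq_len) score matrix is the quadratic bottleneck.
import math
import torch

def standard_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, heads, seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(1, 8, 1024, 64)
out = standard_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```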

Thus, attention computations had to be made more efficient, in both memory and speed. Thanks to FlashAttention, an “IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM,” attention was streamlined in May of 2022.

Wait, what? No worries - we’ll break down each word live during the event!
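As a preview, here’s a minimal sketch, assuming PyTorch 2.x on a supported CUDA GPU: the built-in scaled_dot_product_attention can dispatch to a fused FlashAttention kernel, so the full score matrix never has to round-trip through HBM. (The backend-selection context manager has moved around across PyTorch versions; this is the 2.x form.)

```python
# A minimal sketch, assuming PyTorch 2.x on a CUDA GPU with fp16/bf16 inputs:
# scaled_dot_product_attention dispatches to a fused FlashAttention kernel,
# so the (seq_len x seq_len) score matrix is never materialized in HBM.
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Restrict dispatch to the FlashAttention backend only.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```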

In July of 2023, FlashAttention-2 was released, which exploited the “asymmetric GPU memory hierarchy” to increase memory savings and runtime speedup.
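If you want FA2’s kernels directly, the authors’ flash-attn package exposes them; a sketch, assuming flash-attn v2.x on a supported CUDA GPU (exact signatures can shift between releases):

```python
# A sketch using the flash-attn package (v2.x) from the FlashAttention-2
# authors. Inputs are (batch, seq_len, n_heads, head_dim) in fp16/bf16
# on a CUDA device; the result is exact attention, computed by FA2 kernels.
import torch
from flash_attn import flash_attn_func

q = k = v = torch.randn(1, 4096, 8, 64, device="cuda", dtype=torch.float16)
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 4096, 8, 64])
```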

In this event, we’ll explore the initial optimizations used in Flash Attention and the additional optimizations used to bring FA2 to life.

Of course, we’ll break down the concepts and code from first principles, give the Flash Attention methods a test drive, and run some comparisons.
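For a flavor of those comparisons, here’s a rough timing sketch of naive attention vs. the fused SDPA path on the same inputs (assumes a CUDA GPU and PyTorch 2.x; numbers will vary by hardware, and this is illustrative rather than a rigorous benchmark):

```python
# A rough timing sketch: naive attention vs. the fused SDPA path.
import time
import torch
import torch.nn.functional as F

def bench(fn, *args, iters=20):
    fn(*args)  # warmup
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

q = k = v = torch.randn(4, 8, 4096, 64, device="cuda", dtype=torch.float16)
naive = lambda q, k, v: torch.softmax(q @ k.transpose(-2, -1) / 8.0, dim=-1) @ v
print("naive:", bench(naive, q, k, v))
print("fused:", bench(F.scaled_dot_product_attention, q, k, v))
```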

📚 You’ll learn:

  • How the attention mechanism works in transformers

  • How attention calculations evolved to be progressively faster with Flash Attention & FA2

  • Why FA2 shows up whenever high-performance SOTA models are benchmarked

🤓 Who should attend the event:

  • Aspiring AI Engineers who want to understand the latest tools from the LLM edge

  • AI Engineering leaders interested in speeding up training and inference

Speakers

  • “Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for The AI Engineering Bootcamp and LLM Engineering: The Foundations. Since 2021 he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

  • Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for The AI Engineering Bootcamp and LLM Engineering: The Foundations. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.
