Long Context Windows: Extending Llama 3
When Llama 3 came out, it was announced that “Over the coming months, we’ll release multiple models with new capabilities including … a much longer context window.”
For the industry, the race was on to create a much longer context window for Llama 3 before Meta did.
Gradient AI did just that.
On Thursday, April 18, Meta released Llama 3.
On Wednesday, April 24, Gradient released a 160k context-length version of Llama 3 8B.
On Monday, April 29, Gradient released a 1M context-length version of Llama 3 8B.
On Friday, May 3, they released a 262k context-length version of Llama 3 70B.
And on Saturday, May 4, they released a 524k context-length version of Llama 3 70B and the 1M Llama 3 70B.
The Fourth was definitely with them.
By May 8th, Gradient put out the 4M Llama 3 8B.
—
So, what is going on?
How are they doing this? How is Gradient AI’s small team getting long-context window versions of Llama 3 to market faster than Meta?
In this event, we dive into the technical details required to increase the context window size of these open-source LLMs. We will discuss the technical challenges that emerge as parameter counts increase from 8B to 70B, and we’ll cover the compute requirements and implementation details directly with the Gradient AI team!
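For a flavor of what’s involved: one widely used recipe for context extension is to raise RoPE’s base frequency (rope_theta) and then continue pretraining on progressively longer sequences. Below is a minimal sketch of the configuration side, assuming the Hugging Face transformers API; the theta and length values are illustrative, not Gradient’s exact numbers.

```python
# Minimal sketch: extending a model's context window by scaling RoPE's
# base frequency (rope_theta) before continued pretraining on long sequences.
# The theta and max-length values below are illustrative assumptions.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
config.rope_theta = 4_000_000.0           # up from Llama 3's default of 500_000
config.max_position_embeddings = 262_144  # target context length

# Load the pretrained weights with the modified config; the model then
# needs continued pretraining on long documents to adapt to the new positions.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", config=config
)
```

Intuitively, raising theta stretches the rotary embeddings’ wavelengths so that positions far beyond the original 8k window still map to distinguishable rotation angles; the continued pretraining on long sequences then teaches the model to actually use them.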
Next, we’ll look at one of the most popular and most-cited benchmarks for long-context LLM retrieval: Greg Kamradt’s Needle in a Haystack. This is the benchmark Gradient has been reporting each time it releases a new model, and it’s essential to understand as long-context models become more and more popular!
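Conceptually, the benchmark is simple: bury a “needle” sentence at varying depths inside a long “haystack” of filler text, then ask the model to retrieve it, sweeping both context length and needle depth. Here is a hypothetical sketch of that loop; the needle text is paraphrased from Kamradt’s original, and `generate` stands in for whichever long-context model is under test.

```python
# Hypothetical sketch of a Needle in a Haystack evaluation loop.
# `generate(prompt)` stands in for a call to the long-context model under test;
# `filler` is a long corpus of distractor text (e.g., concatenated essays).
NEEDLE = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"

def build_haystack(filler: str, context_len: int, depth: float) -> str:
    """Insert the needle at a fractional depth within ~context_len characters."""
    haystack = filler[:context_len]
    insert_at = int(len(haystack) * depth)
    return haystack[:insert_at] + " " + NEEDLE + " " + haystack[insert_at:]

def run_eval(filler: str, generate) -> dict:
    scores = {}
    for context_len in [16_000, 64_000, 256_000, 1_000_000]:  # chars, illustrative
        for depth in [0.0, 0.25, 0.5, 0.75, 1.0]:             # where the needle hides
            prompt = build_haystack(filler, context_len, depth) + "\n\n" + QUESTION
            answer = generate(prompt)
            # Crude scoring: did the model recover the needle's key phrase?
            scores[(context_len, depth)] = "Dolores Park" in answer
    return scores
```

The results are typically plotted as a heatmap of retrieval success over context length and depth, which is exactly the chart Gradient publishes with each release.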
Finally, we’ll discuss directly with Gradient’s Chief Scientist, Leo Pekelis, when to use RAG versus long-context LLMs. Do we even need RAG now that we have these great long-context models?
Tune in to find out!
Join us live to build with the latest alongside us and get your questions answered!
📚 You’ll learn:
How the context window of an open-source LLM is extended, technically
Evaluation techniques for long-context LLMs, like Needle in a Haystack
How to think about answering the question, “RAG, or just use a long-context window?”
Speakers:
Leonid Pekelis is Chief Scientist at Gradient, a full-stack AI platform that enables businesses to build customized agents to power enterprise workloads, where he leads research and analytics. Prior to Gradient, Leo led CloudTruck's ML and data science orgs, pioneering applied ML for operational challenges. Before that, he held leadership roles across Opendoor, Optimizely, and Disney. Leo holds a bachelor's degree in economics, as well as a master's and a PhD in statistics, all from Stanford.
“Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.