
Optimization of LLMs: Series Introduction

About Event

We’ve all probably heard about GRPO at this point. Or at least, about DeepSeek.

And RLHF. And PPO. And DPO. Right? What about CGPO? What’s up with all these POs?

Let’s define them.

  • Group Relative Policy Optimization (GRPO) (DeepSeek, 2024)

  • Constrained Generative Policy Optimization (CGPO) (Meta, 2024)

  • Direct Preference Optimization (DPO) (Stanford, 2023)

  • Reinforcement Learning from Human Feedback (RLHF) (DeepMind, OpenAI, 2017)

  • Proximal Policy Optimization (PPO) (OpenAI, 2017)

What’s the keyword? It’s O - it’s Optimization. Let’s trace this history back even further.

PPO is what’s called a gradient-based policy optimization technique. It’s based on optimizing a “surrogate,” or approximate, objective function using stochastic gradient ascent.
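If you’d like a concrete picture already, here’s a minimal sketch of PPO’s clipped surrogate objective in Python. This is not any particular library’s implementation; the function name, tensor inputs, and clip value are illustrative assumptions that follow the form of the objective in the PPO paper (Schulman et al., 2017).

```python
import torch

def ppo_clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Sketch of the clipped surrogate objective from PPO (Schulman et al., 2017).

    We maximize this with stochastic gradient ascent; clipping the probability
    ratio keeps the new policy from drifting too far from the old one.
    """
    ratio = torch.exp(log_probs_new - log_probs_old)                  # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return torch.min(unclipped, clipped).mean()
```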

What?

What’s a surrogate model? What’s an objective function? What’s stochastic gradient ascent? I’ve heard of gradient descent… 🤔

Gradient descent, also called the “method of steepest descent,” finds optimal solutions by following the gradients - the derivatives - of the objective function with respect to the design parameters, within the constraints that define our design space.
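As a toy example - not from any of the papers above - here’s gradient descent minimizing a simple one-dimensional quadratic. The objective, starting point, learning rate, and step count are all illustrative assumptions.

```python
# Toy objective: f(x) = (x - 3)^2, which has its minimum at x = 3
def objective(x):
    return (x - 3.0) ** 2

def grad(x):
    return 2.0 * (x - 3.0)  # derivative of the objective with respect to x

x = 0.0              # initial design parameter (our starting guess)
learning_rate = 0.1  # step size
for _ in range(100):
    x -= learning_rate * grad(x)  # step opposite the gradient (steepest descent)

print(round(x, 4))   # converges toward 3.0
```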

Fundamentally, and generally, we are talking about the process of optimizing a design.

In the end, when we “train” an LLM, we are “optimizing its design.”

This is the guiding principle. And it runs deep. In this new series from AI Makerspace, we want to follow the thread of design optimization all the way to LLMs.

We want to create a roadmap of how we got from gradient descent to where we are today.

Of course, when we optimize a Language Model, we must know what we’re optimizing for. What task do we want to become good at?

We might want to do regression, classification, object detection, or sequence reconstruction - or perhaps we should say sequence transduction.

“The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism.” ~ Attention is All You Need, 2017 [Ref]

We can follow the threads from classic optimization techniques, to classic machine learning, to deep learning, to computer vision, to natural language processing, and finally to attention, LLMs, unsupervised pretraining, supervised fine-tuning, RLHF, and the latest and greatest optimization algorithms helping us create ever-more powerful AI tools (like GRPO!).

Along the way, key milestones like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs) played key roles in leading up to “Attention is All You Need” (2017) [Ref]. Understanding them - and everything that has come since - from both a concepts and a code perspective is important.

“We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.” ~ Attention is All You Need, 2017 [Ref]

What’s the common thread?

We’re optimizing these systems.

Join us for a new series where we’ll learn to build, ship, and share with statistical and optimization techniques, from mild to wild.

We will learn the foundations of LLMs.

Bring your questions and comments to join the discussion live!

📚 You’ll learn:

  • What to expect in this series!

  • Basic definitions of the concepts we’ll need for the journey.

  • An overview of our working roadmap and thread from optimization to LLMs.

🤓 Who should attend the event:

  • Aspiring AI Engineers and leaders who want to go deep and understand the origins and history of LLMs.

Speakers:

  • “Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

  • Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.
