
Build Your Own GPT-2
A hands-on course to build GPT-2 from scratch!
This is an all-day workshop. To fit everything into one day, it's going to be intense.
This workshop is for people who:
Are very comfortable writing and debugging Python.
Have some familiarity with PyTorch or NumPy (we will be using PyTorch).
Some ML background is nice, but not required.
It's sufficient to know that a neural network is approximately "a bunch of matrix multiplications with non-linear activation functions in between" (a tiny sketch of this follows the list).
Have ~8 hours to spare for the prerequisites before the workshop (possibly less, depending on background).
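To make that "matrix multiplications with non-linearities" point concrete, here is a minimal sketch in plain PyTorch (not part of the workshop materials):

```python
# A minimal sketch, not workshop code: a tiny two-layer network really is
# just matrix multiplies with a non-linear activation in between.
import torch

x = torch.randn(16)          # input vector
W1 = torch.randn(32, 16)     # first layer's weight matrix
W2 = torch.randn(10, 32)     # second layer's weight matrix

hidden = torch.relu(W1 @ x)  # matrix multiply, then non-linear activation
output = W2 @ hidden         # another matrix multiply -> 10 output values
```

If that snippet reads easily, you're in good shape for the day.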
If this isn't you, you'll likely struggle and not enjoy the day.
Learning Objectives
Understand what a transformer is, and how it is used.
Learn what tokenization is, at a high level.
Understand the causal attention mechanism in transformers, and how to construct it by hand.
Understand what logits are, and how to use them to derive a probability distribution over the vocabulary.
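As a preview of that last objective, here is a toy sketch (made-up numbers and a four-token vocabulary, not the workshop code) of turning logits into a probability distribution with softmax:

```python
# Toy example: logits are raw, unnormalised scores; softmax turns them into
# probabilities that sum to 1 over the vocabulary. Numbers here are made up.
import torch

vocab = ["the", "cat", "sat", "mat"]
logits = torch.tensor([2.0, 0.5, -1.0, 0.1])

probs = torch.softmax(logits, dim=-1)           # exponentiate and normalise
for token, p in zip(vocab, probs):
    print(f"{token}: {p.item():.3f}")

next_token = vocab[torch.argmax(probs).item()]  # greedy decoding: pick the most likely token
```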
After today, you will understand what exactly happens when you interact with large language models like GPT-2. We hide nothing*, no magic!
*Due to time constraints we do not implement the tokenizer. Andrej Karpathy has videos on this if you're curious, but we will black-box this (and only this) for today. It's just a look-up table from "words" (tokens) to numbers, so it's not that interesting, and isn't core to understanding how the transformer works.
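For illustration only, a toy version of that look-up (GPT-2's real tokenizer uses byte-pair encoding, but the interface is the same idea):

```python
# A made-up toy vocabulary, purely illustrative; not GPT-2's actual tokenizer.
toy_vocab = {"Hello": 0, " world": 1, "!": 2}

ids = [toy_vocab[t] for t in ["Hello", " world", "!"]]    # encode: [0, 1, 2]

id_to_token = {i: t for t, i in toy_vocab.items()}
text = "".join(id_to_token[i] for i in ids)               # decode: "Hello world!"
```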
About the teacher
David Quarel is a PhD student at the Australian National University (ANU) in Canberra, Australia, working on AI safety at the London Initiative for Safe AI (LISA), as well as a teaching assistant for ARENA. David has years of experience as a teacher, developing content both for courses at the ANU and for ARENA. He recently co-authored a new textbook on Universal Artificial Intelligence.
Ticket price covers breakfast, lunch, dinner, and a contribution toward FutureHouse.uk, our venue. This is a not-for-profit event.
Before the event:
Depending on your background, you may be able to skip bits and pieces. We estimate about 8 hours of content if you were to do all of it. Prioritise the Einops exercises, as you will not be able to build GPT-2 without einops.
Do one of the following:
a) Colab: Create a Google Colab account.
b) Local:
There are installation instructions for either running locally or using cloud compute available at https://arena3-chapter0-fundamentals.streamlit.app/
For today's material, you won't need to rent a GPU. Please, PLEASE make sure you've got your Python environment set up ahead of time if you choose this option.
By default, we recommend Colab, as it's less hassle to set up.
Watch the following 3Blue1Brown videos.
Linear Algebra
Deep Learning
Chapter 6: Attention in transformers
Don't worry if you struggle with this video; understanding and constructing the attention mechanism is the main goal of the day.
Don't worry about backpropagation, we will not be training any models. It's enough to understand the goal the optimiser has, and that it somehow adjusts the weights to minimise the loss function.
Skim the background material. This material is for the entire ARENA course and you won't need all of it, only:
Neural Networks (covered by the above videos)
Linear Algebra (covered by the above video)
Attempt the prereq exercises under "Einops, Einsum & tensors"
"Einsum Is All You Need" provides a good intro to the einsum library.
Don't worry if you can't get through all the exercises! Getting a feel for reduce, reshape, and einsum is the main objective here.
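As a flavour of what those exercises cover, here is a minimal sketch (assuming torch and einops are installed; not taken from the ARENA exercises):

```python
# Small, self-contained examples of the three operations the exercises focus on.
import torch
from einops import rearrange, reduce

x = torch.randn(2, 3, 4)                        # (batch, rows, cols)

# rearrange/reshape: rename and regroup axes instead of juggling indices
flat = rearrange(x, "b r c -> b (r c)")         # shape (2, 12)

# reduce: aggregate over named axes
row_means = reduce(x, "b r c -> b r", "mean")   # mean over the last axis, shape (2, 3)

# einsum: batched matrix multiplication written as an index expression
y = torch.randn(2, 4, 5)
prod = torch.einsum("brc,bcd->brd", x, y)       # shape (2, 3, 5)
```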
Agenda
0930 - Doors Open & Breakfast
1000 - Starting bang on - please don't be late
1230 - Lunch
1830 - Dinner
Join the WhatsApp group to ask questions about the event and pre-req material.
Slides for tomorrow: https://docs.google.com/presentation/d/1umuZLA4ZunbfLMfJTyGYb65Fr824PcFtsH41GtsKVGY/edit?usp=sharing
Material for tomorrow: https://arena3-chapter1-transformer-interp.streamlit.app/[1.1]_Transformer_from_Scratch