AI Safety Workshop
The Recurse Center is hosting a workshop in collaboration with the AI Safety Awareness Foundation.
This is your chance to learn deeply about AI. We’ll begin with an introduction to the state of AI today, likely developments on the horizon, and the safety issues those developments may raise.
Then, we will split participants by background and interest into two technical tracks:
Track A is for people who are comfortable writing Python but have little background in AI: We will start with an introduction to neural nets and build our own vanilla networks. Then, we will analyze an agent trained via reinforcement learning that displays malicious goal generalization: it behaves benignly during training but maliciously in production.
Track B is for people with experience building and training LLMs: We will build and train an LLM (GPT-2) while analyzing how transformers perform induction. We’ll focus on mechanistic interpretability as presented in Anthropic's 2021 paper, "A Mathematical Framework for Transformer Circuits."
This will be a day-long event as the technical content is fairly time-intensive.
Food will be provided.
While we recommend arriving at 10:30 a.m. EST for the full experience, we will also have a quick intro session in the afternoon to catch latecomers up.
You will need to bring a computer to participate!