LLM Evaluations Workshop - Replicating an Anthropic Paper
This is a remote workshop where we’ll be introducing people to the basics of LLM evaluations!
Come to learn:
A deeper dive into LLM prompt engineering
How to interact directly with LLM providers’ APIs
How to design and implement your own evaluations of LLMs
How to measure whether and how quickly LLMs are getting dangerous
This workshop is meant for people who have Python programming experience. We do not require AI research expertise or prior experience with AI model providers' APIs, but we do recommend having some experience as an end user with ChatGPT or other similar models.
The workshop will culminate in replicating Anthropic’s “Alignment Faking in LLMs” paper, where we’ll go over to what extent modern AI systems can figure out that they are in a training environment and actively modify their behavior to manipulate the training process against the wishes of human trainers.