LLM Evaluations Workshop - Replicating an Anthropic Paper

AI Safety Awareness Project

Google Meet

Past Event

Welcome! To join the event, please register below.

You will be asked to verify token ownership with your wallet.

About Event

This is a remote workshop where we’ll be introducing people to the basics of LLM evaluations!

Come to learn:

A deeper dive into LLM prompt engineering
How to interact directly with LLM providers’ APIs
How to design and implement your own evaluations of LLMs
How to measure whether and how quickly LLMs are getting dangerous

This workshop is meant for people who have Python programming experience. We do not require AI research expertise or prior experience with AI model providers' APIs, but we do recommend having some experience as an end user with ChatGPT or other similar models.

The workshop will culminate in replicating Anthropic’s “Alignment Faking in LLMs” paper, where we’ll go over to what extent modern AI systems can figure out that they are in a training environment and actively modify their behavior to manipulate the training process against the wishes of human trainers.

Presented by

AI Safety Awareness Project

Hosted By

17 Went

AI