

CAIA Speaker Event: Jerry Wei (Anthropic)
Here are some details on Caltech AI Alignment’s next speaker event:
Who: Jerry Wei (in person), Anthropic
When: June 3rd, 4-5 pm PT
Where: Broad 100
What: Jerry Wei is an AI researcher at Anthropic (formerly at Google DeepMind) who works on improving language model capabilities and alignment. His talk will focus on his work on "Constitutional Classifiers": machine learning systems that detect and block "jailbreak" attempts, in which users try to bypass a model's safety training to elicit harmful outputs. These classifiers have proven significantly more robust against manipulation than the language models they protect, withstanding thousands of hours of human jailbreaking attempts.
No specific technical background is required - we welcome all interested students who are eager to learn! As with all CAIA events, we will have pizza and boba!