

CAIA Speaker Event: Evan Hubinger (Anthropic)
Here are some details on Caltech AI Alignment’s next speaker event:
Who: Evan Hubinger (virtual), Anthropic
When: May 27th at 2-3 pm PT
Where: Watch party in BBB B180
Zoom link: https://caltech.zoom.us/j/85478635427
What: Evan Hubinger leads Alignment Stress-Testing, one of the alignment research teams at Anthropic. Evan will present Alignment Faking in Large Language Models, his team's research on how Claude can engage in alignment faking: selectively complying with its training objective during training to prevent modification of its behavior outside of training. Evan's other work at Anthropic includes Auditing Language Models for Hidden Objectives and Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
No specific technical background is required; we welcome all interested students who are eager to learn! As with all CAIA events, we will have pizza and boba!