

CAIA Speaker Series: Alex Turner
Who: Alex Turner, research scientist at Google DeepMind
When: April 4th, 5-6 pm PT
Where: ANB 121
What: Your AI’s training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Evidence suggests that when you pretrain a powerful model to predict a blog post arguing that powerful models will probably have bad goals, the model becomes more likely to adopt bad goals. I will discuss ways to test for and mitigate these potential mechanisms. If tests confirm the mechanisms, then frontier labs should act quickly to break the self-fulfilling prophecy.
No specific technical background is required; we welcome all interested students who are eager to learn! As with all CAIA events, we will have pizza and boba!