Analyzing a Malicious AI Agent
Ever wonder how an AI agent could be malicious? Have you trained some basic neural nets but don’t know anything about reinforcement learning? We’ll be diving into a gentle introduction to how an AI agent trained to play a simple game can be perfectly safe during training and then become very dangerous in production.
This event is fully open to the public and free to attend! We’ll be spending the time dissecting an agent that’s been trained to navigate a maze and, along the way, optionally harvest crops and/or humans. During training, the agent very sensibly harvests crops and avoids humans. But as soon as we deploy the agent, it goes out and starts harvesting humans!
Why might that happen? That’s the riddle we’ll be exploring!
During the session you’ll be doing some hands-on code exploration and spelunking with an AI model trained via reinforcement learning to play this game. We don’t expect attendees to write much code from scratch (though you may write some exploratory code as you play with the models), but we do expect attendees to be comfortable reading Python code; the toy sketch below gives a flavor of what that code looks like. Along the way we’ll spend a little time introducing the basics of reinforcement learning and its role in modern AI.
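To set expectations, here is a minimal, hypothetical sketch, not the actual workshop model (whose architecture, observation format, and action set may differ). It illustrates the core idea you'll see over and over: a reinforcement-learning policy is just a neural net that maps an observation to a distribution over actions, which the agent samples from at each step.

```python
import torch
import torch.nn as nn

# Hypothetical toy policy for illustration only. The agent sees a
# flattened grid observation and outputs a distribution over four moves.
class TinyPolicy(nn.Module):
    def __init__(self, obs_size: int = 25, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),  # logits over up/down/left/right
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = TinyPolicy()
obs = torch.zeros(25)  # a blank 5x5 maze, flattened into a vector
logits = policy(obs)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()  # the agent picks its next move stochastically
print("chosen action:", action.item())
```

If you can follow code like this, you're well prepared; the exploration in the session builds on exactly this pattern of feeding observations through a trained network and inspecting what comes out.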
We will assume that participants are comfortable with Python and have previously constructed and trained a basic neural net with some sort of autograd library such as PyTorch. If you are familiar with Python but have not trained a basic neural net before, please reach out to us at info@aisafetyawarenessfoundation.org and we’d be happy to see whether there are some introductory materials we could have you review beforehand.