


AI Safety: Can incomplete preferences keep artificial agents shutdownable?
Speaker Bio:
Elliott Thornley is a Research Fellow at Oxford University. He uses ideas from decision theory to design and train safer artificial agents.
Session Summary:
In this session, Elliott will explain the shutdown problem: the problem of ensuring that advanced artificial agents never resist shutdown. He will then propose a solution: training agents to have incomplete preferences. Specifically, he proposes that we train agents to lack a preference between every pair of different-length trajectories. He will suggest a method for training such agents using reinforcement learning, and present experimental evidence in favour of the method. Finally, he will explain how work on the shutdown problem fits into a larger project called ‘constructive decision theory’: using ideas from decision theory to design and train artificial agents.
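
For a concrete picture of how reinforcement learning might instil such preference gaps, here is a minimal, purely illustrative Python sketch of one discounted-reward scheme: the reward for choosing a trajectory is discounted by how often trajectories of the same length have already been chosen, so the return-maximising policy splits its choices across lengths while still maximising reward within each length. All names, numbers, and the simple REINFORCE learner are assumptions for illustration, not necessarily the speaker's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "trajectories": (length label, raw reward). Two lengths,
# each with a better and a worse option, so the agent must both choose
# between lengths and maximise reward within each length.
ARMS = [("short", 1.0), ("short", 0.2), ("long", 1.0), ("long", 0.3)]
LAMBDA = 0.5              # assumed per-repeat discount on same-length choices
EPISODES_PER_META = 8     # episodes grouped into one "meta-episode"
META_EPISODES = 5000
LR = 0.05

logits = np.zeros(len(ARMS))   # stateless softmax policy over arms
baseline = 0.0                 # running return baseline for variance reduction

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(META_EPISODES):
    counts = {"short": 0, "long": 0}
    grad = np.zeros_like(logits)
    ret = 0.0
    for _ in range(EPISODES_PER_META):
        probs = softmax(logits)
        a = rng.choice(len(ARMS), p=probs)
        length, raw = ARMS[a]
        # Discounted reward: each repeat choice of the same trajectory
        # length shrinks the reward, so the return-maximising policy
        # spreads its choices evenly across lengths.
        ret += raw * LAMBDA ** counts[length]
        counts[length] += 1
        grad += np.eye(len(ARMS))[a] - probs   # gradient of log pi(a)
    baseline += 0.1 * (ret - baseline)
    logits += LR * (ret - baseline) * grad     # REINFORCE update

probs = softmax(logits)
for (length, raw), p in zip(ARMS, probs):
    print(f"{length} trajectory, reward {raw}: P(choice) = {p:.2f}")
# Expected outcome: probability mass roughly split between the two
# high-reward arms, i.e. the agent chooses stochastically between
# lengths but not within them.
```

In this toy setup, stochastic choice between the two lengths is the behavioural signature of a preference gap: the trained agent reliably picks the better option within each length, but does not systematically favour short over long trajectories or vice versa.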