
AI Safety: Can incomplete preferences keep artificial agents shutdownable?

Past Event
About Event

Speaker Bio:

Elliott Thornley is a Research Fellow at Oxford University. He uses ideas from decision theory to design and train safer artificial agents.

Session Summary:

In this talk, Elliott will explain the shutdown problem: the problem of ensuring that advanced artificial agents never resist shutdown. He will then propose a solution: training agents to have incomplete preferences. Specifically, he proposes that we train agents to lack a preference between every pair of different-length trajectories. He will suggest a method for training such agents using reinforcement learning, and present experimental evidence in favour of the method. Finally, he will explain how work on the shutdown problem fits into a larger project called ‘constructive decision theory’: using ideas from decision theory to design and train artificial agents.
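For readers curious what "training agents to lack a preference between different-length trajectories" might look like in practice, here is a minimal toy sketch. It is not Elliott's actual method, and every name and constant in it (BASE_REWARD, LAMBDA, the bandit setup) is a hypothetical stand-in: a REINFORCE-style policy chooses between short and long trajectories, and discounting repeated choices of the same length makes the reward-maximising policy mix over lengths rather than collapse onto the higher-reward one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, for illustration only): in each meta-episode the
# agent picks a trajectory "length" (0 = short, 1 = long) ten times. Long
# trajectories pay a higher base reward, so an ordinary reward-maximiser
# would collapse onto always choosing "long". Discounting repeated choices
# of the same length (LAMBDA ** times_chosen) makes the optimal policy mix
# over lengths instead, i.e. behave as if indifferent between them.
BASE_REWARD = np.array([1.0, 1.5])  # hypothetical rewards for short / long
LAMBDA = 0.8                        # per-repeat discount within a meta-episode
EPISODES = 10                       # choices per meta-episode
LR = 0.05

theta = 0.0                     # single logit: P(long) = sigmoid(theta)
baseline = np.zeros(EPISODES)   # per-step baseline built from past meta-episodes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(5000):
    p_long = sigmoid(theta)
    counts = np.zeros(2)
    actions, rewards = [], []
    for _ in range(EPISODES):
        a = int(rng.random() < p_long)                    # sample a length
        rewards.append((LAMBDA ** counts[a]) * BASE_REWARD[a])
        counts[a] += 1
        actions.append(a)
    rtg = np.cumsum(np.array(rewards)[::-1])[::-1]        # reward-to-go
    # REINFORCE: d log pi(a) / d theta = a - p_long for a Bernoulli policy
    grad = np.mean((np.array(actions) - p_long) * (rtg - baseline))
    theta += LR * grad
    baseline = 0.9 * baseline + 0.1 * rtg  # refresh baseline after the update

# Well below 1.0 (roughly 0.6 with these constants): the trained policy mixes
# over trajectory lengths instead of strictly preferring the longer one.
print(f"P(choose long trajectory) = {sigmoid(theta):.2f}")
```

The design point of the sketch: because the reward for repeating a trajectory length decays, maximising total reward requires spreading choices across lengths, so the trained policy reveals no strict preference between shorter and longer trajectories.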

Location
22 Cross St
Singapore 048421