

SISL Seminar - Matthijs Spaan (Delft University of Technology)
Hi all,
Prof. Matthijs Spaan (TU Delft) will be visiting and delivering a lecture this Thursday, 04/24, from 4-5pm PDT in Durand 023.
Please feel free to join!
On behalf of the Stanford Intelligent Systems Lab (SISL),
Mansur Arief, Duncan Eddy, Mykel Kochenderfer
Speaker: Prof. dr. Matthijs Spaan (Delft University of Technology)
Title: Exploiting Epistemic Uncertainty for Deep Exploration in Reinforcement Learning
Time: Thursday, 04/24 4-5pm PDT
Location: Durand 023 (In-person)
Abstract: Reinforcement Learning (RL) allows an autonomous agent to optimize its decision making based on data it gathers while exploring its environment. Given limited and possibly inaccurate data, the agent is uncertain regarding its state of knowledge, which is referred to as epistemic uncertainty. Estimates of such epistemic uncertainty can guide an agent's decision making, notably where to focus its exploration of its environment. The principled embedding of epistemic uncertainty in present-day reinforcement learning is an important open problem.
In this talk, I will present recent work on exploiting epistemic uncertainty estimates in hard-exploration problems. First, our approach called Sequential Monte-Carlo for Deep Q-Learning studies uncertainty quantification for the value function in a model-free RL algorithm by training an ensemble of models to resemble the Bayesian posterior. Second, our Projection-Ensemble DQN algorithm focuses on the distributional RL setting and increases the diversity of an ensemble of distributional value functions by employing different projections of value distributions for different ensemble members. Third, our Epistemic Monte Carlo Tree Search methodology incorporates epistemic uncertainty into model-based RL by estimating the epistemic uncertainty associated with predictions at every node in the MCTS planning tree.
We demonstrate our algorithms on a variety of hard-exploration benchmarks, showing that they outperform state-of-the-art baselines and highlighting how exploiting epistemic uncertainty brings about these improvements.
Bio: Matthijs Spaan is a Professor of Reliable AI Algorithms and co-directs the Sequential Decision Making group in the Department of Intelligent Systems, Delft University of Technology, Delft, The Netherlands. His recent research has focused on developing reinforcement-learning algorithms for safe and robust decisions, with the aim of making AI more reliable.