Presented by Tokyo AI (TAI)

TAI AAI #08 - Offline Reinforcement Learning

Registration: Approval required. Your registration is subject to approval by the host. To join the event, please register below.
About Event

This Tokyo AI (TAI) Advanced AI (AAI) group session will feature speakers on Offline Reinforcement Learning.

Note: each approved attendee will need to register for an additional QR code to access the venue (you will receive the link later).

Schedule

18:30 - 19:00 Doors open
19:00 - 19:10 Introduction
19:10 - 19:40 Introduction to Offline Reinforcement Learning (Takuma Seno)
19:40 - 20:10 Offline Reinforcement Learning from Datasets with Nonstationarity (Johannes Ackermann)
20:10 - 20:40 Studying Sample Efficiency in Deep RL Through Better Evaluation Methods and Data Pruning (Shivakanth Sujit)
20:40 - 21:30 Networking

Speakers

Takuma Seno (https://takuseno.github.io/)

Title: Introduction to Offline Reinforcement Learning

Abstract: Offline reinforcement learning (RL) is a paradigm in which an RL agent is optimized exclusively from static datasets, without any online interaction with an environment. This unlocks new RL applications that were previously difficult to implement with online RL. In this talk, I will cover popular offline RL algorithms and software tools for practitioners.
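
As a taste of the tooling covered in the talk, below is a minimal sketch of an offline RL training run using d3rlpy, the speaker's library (see the bio below). It assumes the d3rlpy v2-style API; exact names and signatures may differ across versions.

    import d3rlpy

    # Load a small offline dataset together with its environment
    # (a CartPole dataset bundled with d3rlpy).
    dataset, env = d3rlpy.datasets.get_cartpole()

    # Discrete-action Conservative Q-Learning (CQL), a popular offline RL
    # algorithm that penalizes Q-values on out-of-distribution actions.
    cql = d3rlpy.algos.DiscreteCQLConfig().create(device=False)  # False = CPU

    # Train purely from the static dataset -- no online interaction --
    # while periodically evaluating the learned policy in the environment.
    cql.fit(
        dataset,
        n_steps=10_000,
        evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
    )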

Bio: Takuma Seno is a Senior Research Scientist at Sony AI's Tokyo Laboratory in Japan. He works on deep reinforcement learning research for Gran Turismo Sophy, an AI racing agent for the game Gran Turismo. He received his Ph.D. from Keio University, Japan, and is the author of the offline reinforcement learning library d3rlpy. He has received several awards for his work, including Mitou Super Creator 2020 from the Information-technology Promotion Agency (IPA), Japan, and the Outstanding Paper Award on applications of RL at the Reinforcement Learning Conference (RLC) 2024.

Johannes Ackermann (https://johannesack.github.io)

Title: Offline Reinforcement Learning from Datasets with Nonstationarity

Abstract: In offline RL, we aim to learn a policy from a dataset that we have previously collected. The environment is usually assumed to be unchanging throughout data collection; however, this assumption won't hold when a dataset is collected in practice over a longer timeframe. We thus address a problem setting in which, while the dataset is being collected, the transition and reward functions gradually change between episodes but stay constant within each episode. We show that existing methods fail in this setting and propose a method based on contrastive predictive coding that addresses the shortcomings of previous methods.
Paper (RLC 2024): https://arxiv.org/abs/2405.14114
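
To make the contrastive ingredient concrete, here is a rough, hypothetical sketch: windows of transitions from an episode are encoded into a latent context with an InfoNCE objective, so that two windows from the same episode agree while other episodes in the batch act as negatives. The names and architecture below are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextEncoder(nn.Module):
        """Encodes a window of (obs, action, reward) transitions from one
        episode into a latent context summarizing its dynamics."""
        def __init__(self, obs_dim, act_dim, ctx_dim=16, hidden=128):
            super().__init__()
            self.gru = nn.GRU(obs_dim + act_dim + 1, hidden, batch_first=True)
            self.head = nn.Linear(hidden, ctx_dim)

        def forward(self, window):  # window: (batch, T, obs_dim + act_dim + 1)
            _, h = self.gru(window)
            return self.head(h[-1])  # (batch, ctx_dim)

    def info_nce(anchor_ctx, positive_ctx, temperature=0.1):
        """InfoNCE loss: contexts of two windows from the same episode
        should match; other episodes in the batch serve as negatives."""
        anchor = F.normalize(anchor_ctx, dim=-1)
        positive = F.normalize(positive_ctx, dim=-1)
        logits = anchor @ positive.t() / temperature  # (batch, batch)
        labels = torch.arange(len(anchor), device=anchor.device)
        return F.cross_entropy(logits, labels)

The inferred context can then be appended to the state, letting a standard offline RL algorithm condition its policy on the episode's (unknown) dynamics.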

Bio: Johannes Ackermann is a PhD student at the University of Tokyo, supervised by Professor Masashi Sugiyama. His research focuses on Reinforcement Learning with changing or complicated transition dynamics and reward functions.

Shivakanth Sujit (https://shivakanthsujit.github.io/)

Title: Studying Sample Efficiency in Deep RL Through Better Evaluation Methods and Data Pruning

Abstract: Reinforcement learning (RL) has shown great promise, with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous number of environment interactions to learn, which can be infeasible in situations where such interactions are expensive, such as in robotics. In this talk, I will present previous work on sample efficiency, through the lens of better evaluation methods as well as from the algorithmic perspective. For the former, I present an approach for evaluating offline RL methods as a function of data, rather than the traditional method of basing it on compute or gradient steps. This approach reveals insights into the data efficiency of current offline methods and their robustness to distribution changes in the dataset, while also answering how much we actually learn from existing benchmarks. Next, I will talk about modifications to existing algorithms that can improve their sample efficiency. Off-policy RL methods use an experience replay buffer to store and sample data the agent has observed; however, simply assigning equal importance to every sample is a naive strategy. This work proposes a method to prioritize samples based on their loss reduction potential, i.e., how much we can learn from a sample.
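
As a sketch of the prioritization idea, here is a toy replay buffer that samples transitions in proportion to a learnability score, using the magnitude of the latest TD error as a crude stand-in for how much loss a sample could still reduce. The names and the scoring rule are illustrative assumptions, not the talk's actual method.

    import numpy as np

    class LearnabilityReplayBuffer:
        """Toy replay buffer that samples transitions in proportion to an
        estimate of how much can still be learned from them."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.data, self.scores = [], []

        def add(self, transition, score=1.0):
            if len(self.data) >= self.capacity:  # drop the oldest transition
                self.data.pop(0)
                self.scores.pop(0)
            self.data.append(transition)
            self.scores.append(score)

        def sample(self, batch_size):
            probs = np.asarray(self.scores)
            probs = probs / probs.sum()
            idx = np.random.choice(len(self.data), size=batch_size, p=probs)
            return [self.data[i] for i in idx], idx

        def update_scores(self, idx, td_errors):
            # Use the latest |TD error| as a proxy for remaining learning
            # potential; the small constant keeps every sample reachable.
            for i, err in zip(idx, td_errors):
                self.scores[i] = abs(float(err)) + 1e-3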

Bio: Shivakanth is a Senior Researcher at Araya. He received his M.Sc. in 2023 from Mila, Quebec. He is interested in deep reinforcement learning for robotics and LLMs. Before joining Mila, he completed his undergraduate degree in Control Engineering at NIT Trichy, India; this background drives his research on combining insights from control theory and RL to build agents that can safely interact with the real world.

Our Community

Tokyo AI (TAI) is a community composed of people based in Tokyo and working with, studying, or investing in AI. We are engineers, product managers, entrepreneurs, academics, and investors intending to build a strong “AI core” in Tokyo. Find more in our overview: https://bit.ly/tai_overview

Organizers

Kai Arulkumaran: Research Team Lead at Araya, working on brain-controlled robots as part of the JST Moonshot R&D program. Previously, he completed his PhD in Bioengineering at Imperial College London and had work experience at DeepMind, FAIR, Microsoft Research, Twitter Cortex, and NNAISENSE. His research areas are deep learning, reinforcement learning, evolutionary computation, and computational neuroscience.

Craig Sherstan: Research Scientist at Sony AI Tokyo. His current research is on the application of RL to create AI opponents for the video game Gran Turismo. Previously, he completed his PhD in Reinforcement Learning at the University of Alberta, Canada, as part of the Bionic Limbs for Improved Natural Control Lab. Craig has past experience with human-computer interfaces, robotics, and various areas of the software industry.

Ilya Kulyatin: Fintech and AI entrepreneur with work and academic experience in the US, Netherlands, Singapore, UK, and Japan, with an MSc in Machine Learning from UCL.

Sponsor

AWS is kindly hosting us for this session.

Location
Please register to see the exact location of this event.
Shinagawa City, Tokyo