

Nived's dissertation talk
Come listen to my dissertation talk!
It'll be at 2:00pm PT at 510 Soda Hall, and there will be snacks. Whether I pass or fail, I guarantee that you will be entertained.
My talk will also be streamed on Zoom, so feel free to join there too! (https://berkeley.zoom.us/j/99844910979?pwd=zmsT42wuztnYQObVTd0KLiGSWbb7SQ.1&jst=3).
Parking details
If you have visited Berkeley before, you might know that parking around campus can be awful. There are two options:
1. Park right behind Soda Hall. [maps]
This is free 2-hour parking, but it's first-come, first-served and spots are usually hard to find. That doesn't mean you shouldn't try.
2. One of the nearby garages. [garage 1] [garage 2] [garage 3]
These garages usually have space available.
And now for the boring details:
Title: New data-centric frameworks for learning in changing environments
Talk Abstract:
As machine learning systems grow increasingly general-purpose and data-centric, there is a pressing need to develop approaches that mitigate the significant cost of collecting high-quality data. This challenge is exacerbated when agents are deployed in settings involving sequential decision making. In such changing environments, unseen situations are encountered frequently and undesirable behavior can be catastrophic.
A two-stage pipeline (1. pre-training a base model on large offline datasets, followed by 2. post-training on smaller datasets) has emerged as one of the most effective ways to train performant agents. But how do we carry out pre-training and fine-tuning efficiently and robustly when access to high-quality data and compute is a major bottleneck?
In this talk, I will discuss new approaches to this problem, which build on insights derived from principled mathematical frameworks. I will present:
(i) "Pre-training": A statistical framework for Imitation Learning (IL). As a consequence, we develop new algorithms which are provably optimal, and based on a principled notion of dataset augmentation.
(ii) "Post-training": A finer grained understanding of whether LLM fine-tuning should use imitation learning (popularly known as supervised finetuning) or RL-based methods which optimize policies by learning a verifier / reward model.
I will conclude with a discussion of future research directions and the longer-term goal of exploring the interplay between RL approaches and LLMs.