BCV AI Research Roundtable #1 with Neil Chowdhury
Join us for the first of a new series, BCV AI Research Roundtables! We'll bring together research-minded folks to read and discuss a paper, with dedicated Q&A time with one of the paper's authors.
We will dedicate 20-30 minutes at the beginning of the event to skim the paper in silence, but please feel free to dive in prior to the event if you’d like a more thorough read!
We're excited to host you at the BCV San Francisco Office. Dinner will be provided.
Paper: MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering (OpenAI's blog post on the paper here)
What is it about: This paper evaluates how well AI agents perform on 75 ML-engineering-related Kaggle competitions, using different frontier models and open-source agent scaffolds. It also examines how results change under different forms of resource scaling and investigates potential pre-training data contamination.
Why it matters: AI agents that can autonomously solve machine learning challenges could dramatically accelerate scientific progress and LLM agent capabilities. MLE-bench can be used to evaluate model autonomy, so that advances are deployed in accordance with AI safety frameworks like OpenAI's Preparedness Framework.
Meet the Paper Author
Neil Chowdhury was previously a member of technical staff on OpenAI's Preparedness team and a researcher at MIT CSAIL. He is currently a founding member of Transluce, an independent non-profit research lab. Check out some of his recent work on automated capability elicitation of language models here.