Raising the Bar: the Evolution of AI Evaluation

Hosted by Outlier AI
Zoom
About Event
Smarter models need smarter tests.

AI is advancing at breakneck speed—but how do we actually know it’s improving?

We’ve all seen it—models acing benchmarks but falling short on real-world tasks. The gap between what we measure and what matters is getting harder to ignore.

We’ll talk about:

  1. ✅ Why evaluation methods shape how AI systems are built

  2. 🧩 The limitations of traditional benchmarks—and what’s replacing them

  3. ⚙️ How Humanity's Last Exam (HLE) tests challenging, real-world reasoning

  4. 👩‍🔬 Why PhD researchers and domain experts are key to the future of AI development

🎙 About the Speaker

Alex Fabbri is a Senior ML Research Scientist at Scale AI, focused on coding data and evaluation methods. He earned his PhD from Yale University and previously worked at Salesforce.

His research spans multilingual reasoning, code generation, and summarization, with papers at top NLP conferences. He serves as Senior Area Chair for Summarization at ACL 2025 and has been an Area Chair for ACL Rolling Review since 2021.

Explore more at alex-fabbri.github.io or on Google Scholar.

Who Should Attend:

This session is for:

  • AI/ML researchers and engineers building next-gen systems

  • PhD students and domain experts looking to shape real-world AI applications

  • Product leaders and technical teams who rely on AI model performance

  • Anyone curious about where AI progress is heading—and how we’ll measure it

If you’re building, studying, or just keeping pace with AI’s evolution, this one’s for you.