Raising the Bar: the Evolution of AI Evaluation

Hosted by Outlier AI
Zoom
About Event
Smarter models need smarter tests.

AI is advancing at breakneck speed—but how do we actually know it’s improving?

We’ve all seen it—models acing benchmarks but falling short on real-world tasks. The gap between what we measure and what matters is getting harder to ignore.

We’ll talk about:

  1. ✅ Why evaluation methods shape how AI systems are built

  2. 🧩 The limitations of traditional benchmarks—and what’s replacing them

  3. ⚙️ How Humanity's Last Exam (HLE) tests challenging, real-world reasoning

  4. 👩‍🔬 Why PhD researchers and domain experts are key to the future of AI development

🎙 About the Speaker

Alex Fabbri is a Senior ML Research Scientist at Scale AI, focused on coding data and evaluation methods. He earned his PhD from Yale University and previously worked at Salesforce.

His research spans multilingual reasoning, code generation, and summarization, with papers at top NLP conferences. He serves as Senior Area Chair for Summarization at ACL 2025 and has been an Area Chair for ACL Rolling Review since 2021.

Explore more at alex-fabbri.github.io or on Google Scholar.

Who Should Attend:

This session is for:

  • AI/ML researchers and engineers building next-gen systems

  • PhD students and domain experts looking to shape real-world AI applications

  • Product leaders and technical teams who rely on AI model performance

  • Anyone curious about where AI progress is heading—and how we’ll measure it

If you’re building, studying, or just keeping pace with AI’s evolution, this one’s for you.