

Benchmarking AGI: Measuring Intelligence vs. Memorization with Greg Kamradt
Most AI benchmarks today measure skill—how well a model performs a predefined task. But does that really test intelligence? Intelligence isn’t just about performing a task—it’s about acquiring new skills efficiently.
In this episode, we explore the messy world of AI evaluation and how researchers are trying to measure generalization, not just memorization. One major effort in this space is ARC-AGI, the Abstraction and Reasoning Corpus, designed by François Chollet to test AI's ability to reason and learn. It's easy for humans, yet incredibly difficult for AI. Now, with the new ARC Prize competition that launched on March 24 this year, researchers are pushing the limits of AI systems in novel ways.
In this livestream, Greg Kamradt, President of the ARC Prize Foundation, joins Hugo Bowne-Anderson to discuss:
🔍 What makes a good AI benchmark? The challenges of evaluating intelligence.
🏆 The ARC Prize competition—what’s changing, what’s at stake, how the launch went, and what we learnt!
⚖️ Benchmarking behind the scenes—what goes into running large-scale AI evaluations.
📊 The evolving landscape of AI evaluation—agents, retrieval, and where benchmarks fall short.
🤔 Does ARC-AGI actually measure intelligence? Debating its role in AGI progress.
We'll break down the philosophy, challenges, and real-world implications of AI benchmarking—plus what the ARC team has learned from running a global competition.