Presented by Public AIM Events!

Large Reasoning Models: The Illusion of Thinking

About Event

Are Large Reasoning Models (LRMs) truly reasoning—or are we just misinterpreting what they’re doing?

As LLMs evolve into LRMs, the AI community is asking harder questions about whether models can truly “think.” Two influential 2025 papers offer starkly different answers—and we’re bringing them into direct conversation.

The Illusion of Thinking (Shojaee et al.)

This paper investigates how LRMs perform on structured reasoning tasks like Tower of Hanoi and River Crossing. Using controllable puzzle environments, the authors map a “three-phase” performance regime:

  • Standard LLMs outperform LRMs on easy tasks.

  • LRMs gain advantage on medium complexity.

  • But at high complexity, both types of models completely collapse, failing to solve tasks even with sufficient token budgets.

They dig into the reasoning traces themselves, not just final answers, and observe a troubling pattern: as problems get harder, LRMs actually spend less reasoning effort (fewer thinking tokens), the opposite of what we'd expect from a scalable reasoning system.
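
To make the complexity knob concrete: the optimal Tower of Hanoi solution for n disks takes exactly 2^n - 1 moves, so the required output grows exponentially as a single parameter is turned up. The back-of-the-envelope sketch below (the tokens-per-move and budget figures are assumptions for illustration, not numbers from either paper) shows how quickly an explicit move list outgrows a fixed output budget.

```python
# Back-of-the-envelope: how fast Tower of Hanoi output grows with n.
# The 2**n - 1 minimum-move count is a standard result; TOKENS_PER_MOVE and
# TOKEN_BUDGET are illustrative assumptions, not values from either paper.

TOKENS_PER_MOVE = 7      # assumed: "move disk 3 from peg A to peg C" is a handful of tokens
TOKEN_BUDGET = 64_000    # assumed cap on a single response

for n in (5, 10, 15, 20):
    moves = 2**n - 1                    # optimal solution length for n disks
    tokens = moves * TOKENS_PER_MOVE    # cost of writing every move out explicitly
    verdict = "fits" if tokens <= TOKEN_BUDGET else "exceeds budget"
    print(f"n={n:2d}  moves={moves:>9,}  ~{tokens:>9,} tokens  -> {verdict}")
```

Under these assumptions the full move list fits comfortably at n = 10 but blows well past the budget by n = 15, which is exactly the regime where the debate below gets interesting.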

The Illusion of the Illusion of Thinking (C. Opus et al.)

This response paper pushes back hard. The authors argue the observed collapse is an artifact of how the experiments were designed, not a flaw in the models themselves:

  • In Tower of Hanoi, models fail because token limits are hit—not because they can't reason.

  • In River Crossing, many “failures” are due to unsolvable puzzle instances that no model or algorithm could solve.

  • When the output-format constraint is removed (e.g., by asking the model for a function that generates the solution rather than an exhaustive move-by-move list), accuracy improves dramatically, even on problems previously labeled “total failures.” A sketch of that kind of answer follows below.

In short, the collapse may be an illusion of an illusion.
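
As a minimal sketch of the rebuttal’s output-format point (illustrative Python, not the papers’ actual prompt or model output): a short function that generates the full move sequence demonstrates the same algorithmic understanding as printing every move, at a tiny fraction of the tokens.

```python
# Illustration only: a compact "solution generator" of the kind the rebuttal
# asked for, instead of an exhaustive move-by-move listing.

def hanoi(n, source="A", target="C", spare="B"):
    """Yield the optimal Tower of Hanoi move sequence as (disk, from_peg, to_peg)."""
    if n == 0:
        return
    yield from hanoi(n - 1, source, spare, target)   # clear the top n-1 disks out of the way
    yield (n, source, target)                        # move the largest disk
    yield from hanoi(n - 1, spare, target, source)   # restack the n-1 disks on top

# About ten lines encode the complete solution for any n, even when printing
# all 2**n - 1 moves verbatim would overflow the output budget.
assert len(list(hanoi(4))) == 2**4 - 1
```

Whether accepting an answer like this counts as “reasoning” is precisely the kind of evaluation-design question this session digs into.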

🔍 What You’ll Learn

  • How LLMs and LRMs behave across different complexity regimes

  • Why evaluating reasoning is so hard—and how evaluation design choices can mislead us

  • Techniques for probing internal reasoning traces, not just final answers

  • How output format, token limits, and benchmark design can radically change conclusions

  • Open questions: Are we hitting reasoning limits—or context window limits?


👩‍💻 Who Should Attend

  • AI researchers and engineers building or evaluating LLMs and LRMs

  • ML practitioners designing evaluation frameworks for reasoning

  • Product leads curious about where and when LRMs add real value

  • Anyone interested in the future of cognitive capabilities in generative models

Speakers:

  • Greg “Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

  • Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s also a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn and YouTube to stay up to date on workshops, new courses, and corporate training opportunities.
