Presented by
Ragas Community
Events for and by the community ❤️
115 Going

Ragas Paper Club #2

Zoom
About Event

Paper Club #2 – The Leaderboard Illusion

Large-language-model leaderboards look definitive until you peek behind the curtain. The Leaderboard Illusion (Singh, Nan, et al.) reveals how undisclosed testing, cherry-picked “best runs,” and skewed sampling quietly tilt Chatbot Arena’s rankings, warping the very notion of “state-of-the-art.”

On July 24th @ 09:00 AM PT we’ll dig into:

  • Shadow runs & score retractions – how vendors trial dozens of versions, publish the single top score, then erase the rest.

  • Data-access asymmetry – why giants with privileged API taps get to fine-tune on more than 60% of Arena battles, sidelining open-source peers.

  • Deprecations that break the graph – silent model removals that shatter Bradley-Terry comparisons and inflate win rates.

  • A roadmap to fairness – five concrete fixes (from banning retractions to open-sourcing the sampler) that could reboot LLM benchmarking integrity.

Speakers
Shahul – Founder, Ragas
Mike – Product Leader, Ex-NodeSource

20 min walkthrough → 15 min live Q&A.
Live and free, on Zoom.

Slides, notes, and code links shared afterward.

See you in the chat!

🔗 Paper: https://arxiv.org/abs/2504.20879
