Ragas Paper Club #2

Ragas Community

Zoom

About Event

Paper Club #2 – The Leaderboard Illusion

Large language model leaderboards look definitive, until you peek behind the curtain. The Leaderboard Illusion (Singh, Nan, et al.) shows how undisclosed private testing, cherry-picked “best runs,” and skewed sampling quietly tilt Chatbot Arena’s rankings, warping the very notion of “state of the art.”

On July 24th @ 09:00 AM PT we’ll dig into:

• Shadow runs &amp; score retractions – how vendors privately test dozens of variants, publish the single top score, then retract the rest.
• Data-access asymmetry – why a few large providers with privileged access collect the data from more than 60% of Arena battles to train on, sidelining open-source peers.
• Deprecations that break the graph – silent model removals that undermine Bradley-Terry comparisons and inflate win rates.
• A roadmap to fairness – five concrete fixes (from banning retractions to open-sourcing the sampler) that could restore integrity to LLM benchmarking.
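To get a feel for the shadow-run effect before the session, here is a minimal Python sketch of best-of-N selection. It is not the paper’s code or Arena’s scoring method; it assumes every private variant has the same true skill, that each measured score is that skill plus Gaussian noise (a hypothetical SD of 10 points), and that only the maximum is published. The baseline of 1200 is arbitrary.

```python
import random
import statistics


def reported_score(true_skill: float, n_variants: int, noise_sd: float, rng: random.Random) -> float:
    """Score published when a vendor tests n_variants and keeps only the best one."""
    # Assumption: all variants are equally strong; measured score = skill + Gaussian noise.
    measured = [rng.gauss(true_skill, noise_sd) for _ in range(n_variants)]
    # Only the top score is published; the others are retracted.
    return max(measured)


def average_reported(true_skill: float, n_variants: int, noise_sd: float = 10.0,
                     trials: int = 2000, seed: int = 0) -> float:
    """Average published score over many simulated leaderboard submissions."""
    rng = random.Random(seed)
    return statistics.mean(
        reported_score(true_skill, n_variants, noise_sd, rng) for _ in range(trials)
    )


if __name__ == "__main__":
    # Same underlying model quality; only the number of private variants changes.
    for n in (1, 5, 20):
        print(n, round(average_reported(1200.0, n), 1))
```

With one variant the average stays near the true 1200; with 5 or 20 variants it drifts upward even though no variant is actually better, which is the selection bias that retracting unpublished scores hides.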

Speakers
• Shahul – Founder, Ragas
• Mike – Product Leader, Ex-NodeSource

20 min walkthrough → 15 min live Q&A.
Live and free on Zoom.

Slides, notes, and code links will be shared afterward.

See you in the chat!

🔗 Paper: https://arxiv.org/abs/2504.20879

Presented by

Ragas Community

Events for and by the community ❤️