Presented by
BuzzRobot
AI research discussions
Hosted By
87 Going

Cohere’s much-discussed paper on how AI companies game the LM Arena leaderboard

Zoom
About Event

Cohere recently released a paper, The Leaderboard Illusion, claiming that AI labs are gaming the chatbot ranking platform LM Arena to improve their LLM rankings. We've invited the authors to join us and discuss the technical details behind their findings.

Marzieh Fadaee, a Staff Research Scientist, and Shivalika Singh, an Open Science Research Engineer at Cohere Labs, will present The Leaderboard Illusion to the BuzzRobot community.

Chatbot Arena has become a cornerstone for evaluating and ranking AI models. In this talk, our guests will highlight systemic biases that undermine its credibility as a benchmark. The discussion will raise important questions about fairness, transparency, and the future of LLM evaluation, offering actionable recommendations to reform Chatbot Arena and promote more equitable benchmarking.

Join the BuzzRobot Slack to stay connected with the community.
