Cohere’s much-discussed paper on how AI companies game the LM Arena leaderboard
Remember the paper Cohere recently released, The Leaderboard Illusion, which claims that AI labs are gaming the chatbot ranking platform LM Arena to improve their LLM rankings? We've invited the authors to join us and discuss the technical details behind their findings.
Marzieh Fadaee, a Staff Research Scientist, and Shivalika Singh, an Open Science Research Engineer at Cohere Labs, will present The Leaderboard Illusion to the BuzzRobot community.
Chatbot Arena has become a cornerstone for evaluating and ranking AI models. In this talk, our guests will highlight systemic biases that undermine its credibility as a benchmark. The discussion will raise important questions about fairness, transparency, and the future of LLM evaluation, offering actionable recommendations to reform Chatbot Arena and promote more equitable benchmarking.
Join the BuzzRobot Slack to stay connected with the community.