How Do Olympiad Medalists Judge LLMs in Competitive Programming? w/ Peiyao Shang (Sentient.xyz)
About Event
🔬 AI4Science x AI Security on alphaXiv
🗓 Friday August 15th 2025 · 10AM PT
🎙 Featuring Peiyao Shang
💬 Casual Talk + Open Discussion
🎥 Zoom: https://stanford.zoom.us/j/92623849311?pwd=US3bnlcWEf6pKjjUPyYoLn5w3hS8Kj.1
Peiyao Shang presents her work on LiveCodeBench Pro, a continuously updated competitive-programming benchmark. Despite recent claims, it shows that frontier LLMs still lag significantly behind human Olympiad medalists, solving only 53% of medium-difficulty problems and 0% of hard problems, where human experts excel.
Whether you’re working at the frontier of LLMs or just curious about AI4Science, we’d love to have you there.
Hosted by: alphaXiv x Intology
AI4Science: join the community
AI Security: join the community