How Do Olympiad Medalists Judge LLMs in Competitive Programming? w/ Peiyao Shang (Sentient.xyz)
About Event
🔬 AI4Science x AI Security on alphaXiv
🗓 Friday August 15th 2025 · 10AM PT
🎙 Featuring Peiyao Shang
💬 Casual Talk + Open Discussion
🎥 Zoom: https://stanford.zoom.us/j/92623849311?pwd=US3bnlcWEf6pKjjUPyYoLn5w3hS8Kj.1
Peiyao Shang presents her work on LiveCodeBench Pro, a continuously updated competitive-programming benchmark. Despite recent claims, it shows that frontier LLMs still lag significantly behind human Olympiad medalists, solving only 53% of medium-difficulty problems and 0% of hard problems, where human experts excel.
Whether you’re working at the frontier of LLMs or just curious about AI4Science, we’d love to have you there.
Hosted by: alphaXiv x Intology
AI4Science: join the community
AI Security: join the community