BuzzRobot

In this talk, our guest Alex Pan from UC Berkeley will describe the WMDP Benchmark, a 3,668-question dataset designed to measure whether large language models (LLMs) could help malicious actors develop biological, cyber, and chemical weapons.

WMDP serves both as a proxy evaluation for hazardous knowledge in LLMs and as a benchmark for unlearning methods to remove such knowledge. 

He will also cover RMU, a state-of-the-art unlearning method introduced to reduce the hazardous knowledge in LLMs, as measured by WMDP.

Join the BuzzRobot community on Slack

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

Bill Chen

Chetan Tonde