Hosted By
61 Going
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Hosted by BuzzRobot
Registration
Past Event
About Event
In this talk, our guest Alex Pan from UC Berkeley will describe the WMDP Benchmark, a dataset of 3,668 questions designed to measure whether large language models (LLMs) could help malicious actors develop biological, cyber, and chemical weapons.
WMDP serves both as a proxy evaluation for hazardous knowledge in LLMs and as a benchmark for unlearning methods that aim to remove such knowledge.
He will also cover RMU (Representation Misdirection for Unlearning), the state-of-the-art unlearning method introduced to reduce LLMs' hazardous knowledge as measured by WMDP.
Join the BuzzRobot community on Slack
Subscribe to BuzzRobot YouTube channel