Hosted By
61 Going

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

Hosted by BuzzRobot
Zoom
Past Event
About Event

In this talk, our guest Alex Pan from UC Berkeley will describe the WMDP Benchmark, a dataset of 3,668 multiple-choice questions designed to measure whether large language models (LLMs) could help malicious actors develop biological, cyber, and chemical weapons.

WMDP serves both as a proxy evaluation for hazardous knowledge in LLMs and as a benchmark for unlearning methods that aim to remove such knowledge.

He will also cover RMU (Representation Misdirection for Unlearning), a state-of-the-art unlearning method introduced alongside the benchmark to reduce LLMs' hazardous knowledge as measured by WMDP.

Join the BuzzRobot community on Slack
Subscribe to the BuzzRobot YouTube channel

The paper
