Presented by BuzzRobot (AI research discussions)

Google DeepMind's Approach to AGI Safety & Security

Zoom
Past Event
About Event

This time, the BuzzRobot guest is Rohin Shah, who leads AGI safety and security research at Google DeepMind.

He'll present a technical approach his team has developed to increase the probability of safe AGI.

The framework distinguishes four areas of risk: misuse, misalignment, mistakes, and structural risks. Of these, Rohin's team focuses on technical approaches to misuse and misalignment.

For misuse, the strategy is to prevent threat actors from accessing dangerous capabilities: proactively identify those capabilities, then implement robust security, access restrictions, monitoring, and model-level safety mitigations.
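As a loose illustration of how access restrictions and monitoring might fit together, here is a minimal Python sketch. This is not DeepMind's implementation; names such as `DANGEROUS_CAPABILITIES`, `AccessPolicy`, and `vetted_users` are hypothetical stand-ins for the pattern described above.

```python
# Hypothetical sketch: capability-gated access with request logging.
# Illustrates the "access restrictions + monitoring" pattern, not any real system.
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Capabilities that a capability-evaluation process might flag as dangerous.
DANGEROUS_CAPABILITIES = {"cyber_offense", "bio_wetlab_protocols"}

@dataclass
class AccessPolicy:
    vetted_users: set[str] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def check(self, user: str, capability: str) -> bool:
        """Allow a request only if the capability is benign or the user is vetted."""
        allowed = capability not in DANGEROUS_CAPABILITIES or user in self.vetted_users
        # Monitoring: every decision is logged for later review.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "capability": capability,
            "allowed": allowed,
        })
        return allowed

policy = AccessPolicy(vetted_users={"alice"})
assert policy.check("alice", "cyber_offense")        # vetted user: allowed
assert not policy.check("mallory", "cyber_offense")  # unvetted: blocked and logged
assert policy.check("mallory", "translation")        # benign capability: allowed
```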

To address misalignment, the team outlines two lines of defense. First, model-level mitigations such as amplified oversight and robust training can help build an aligned model. Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned. Techniques from interpretability, uncertainty estimation, and safer design patterns can enhance the effectiveness of both lines of defense.
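To make the second line of defense concrete, here is a small hedged sketch of system-level monitoring wrapped around a possibly misaligned model. The names `monitored_generate`, `flags_harm`, and the toy model are illustrative assumptions, not part of the approach being presented.

```python
# Hypothetical sketch: a system-level monitor screens a possibly-misaligned
# model's outputs before they take effect. Toy stand-ins throughout.
from typing import Callable

def monitored_generate(
    model: Callable[[str], str],
    flags_harm: Callable[[str], bool],
    prompt: str,
) -> str:
    """First line of defense: train the model itself to be aligned.
    Second line: even if that fails, a monitor vets each output."""
    output = model(prompt)
    if flags_harm(output):
        # Escalate instead of executing: withhold the output for human review.
        return "[output withheld pending human review]"
    return output

# Toy stand-ins for demonstration only.
toy_model = lambda p: f"response to: {p}"
toy_monitor = lambda out: "exfiltrate" in out

print(monitored_generate(toy_model, toy_monitor, "summarize this paper"))
```

The design point the sketch captures is defense in depth: the monitor does not need to trust the model, so harm can be limited even when model-level alignment fails.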

Finally, our guest will briefly outline how these ingredients could be combined to produce safety cases for AGI systems.

Join the BuzzRobot Slack to connect with the community.
