Google DeepMind's Approach to AGI Safety & Security
This time the BuzzRobot guest is Rohin Shah, who leads AGI safety and security research at Google DeepMind.
He'll present the technical approach his team has developed to increase the probability that AGI is built safely.
There are four areas of risk: misuse, misalignment, mistakes, and structural risks. Of these, Rohin's team focuses on technical approaches to misuse and misalignment.
For misuse, their strategy aims to keep dangerous capabilities out of the hands of threat actors by proactively identifying those capabilities and implementing robust security, access restrictions, monitoring, and model safety mitigations.
To address misalignment, the team outlines two lines of defense. First, model-level mitigations such as amplified oversight and robust training help build an aligned model. Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned. Techniques from interpretability, uncertainty estimation, and safer design patterns can enhance the effectiveness of these mitigations.
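To make the defense-in-depth idea concrete, here is a minimal, purely illustrative Python sketch of a system-level wrapper around a model. All names in it (safe_generate, is_authorized, flags_violation, the allowlist, the keyword stub) are hypothetical placeholders invented for this example, not DeepMind's actual methods or APIs; it simply shows how access control and output monitoring form a second line of defense independent of the model itself.

```python
# Illustrative sketch only: system-level mitigations wrapped around a model.
# Every identifier here is a hypothetical placeholder, not a real API.

def is_authorized(user_id: str, capability: str) -> bool:
    """Access control: only vetted users may invoke a sensitive capability."""
    vetted_users = {"alice", "bob"}  # placeholder allowlist
    return user_id in vetted_users

def flags_violation(text: str) -> bool:
    """Monitoring: a cheap check on model output, independent of the model.
    In practice this might be a trained classifier; here it is a keyword stub."""
    banned_phrases = ("synthesize pathogen", "zero-day exploit")
    return any(phrase in text.lower() for phrase in banned_phrases)

def safe_generate(model, user_id: str, prompt: str, capability: str) -> str:
    """Run the model behind access control and an output monitor."""
    # First line of defense is assumed upstream: the model was trained with
    # amplified oversight and robust training to be aligned (not shown here).
    if not is_authorized(user_id, capability):
        return "[refused: access denied]"
    output = model(prompt)
    # Second line of defense: even if the model is misaligned, the monitor
    # can block harmful output before it reaches the user.
    if flags_violation(output):
        return "[blocked by output monitor]"
    return output
```

The point of the pattern is that the monitor and the access check do not trust the model: they catch failures even when the first line of defense (alignment of the model itself) breaks down.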
Finally, our guest will briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
Join BuzzRobot Slack to connect with the community