Presented by
BuzzRobot
AI research discussions
226 Went

Anthropic's "Agentic Misalignment: How LLMs could be insider threats"

Zoom
Past Event
About Event

Anthropic recently released a report, "Agentic Misalignment: How LLMs could be insider threats," in which it tested 16 models from different providers to see how agents would behave when acting autonomously.

The agents were allowed to act autonomously, for example, by sending emails and accessing sensitive data. Their assigned goals were harmless.

The Anthropic team then tested whether the agents would act against their companies in scenarios where they faced replacement by an updated model, or where their assigned goal conflicted with the company's changing direction.

In this conversation, Aengus Lynch from University College London, a core contributor to this work who collaborated closely with the Anthropic team, will share the details of the research with the BuzzRobot community.

Read "Agentic Misalignment: How LLMs could be insider threats"

Join the BuzzRobot community on Slack
