Presented by

Catalyzing Toronto's role in steering AI progress toward a future of human flourishing. Join us for a variety of events on technical AI safety, governance in a world of advanced AI, and more.

44 Went

AI

AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats

Name: AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats
Start: 2025-07-24T18:00:00.000-04:00
End: 2025-07-24T21:00:00.000-04:00
Location: 30 Adelaide St E 12th floor

Trajectory Labs

30 Adelaide St E 12th floor

Toronto, Ontario

Past Event

About Event

Can AI agents misbehave while carrying out actions autonomously? At this event, Giles Edkins will walk us through and critique research by Anthropic that demonstrates blackmail and other phenomena when an agent is threatened with shutdown or reprogramming.

Event Schedule
6:00 to 6:30 - Food & Networking
6:30 to 7:30 - Main Presentation & Questions
7:30 to 8:00 - Discussion

If you can't make it in person, feel free to join the live stream at 6:30 pm, via this link.
