


Presented by
Trajectory Labs
Catalyzing Toronto's role in steering AI progress toward a future of human flourishing.
Hosted By
Auditing language models for hidden objectives
About Event
Today's Topic
Last month, Anthropic released a new paper about "systematic investigations into whether models are pursuing hidden objectives".
> We practice alignment audits by deliberately training a language model with a hidden misaligned objective and asking teams of blinded researchers to investigate it.
Join us as Shivam Arora walks us through the paper's key findings and some critiques of its approach and conclusions.
We welcome a variety of backgrounds, opinions and experience levels.
Event Schedule
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
Location
30 Adelaide St E
Toronto, ON M5C 3G8, Canada
Enter the main lobby of the building and let the security staff know you are here for the AI meetup. You may need to show your RSVP on your phone. You will be directed to the 12th floor, where the meetup is held. If you have trouble getting in, give Smitty a call at 647-424-4111.