Presented by
Trajectory Labs
Catalyzing Toronto's role in steering AI progress toward a future of human flourishing.

Auditing language models for hidden objectives

About Event

Today's Topic

Last month, Anthropic released a new paper about "systematic investigations into whether models are pursuing hidden objectives".

> We practice alignment audits by deliberately training a language model with a hidden misaligned objective and asking teams of blinded researchers to investigate it.

Join us as Shivam Arora walks through the paper's key findings and offers some critiques of its approach and conclusions.

We welcome a variety of backgrounds, opinions and experience levels.

Event Schedule
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation

Location
30 Adelaide St E
Toronto, ON M5C 3G8, Canada
Enter the main lobby of the building and let the security staff know you are here for the AI meetup. You may need to show your RSVP on your phone. You will be directed to the 12th floor where the meetup is held. If you have trouble getting in, give Smitty a call at 647-424-4111.