Presented by

Trajectory Labs

Catalyzing Toronto's role in steering AI progress toward a future of human flourishing.


AI

Auditing language models for hidden objectives

Name: Auditing language models for hidden objectives
Start: 2025-04-03T18:00:00.000-04:00
End: 2025-04-03T20:30:00.000-04:00
Location: 30 Adelaide St E


Past Event


About Event

Today's Topic

Last month, Anthropic released a new paper about "systematic investigations into whether models are pursuing hidden objectives".

> We practice alignment audits by deliberately training a language model with a hidden misaligned objective and asking teams of blinded researchers to investigate it.

Join us as Shivam Arora walks us through the paper's key findings and some critiques of its approach and conclusions.

We welcome a variety of backgrounds, opinions and experience levels.

Event Schedule
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation

Location

30 Adelaide St E

Toronto, ON M5C 3G8, Canada

Enter the main lobby of the building and let the security staff know you are here for the AI meetup. You may need to show your RSVP on your phone. You will be directed to the 12th floor where the meetup is held. If you have trouble getting in, give Smitty a call at 647-424-4111.
