Presented by
Trajectory Labs
Hosted By
1 Going

AI Safety Thursdays: Tracing the Thoughts of a Large Language Model

Registration
Welcome! To join the event, please register below.
About Event

Description

How do large language models actually work on the inside? Annie Sorkin presents on new research from Anthropic's Transformer Circuits team that opens up the "black box" of Claude 3.5 Haiku, revealing the computational mechanisms behind everything from multi-step reasoning to poetry planning.

Using a new methodology called attribution graphs, we'll explore how models handle multiple languages, how they can be led into concerning behaviors such as jailbreaks, and why they sometimes engage in unfaithful reasoning.

Event Schedule

6:00 to 6:45 - Networking and refreshments

6:45 to 8:00 - Main Presentation

8:00 to 9:00 - Breakout Discussions

Location
30 Adelaide St E
Toronto, ON M5C 3G8, Canada
Enter the main lobby of the building and let the security staff know you are here for the AI meetup. You may need to show your RSVP on your phone. You will be directed to the 12th floor where the meetup is held. If you have trouble getting in, give Smitty a call at 647-424-4111.