Trust, Safety and AI: Fireside Chat with Anthropic Researchers & Intrinsic
Join us for an evening of engaging conversations around Trust, Safety, and AI on July 23, 2024, from 5:30 PM to 8:00 PM at the beautiful outdoor patio of the 645 Ventures offices.
Fireside Chat
Join Anthropic AI safety researchers Trenton Bricken and Cem Anil for a fireside chat, where we will dive into the inner workings of large language models and the potential impact of these technologies on Trust & Safety.
Trenton Bricken is part of the Mechanistic Interpretability team, which seeks to understand and interpret the “features” that large language models learn. The most recent public display of this mechanistic understanding was “Golden Gate Claude,” in which Claude was made to believe it was the Golden Gate Bridge rather than an AI assistant.
Cem Anil is on the Alignment Science team. His work includes “influence functions,” which reveal what training data led a model to produce a given output. He also pioneered Many-Shot Jailbreaking, an effective approach to evading the safety guardrails of LLMs by leveraging their long context windows.
This event will be moderated by Karine Mellata from Intrinsic and will be conducted under the Chatham House Rule to encourage open and honest discussion.
Follow Trenton Bricken and Cem Anil on Twitter!
Event
The fireside chat will begin at 6:15 PM PT and run for 30 minutes. There will be time before and after to mingle! Enjoy catered sushi along with a selection of alcoholic and non-alcoholic beverages.
Hosted by Intrinsic and 645 Ventures, this event will take place on the sidelines of the TrustCon conference.