Presented by
Unify
Build AI Your Way ✨

Scaling Monosemanticity Explained

Google Meet
Past Event
About Event

In this session, Dan will walk through the paper "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet". The authors find a diverse set of highly abstract features that both respond to and causally influence abstract behaviours. Examples include features for famous people, features for countries and cities, and features that track type signatures in code.
