Presented by
Unify
Build AI Your Way ✨
Hosted By
Scaling Monosemanticity Explained
Past Event
About Event
In this session, Dan will walk through the paper "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet". The authors find a diverse set of highly abstract features. These features both respond to abstract behaviours and causally produce them. Examples include features for famous people, features for countries and cities, and features that track type signatures in code.