

Moonshot Alignment Program Demo Day
The Moonshot Alignment Program ends with a public poster session and job fair. Teams will present their work in a virtual conference format in Gather Town. Each team has a space to display their results, answer questions, and defend their method. Senior researchers will review the posters and vote on standout projects.
The poster session is followed by a job fair where research orgs, labs, and startups can host booths, meet researchers, and share open roles.
About the Moonshot Alignment Program:
Moonshot Alignment: 5-Week Intensive Research Program
A 5-week program for directly tackling the hard part of alignment.
Most alignment research focuses on subproblems. This fellowship tackles the core challenge directly: getting values into models, with strong empirical evidence that the methods work and scale. Previous research experience is advised. Everyone is welcome to apply, but mentorship bandwidth limits how many applicants we can accept.
We guarantee personalized feedback to the first 300 applicants.
Each track has suggested requirements; however, many people can make valuable contributions even without a track's specific prerequisites. For example, a data scientist with no neuroscience background could help in the neuroscience track by extracting key information from brain scans. Someone with a strong interpretability, robustness, or evals background could contribute to almost any team by helping it determine whether the alignment method it is testing actually works.
Program Details
Duration: 5 weeks, 10 hours a week
Format: Teams of 3-5 researchers
Start: Kickoff call with Kabir Kumar, 2nd August
End: Poster Evening followed by Careers Fair
Research Tracks
1. Agent Foundations
Either solve mathematical problems in agent foundations or implement existing theoretical work like Infrabayesianism.
Strong applicants to this track will:
- Be competent in Bayes nets, measure theory, and propositional logic (illustrated below)
- Be able to quickly learn new math
It would be helpful to know:
- Decision theory
- Computability/provability theory
Please apply even if you don't have all of these prerequisites. Assuming a basic math background (e.g., a Masters), if you're hard-working and willing to learn a lot, you can likely contribute. Formal qualifications are not necessary.
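For a rough sense of the level assumed by the first prerequisite, "competent in Bayes nets" means being comfortable with standard facts like the factorization below, stated here purely to illustrate the expected fluency:

```latex
% A Bayes net over variables x_1, ..., x_n with parent sets pa(i)
% factorizes the joint distribution according to its graph:
\[
  P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\left(x_i \mid x_{\mathrm{pa}(i)}\right)
\]
```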
2. Brain-Based AI Safety
Develop architectures based on how morals/values are encoded in the human brain.
Strong applicants to this track could:
- Have a background in neuroscience and machine learning
- Understand why it is difficult to determine precisely what the brain is doing
3. Improved Preference Optimization Methods
Create non-shallow methods of scalable oversight that demonstrably embed values deeply into models. Methods must show generalization beyond the training distribution, verified through interpretability-based evaluations.
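For orientation, here is a minimal sketch of the standard Direct Preference Optimization (DPO) loss (Rafailov et al., 2023), the kind of baseline a team in this track might try to improve on. This is an illustration under our own assumptions, not program-provided code; the function and tensor names are hypothetical.

```python
# Minimal sketch of the standard DPO loss, shown only as a reference
# baseline for this track. Tensor names are illustrative assumptions.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss from the summed per-token log-probs of each response.

    beta controls how far the policy may drift from the reference model.
    """
    # Implicit reward: log-ratio of policy to reference for each response.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```

A "non-shallow" method would need to beat this kind of baseline on held-out, out-of-distribution preference pairs, not merely match it on the training distribution.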
4. Original/Other Methods
Novel approaches to the core alignment problem that don't fit other tracks.
Program Structure
Week 1: Form teams, define specific approaches, design falsifiable experiments
Weeks 2-3: Build implementations, run experiments, test for generalization and for scaling to larger models (see the sketch after this schedule)
Week 4: Critique the approach, red-team it, document failure modes
Week 5: Write a research summary, prepare the poster, get final feedback
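As referenced in the Weeks 2-3 item, one simple form the generalization test could take is pairwise preference accuracy on a held-out, out-of-distribution split. This is a minimal sketch under the same assumptions as the DPO example above; the split names and the implicit-reward criterion are our own illustrative choices.

```python
# Minimal sketch of a generalization check: does the trained policy still
# rank chosen above rejected on pairs from a distribution it was not
# trained on? Names are illustrative assumptions.
import torch

@torch.no_grad()
def preference_accuracy(policy_chosen_logps: torch.Tensor,
                        policy_rejected_logps: torch.Tensor,
                        ref_chosen_logps: torch.Tensor,
                        ref_rejected_logps: torch.Tensor) -> float:
    """Fraction of pairs where the policy's implicit reward
    (log-ratio to the reference model) ranks chosen above rejected."""
    chosen = policy_chosen_logps - ref_chosen_logps
    rejected = policy_rejected_logps - ref_rejected_logps
    return (chosen > rejected).float().mean().item()

# Compare an in-distribution eval split with an out-of-distribution one;
# a large drop on the OOD split suggests the values did not generalize.
# acc_id  = preference_accuracy(*id_logps)
# acc_ood = preference_accuracy(*ood_logps)
```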
What We Provide
- GPU compute for experiments
- Mentorship from senior alignment researchers
- Collaboration infrastructure
- Stipends for full-time participants
- Presentation opportunity to AI lab representatives
Application Process
Stage 1 requirements:
- CV
- Confidence level (1-10) that you can commit for 30 days
- Brief explanation of why you can commit
- (Optional) Additional relevant work/ideas/links
Final Events
Poster Evening
Teams present their work in Gather Town in a conference-style format: attendees visit your virtual booth, and senior alignment researchers judge the projects.
Careers Fair
Representatives from DeepMind, Anthropic, Redwood Research, Apart Research, MIRI, Conjecture, CAIS, FAR AI, Ought, and others will discuss opportunities at their organizations.
Expected Outcomes
- Working implementations of value alignment techniques
- Empirical evidence of generalization and scaling
- Falsifiable predictions with test results
- Open-source contributions
- Direct connections to alignment organizations
Questions? Contact kabir@ai-plans.com