Mixture-of-Recursions: The Power of Recursive Transformers
About the Event
🔬 AI4Science on alphaXiv
🗓 Friday, August 1st, 2025 · 11 AM PT
🎙 Featuring Reza Bayat
💬 Casual Talk + Open Discussion
🎥 Zoom: https://stanford.zoom.us/j/98617880120?pwd=nwDLtfmpToIaFpl3rLn4lwRmMNRWEk.1
What if language models could learn to "think harder" only when they need to—allocating deep computation to challenging tokens while breezing through simple ones?
Reza Bayat presents Mixture-of-Recursions (MoR), an architecture that unifies parameter sharing with adaptive computation. By dynamically assigning different recursion depths to individual tokens, MoR achieves large-model quality with significantly fewer parameters and less compute.
Whether you’re working at the frontier of LLMs or simply curious about AI4Science, we’d love to have you there.
Hosted by: alphaXiv x Intology
AI4Science: join the community