Presented by
Arize AI
Generative AI-focused workshops, hackathons, and more. Come build with us!

Community Paper Reading: Letta's Sleep-time Compute

Zoom
About Event

Letta’s New Approach to Scaling LLMs with “Sleep-time Compute”

What if your LLM could think ahead—preparing answers before questions are even asked?


In this week's paper read, we’ll dive into a groundbreaking new paper from researchers at Letta, introducing sleep-time compute: a novel technique that lets models do their heavy lifting offline, well before the user query arrives. By predicting likely questions and precomputing key reasoning steps, sleep-time compute dramatically reduces test-time latency and cost—without sacrificing performance.
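To make the two phases concrete, here is a minimal illustrative sketch of the idea as described above. It is not the paper's implementation: call_llm is a stand-in for whichever LLM client you use, and the prompts are hypothetical.

# Sketch of sleep-time compute: enrich the context offline, answer cheaply online.
# Assumptions: `call_llm` is a placeholder for any chat-completion client;
# prompts are illustrative, not the ones used in the paper.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to your LLM provider of choice."""
    raise NotImplementedError("Connect this to your model API.")


def sleep_time_compute(raw_context: str) -> str:
    """Offline phase: before any user query arrives, spend compute enriching
    the context with anticipated questions and precomputed reasoning steps."""
    prompt = (
        "Study the context below. Infer useful intermediate facts and work out "
        "answers to the questions a user is likely to ask, so that later "
        "queries need little additional reasoning.\n\n"
        f"Context:\n{raw_context}"
    )
    return call_llm(prompt)  # the enriched, precomputed context


def answer_query(learned_context: str, user_query: str) -> str:
    """Online phase: answer the real query by reusing the precomputed
    reasoning instead of re-deriving it at test time."""
    prompt = (
        f"Context (with precomputed reasoning):\n{learned_context}\n\n"
        f"Question: {user_query}\nAnswer concisely."
    )
    return call_llm(prompt)

In this framing, the expensive offline call can run whenever the context is idle; only the short test-time call sits on the user's critical path.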

We’ll explore new benchmarks—Stateful GSM-Symbolic, Stateful AIME, and the multi-query extension of GSM—that show up to 5x lower compute at inference, 2.5x lower cost per query, and up to 18% higher accuracy when scaled.

You’ll also see how this method applies to realistic agent use cases and what makes it most effective. If you care about LLM efficiency, scalability, or cutting-edge research, don’t miss this deep dive.
