

Community Paper Reading: Letta's Sleep-time Compute
Letta’s New Approach to Scaling LLMs with “Sleep-time Compute”
What if your LLM could think ahead, preparing answers before questions are even asked?
In this week's paper read, we'll dive into a new paper from researchers at Letta introducing sleep-time compute: a technique that lets models do their heavy lifting offline, before the user query ever arrives. By predicting likely questions and precomputing key reasoning steps over a shared context, sleep-time compute substantially reduces test-time latency and cost without sacrificing performance.
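Conceptually, the split looks like the minimal Python sketch below. Everything here is an illustrative stand-in of our own, not Letta's actual API or prompts: the `complete()` helper, the prompt text, and the example strings are all placeholders.

```python
# Minimal sketch of the sleep-time / test-time split. The `complete()` helper,
# prompts, and example strings are illustrative stand-ins, not Letta's API.

def complete(prompt: str) -> str:
    # Stub for a real chat-completion call; swap in any LLM client here.
    return f"[model output for a {len(prompt)}-char prompt]"

def sleep_time_compute(raw_context: str) -> str:
    # Offline phase: before any user query arrives, re-represent the raw
    # context by anticipating likely questions and precomputing the
    # intermediate reasoning needed to answer them.
    return complete(
        "Study the context below. List questions a user is likely to ask, "
        "then write out the intermediate facts and partial answers needed "
        f"to answer them quickly.\n\nContext:\n{raw_context}"
    )

def answer_at_test_time(precomputed_notes: str, query: str) -> str:
    # Online phase: answer against the precomputed notes, so far less
    # fresh reasoning (and therefore latency) is needed per query.
    return complete(
        f"Precomputed notes:\n{precomputed_notes}\n\n"
        f"Question: {query}\nAnswer using the notes above."
    )

# The offline pass runs once per context; every later query reuses it.
notes = sleep_time_compute("Q3 notes: revenue grew 12%; churn fell to 4%.")
print(answer_at_test_time(notes, "What happened to churn in Q3?"))
```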
We'll explore the paper's new benchmarks (Stateful GSM-Symbolic, Stateful AIME, and Multi-Query GSM-Symbolic), which show roughly 5x less test-time compute at the same accuracy, about 2.5x lower average cost per query when the offline work is amortized across related queries, and up to 18% higher accuracy as sleep-time compute is scaled up.
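The cost-per-query win comes from amortization: the offline pass is paid once per context and shared across every query about it. Here is a back-of-the-envelope sketch with made-up token counts (illustrative only, not figures from the paper):

```python
# Amortization arithmetic with invented token counts (not the paper's data).
SLEEP_TOKENS = 2000        # one-time offline reasoning over a shared context
TEST_TOKENS_PLAIN = 1000   # per-query reasoning with no precomputed notes
TEST_TOKENS_SLEEP = 300    # per-query reasoning when notes already exist

for n in (1, 5, 20):
    amortized = SLEEP_TOKENS / n + TEST_TOKENS_SLEEP
    print(f"{n:>2} queries per context: "
          f"{TEST_TOKENS_PLAIN} vs {amortized:.0f} tokens/query")
```

The more queries that hit the same context, the closer the per-query cost falls toward the cheap online phase alone.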
You'll also see how this method applies to realistic agent use cases and what makes it most effective. If you care about LLM efficiency, scalability, or cutting-edge research, don't miss this deep dive.