[Paper Reading] s1: Simple Test-Time Scaling compared to DeepSeek R1

About Event

This week, we will walk through and discuss the paper:
s1: Simple Test-Time Scaling Compared to DeepSeek R1
The discussion is based on the following research paper: https://arxiv.org/html/2501.19393v1

Abstract:
Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI’s o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at https://github.com/simplescaling/s1.
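
To make the budget-forcing idea concrete, here is a minimal Python sketch of the decoding-time control the abstract describes. Everything in it is illustrative: `model_generate` is a hypothetical stand-in for a real LLM decoding call, and the `</think>` marker, token budget, and extension count are assumptions rather than the paper's exact settings.

```python
# Illustrative sketch of budget forcing (not the authors' exact code).

def model_generate(prompt: str, max_new_tokens: int) -> str:
    """Hypothetical stand-in for a real LLM decoding call (API or local model)."""
    return "...reasoning steps... </think>"  # dummy output so the sketch runs


def budget_forcing(prompt: str,
                   max_extensions: int = 2,
                   token_budget: int = 2048,
                   end_marker: str = "</think>") -> str:
    trace = model_generate(prompt, token_budget)
    for _ in range(max_extensions):
        if not trace.rstrip().endswith(end_marker):
            break
        # Lengthening: the model tried to stop, so strip the end-of-thinking
        # marker and append "Wait", nudging it to keep reasoning and often
        # to double-check (and fix) earlier steps.
        trace = trace.rstrip()[: -len(end_marker)] + " Wait"
        remaining = max(0, token_budget - len(trace.split()))
        trace += " " + model_generate(prompt + trace, remaining)
    # Termination: if thinking overruns the budget, truncate and force the
    # end-of-thinking marker so the model proceeds to its final answer.
    # (Whitespace splitting stands in for a real tokenizer here.)
    tokens = trace.split()
    if len(tokens) > token_budget:
        trace = " ".join(tokens[:token_budget]) + " " + end_marker
    return trace


if __name__ == "__main__":
    print(budget_forcing("How many positive divisors does 360 have?"))
```

As the abstract notes, this simple decoding-time knob, with no additional training, is what lets s1-32B extrapolate from 50% to 57% on AIME24.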

Speaker: Fae Gaze

A Machine Learning/AI Data Scientist, Biostatistician, and Bioinformatician with over 8 years of experience working on diverse projects for companies and research institutions. Her expertise spans text classification, transformers, deep learning, cloud computing, and computational biology. Her mission is to apply advanced AI and statistical techniques to solve complex computational problems in biomedical research, genomic medicine, and data-driven systems.

------

About SupportVectors AI Meetup:
We are a group of applied AI practitioners and enthusiasts who have formed a collective learning community. Every Wednesday evening (PST), we hold our research paper reading seminar covering an AI topic. One member carefully explains the paper, making it accessible to a broader audience. We then follow the reading with more informal discussion and socializing.

You are welcome to join this in person or over Zoom. SupportVectors is an AI training lab located in Fremont, CA, close to Tesla and easily accessible by road and BART. We follow the weekly sessions with snacks, soft drinks, and informal discussions.
