[Paper Reading] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Details
We will walk through the paper:
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
----
We are a group of applied AI practitioners and enthusiasts who have formed a collective learning community. Every Wednesday evening, we hold our research paper reading seminar covering an AI topic. One member carefully explains the paper, making it more accessible to a broader audience. Then, we follow this reading with a more informal discussion and socializing.
Speaker:
Asif Qamar
LinkedIn: https://www.linkedin.com/in/asifqamar/
Technology Leader | AI/Data Scientist | Computer Scientist | Educator | Theoretical Particle Physicist
Technical Leadership
Primarily interested in technical leadership positions that couple visionary leadership with high-octane technical involvement in applied AI/machine learning. What distinguishes me is technical leadership that brings together extensive, hands-on technical, AI, and architectural ability on the one hand, and a capacity to bring together a very productive, creative, passionate, and happy team across geographical boundaries on the other.
A track record of consistently delivering more than a dozen successful products of enduring value. For each, I was instrumental in envisioning the product, crafting its architecture, doing the early R&D and prototyping, and then building a dedicated, cohesive, and talented team around it to take the ideas to fruition through significant projects. Without fail, the products whose creation I have led remain in extensive deployment and healthy evolution after many years.
Teaching & Mentoring
Over 20 years of leading, teaching, and mentoring engineers through team-building around non-trivial projects, classes at universities, workshops, brown bags, and other informal gatherings. Currently running evening workshops, outside work hours, in AI, data science, machine learning, and cloud computing.
You are welcome to join the session in person or over Zoom (https://us02web.zoom.us/meeting/register/tZUvf-uvrTwvHdP9B-vE03j3BapgRypn64CS). SupportVectors is an AI training lab located in Fremont, CA, close to Tesla and easily accessible by road and BART. We follow the weekly sessions with snacks, soft drinks, and informal discussions.