RLVR: Reinforcement Learning with Verifiable Rewards

Public AIM Events!

YouTube

About Event

What exactly is RLVR?

The term was coined in 2024 in the Tülu 3 paper by the Allen Institute for AI. It was defined as:

a novel method for training language models on tasks with verifiable outcomes such as mathematical problem-solving and instruction following. RLVR leverages the existing RLHF objective but replaces the reward model with a verification function

In that paper, the Proximal Policy Optimization (PPO) algorithm was used to train against verifiable rewards only. For each type of data used, a verification function was implemented.
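
As a rough illustration of what a verification function can look like, here is a minimal Python sketch. The function names, the last-number matching rule, and the binary 0/1 reward are assumptions for illustration, not the implementation from the Tülu 3 paper.

```python
import re


def verify_math_answer(completion: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 if the last number in the
    completion equals the reference answer, else 0.0."""
    # Drop thousands separators, then collect every integer or decimal.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(reference) else 0.0


def verify_word_limit(completion: str, max_words: int) -> float:
    """Hypothetical verifier for an instruction-following constraint:
    reward 1.0 if the completion stays within the word limit."""
    return 1.0 if len(completion.split()) <= max_words else 0.0
```

Each function returns a reward directly from a rule, so no learned reward model is involved. In practice, one such function would be written per data type.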

Verification, of course, was not new. In 2023, OpenAI made great strides towards reasoning models with their work entitled “Let’s Verify Step by Step,” where they explored the supervision of process, rather than simply the supervision of outcomes. Process supervision was shown to improve performance on math problems, and was an indication of the path ahead.

Then, in January 2025, DeepSeek-R1 burst onto the scene, trained with the now-famous Group Relative Policy Optimization (GRPO), introduced in the DeepSeekMath paper (2024), instead of the classic PPO scheme.

Importantly, GRPO is a policy optimization algorithm, while RLVR is a rewards-based training paradigm.
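
To make the distinction concrete, the sketch below shows the group-relative step that gives GRPO its name: rewards for several completions sampled from the same prompt are normalized against their own group's mean and standard deviation. The rewards can come from any source, including an RLVR-style verifier. The function name, the use of population standard deviation, and the eps constant are assumptions for illustration. The full algorithm also includes a clipped policy-ratio objective and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: rewards for completions sampled from the same prompt.
    eps: small constant (an assumption) to avoid division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a binary verifier.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Verified completions end up with positive advantages and failed ones with negative advantages. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.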

In 2025, there has also been much work on RLVR that is worth exploring, and it is our intention in this event to get you up to date! For instance, we’ve seen RLVR extended to domains beyond math, coding, and instruction following, as in Crossing the Reward Bridge.

At the same time, finding the limits of RLVR has been important to researchers and engineers at the edge of AI today, including the determination of what is verifiable and what isn’t. In General Reasoning without Verifiers, the authors claim that:

this methodology is limited to tasks where rule-based answer verification is possible and does not naturally extend to real-world domains such as chemistry, healthcare, engineering, law, biology, business, and economics.

They propose a verifier-free (VeriFree) method that “bypasses answer verification and instead uses RL to directly maximize the probability of generating the reference answer.”
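
As a loose illustration of the quantity described in that quote, the sketch below computes the probability a model assigns to a reference answer from per-token log-probabilities. The function name and the example values are hypothetical, and this is not the VeriFree training objective itself, only the "probability of generating the reference answer" that such a method would aim to increase.

```python
import math


def reference_answer_probability(token_logprobs):
    """Probability of the whole reference answer, given the model's
    log-probability for each of its tokens (hypothetical inputs)."""
    # Summing log-probabilities equals multiplying the probabilities.
    return math.exp(sum(token_logprobs))


# Hypothetical example: a two-token reference answer where the model
# assigns probability 0.5 to each token.
prob = reference_answer_probability([math.log(0.5), math.log(0.5)])
```

Because this score comes from the model's own token probabilities, no rule-based check of the final answer is needed.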

There have even been surveys published in recent months on where we are with RLVR today, including 100 Days After DeepSeek-R1: A Survey on Replication Studies and RLVR, which we’re quite interested to review in detail!

In this event, we aim to get you up to speed on RLVR in 2025.

🤓 Who should attend

AI engineers & data scientists interested in training their own LLMs, LRMs, and SLMs.
AI engineering teams looking to optimize cost and performance at scale through fine-tuning and post-training of off-the-shelf models.
AI engineering leaders who want to stay up to date with the latest domain adaptation and fine-tuning techniques for today’s LLMs, LRMs, and SLMs.

Speaker Bios

Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.
