Aligning LLMs: ReFT
Fine-tuning and alignment are often misunderstood terms when it comes to Large Language Models (LLMs). In this series on Aligning LLMs, we will cover the most popular fine-tuning alignment methods, as well as emerging techniques, namely:
Reinforcement Learning with Human Feedback (RLHF)
Reinforcement Learning with AI Feedback (RLAIF)
Direct Preference Optimization (DPO)
Reasoning with Reinforced Fine-Tuning (ReFT)
In our fourth and final event, we tackle a method that straddles the line between fine-tuning and alignment. What we’re aligning the LLM to do in this one is to get better at math!
Released in January 2024 by authors from ByteDance Research, the paper leverages Reinforcement Learning via Proximal Policy Optimization (PPO), the same algorithm used in RLHF and RLAIF.
The technique is called Reasoning with Reinforced Fine-Tuning (ReFT). Rather than a true alignment technique that balances helpfulness and harmlessness, it leverages RL to push fine-tuning beyond simply training on Chain-of-Thought (CoT) reasoning annotations, improving LLM performance on classic math benchmarks.
Consider a single math word problem posed as a question and provided to an LLM during SFT, complete with a CoT-annotated answer. Even though the annotation reasons through the problem one way, the same problem could likely be solved via many different reasoning paths.
It is precisely this fact, that multiple valid reasoning paths exist, that ReFT seeks to exploit!
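To make the idea concrete, here is a minimal sketch of that exploration step in Python. The helper names are hypothetical (not from the paper or any library): we sample several reasoning paths for one question and reward only those whose final answer matches the gold answer.

```python
# Minimal sketch of ReFT-style exploration: sample several CoT paths for one
# question, then reward paths whose final answer is correct.
# `generate` is an assumed callable wrapping the policy LLM.
import re

def extract_answer(cot_text: str):
    """Pull the final number out of a generated reasoning path (simplified)."""
    numbers = re.findall(r"-?\d+\.?\d*", cot_text)
    return numbers[-1] if numbers else None

def reward(cot_text: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the gold answer."""
    return 1.0 if extract_answer(cot_text) == gold_answer else 0.0

def score_sampled_paths(generate, question: str, gold_answer: str, n_samples: int = 4):
    """Sample multiple reasoning paths and pair each with its reward."""
    paths = [generate(question) for _ in range(n_samples)]
    return [(path, reward(path, gold_answer)) for path in paths]
```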
In this event, we’ll break down the steps of ReFT, which consists of two stages: the warm-up stage and the reinforcement learning stage. We’ll also discuss how the authors achieved significantly improved performance on classic benchmarks, including the Grade School Math 8K (GSM8K), MathQA, and Simple Variations on Arithmetic Math word Problems (SVAMP) datasets.
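At a very high level, the two stages can be sketched as follows. The `sft_train`, `sample_cot`, and `ppo_update` helpers below are assumed stand-ins for a real training stack (for example, Hugging Face Transformers plus TRL); this is a conceptual sketch, not the authors' implementation.

```python
# Conceptual two-stage ReFT loop with hypothetical helpers:
#   sft_train   - one supervised fine-tuning pass over (question, CoT) pairs
#   sample_cot  - sample a fresh reasoning path from the current policy
#   ppo_update  - one PPO step using answer correctness as the reward

def reft(policy, dataset, num_warmup_epochs=2, num_rl_epochs=3):
    # Stage 1 (warm-up): ordinary SFT on annotated CoT data, so the policy
    # can already produce well-formed reasoning paths before RL begins.
    for _ in range(num_warmup_epochs):
        sft_train(policy, dataset)

    # Stage 2 (reinforcement learning): the policy explores its own reasoning
    # paths and is updated with PPO based on whether the final answer is right.
    for _ in range(num_rl_epochs):
        for question, gold_answer in dataset:
            sampled_path = sample_cot(policy, question)
            r = reward(sampled_path, gold_answer)  # reward() from the sketch above
            ppo_update(policy, question, sampled_path, r)
    return policy
```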
As always, we’ll perform a detailed demonstration of how to code the core aspects of ReFT yourself in a Google Colab notebook environment, and all code will be provided directly to attendees!
Join us live to learn:
How this new method can be used to solve math problems!
How ReFT uses an approach similar to RLHF and RLAIF to improve SFT
What goes on in the warm-up and reinforcement learning stages
Speakers:
Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Since 2021 he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Previously, he’s held roles as a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.