LLM Paper Club (Q-STaR and friends - for real)
(second time is the charm!) Ahead of the Strawberry launch, we'll survey a few related papers rumored to be relevant:
STaR: Bootstrapping Reasoning With Reasoning (https://arxiv.org/abs/2203.14465)
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (https://arxiv.org/abs/2403.09629)
V-STaR: Training Verifiers for Self-Taught Reasoners (https://arxiv.org/abs/2402.06457)
Related/Bonus:
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (https://arxiv.org/abs/2406.07394)
AlphaProof/AlphaGeometry blogpost (https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/)
Improve Mathematical Reasoning in Language Models by Automated Process Supervision (https://arxiv.org/abs/2406.06592)
AlphaMath Almost Zero: Process Supervision without Process (https://arxiv.org/abs/2405.03553)
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (https://arxiv.org/abs/2406.03816)
For future sessions: we need YOU to volunteer to do rapid-fire recaps and explanations of the remaining papers on our board: https://app.sli.do/event/bNV6mo3BFGhe8Bqzb1tonb/live/questions
Please sign up in #llm-paper-club!