Human-seeded Evals: Scaling Judgement with LLMs (with Samuel Colvin)

About Event

Everyone agrees evals are essential, but almost no one implements them rigorously. Why? They're hard to write, time-consuming, and brittle. Samuel Colvin, creator of Pydantic and co-founder of Logfire, has been thinking deeply about how to fix that.

In this livestream, we’ll dive into “Human-seeded Evals,” a lightweight process for generating scoring rubrics and LLM-as-a-judge systems by bootstrapping from a few hand-labeled examples. It’s practical, fast, and being tested in dev workflows today.
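
To make the idea concrete ahead of the stream, here is a minimal Python sketch of that bootstrapping loop: a few human-labeled examples are folded into a rubric prompt, and an LLM judge scores new outputs against it. This is an illustration, not Logfire's implementation: `Seed`, `Verdict`, `build_rubric`, `judge`, and the `call_llm` stub are all hypothetical names; only Pydantic's `BaseModel` and `model_validate_json` are real APIs.

```python
from pydantic import BaseModel


# A hand-labeled seed: one prompt/response pair plus the human's verdict
# and a short note explaining why. The notes are what the rubric is
# bootstrapped from.
class Seed(BaseModel):
    prompt: str
    response: str
    passed: bool
    note: str


class Verdict(BaseModel):
    passed: bool
    reason: str


def call_llm(system: str, user: str) -> str:
    """Placeholder: swap in your model client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError


def build_rubric(seeds: list[Seed]) -> str:
    """Fold a few human judgements into a system prompt for the judge."""
    lines = [
        "You grade responses to prompts. Here are prior human judgements;",
        "infer the criteria they imply and apply them consistently.",
    ]
    for s in seeds:
        label = "PASS" if s.passed else "FAIL"
        lines.append(f"- [{label}] prompt={s.prompt!r} response={s.response!r} ({s.note})")
    lines.append('Reply with JSON only: {"passed": <bool>, "reason": <str>}')
    return "\n".join(lines)


def judge(rubric: str, prompt: str, response: str) -> Verdict:
    """Score an unlabeled example with the LLM-as-a-judge."""
    raw = call_llm(rubric, f"prompt={prompt!r}\nresponse={response!r}")
    # Validate the judge's output into a typed verdict.
    return Verdict.model_validate_json(raw)


seeds = [
    Seed(prompt="Refund policy?", response="30 days, no questions asked.",
         passed=True, note="concise and correct"),
    Seed(prompt="Refund policy?", response="Please contact support.",
         passed=False, note="deflects instead of answering"),
]
rubric = build_rubric(seeds)
```

The design point is the feedback loop: disagreements between the judge and new human labels become additional seeds, so the rubric sharpens over time instead of being written up front.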

We’ll discuss:

🔁 How to seed a feedback loop with just a few examples
🧪 Using LLMs to scale qualitative judgement
📉 Where evals fail—and how to recover
📊 Lessons from experimenting with this approach in Logfire
🧠 What robust evaluation could look like for agents and apps

Whether you're debugging an LLM agent, trying to track regressions, or just tired of vibes-based dev cycles, this one’s for you.
