Human-seeded Evals: Scaling Judgement with LLMs (with Samuel Colvin)
Everyone agrees evals are essential—but almost no one implements them rigorously. Why? They're hard to write, time-consuming, and brittle. Samuel Colvin, creator of Pydantic and co-founder of Logfire, has been thinking deeply about how to fix that.
In this livestream, we’ll dive into “Human-seeded Evals,” a lightweight process for generating scoring rubrics and LLM-as-a-judge systems by bootstrapping from a few hand-labeled examples. It’s practical, fast, and being tested in dev workflows today.
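To make the idea concrete, here's a minimal sketch of what a human-seeded judge could look like, not the exact process Samuel will demo. It assumes the OpenAI Python SDK (>= 1.0) with an API key in the environment; the model name, prompt wording, and scoring scale are all placeholders.

```python
# Illustrative sketch of a human-seeded LLM-as-a-judge.
# Not Logfire's actual implementation; names and prompts are placeholders.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()


@dataclass
class SeedExample:
    """A single hand-labeled example used to seed the judge's rubric."""
    prompt: str
    response: str
    human_score: int   # e.g. 1 (bad) to 5 (great)
    rationale: str     # why the human gave that score


def build_judge_prompt(seeds: list[SeedExample]) -> str:
    """Turn a handful of hand-labeled examples into a judging rubric."""
    lines = [
        "You are grading LLM responses on a 1-5 scale.",
        "Calibrate your scores against these human-labeled examples:",
    ]
    for s in seeds:
        lines.append(
            f"\nPrompt: {s.prompt}\nResponse: {s.response}\n"
            f"Human score: {s.human_score}. Reason: {s.rationale}"
        )
    lines.append("\nReply with a single integer score and a one-line reason.")
    return "\n".join(lines)


def judge(seeds: list[SeedExample], prompt: str, response: str) -> str:
    """Score a new response with an LLM, calibrated by the seed examples."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": build_judge_prompt(seeds)},
            {"role": "user", "content": f"Prompt: {prompt}\nResponse: {response}"},
        ],
    )
    return completion.choices[0].message.content


seeds = [
    SeedExample(
        prompt="Summarise this stack trace",
        response="KeyError raised in parse_config because 'db_url' is missing.",
        human_score=5,
        rationale="Names the error, the function, and the missing key.",
    ),
    SeedExample(
        prompt="Summarise this stack trace",
        response="There was an error in your code.",
        human_score=1,
        rationale="Too vague to act on.",
    ),
]

print(judge(seeds, "Summarise this stack trace", "A TypeError in render()."))
```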
We’ll discuss:
🔁 How to seed a feedback loop with just a few examples
🧪 Using LLMs to scale qualitative judgement
📉 Where evals fail—and how to recover
📊 Lessons from experimenting with this approach in Logfire
🧠 What robust evaluation could look like for agents and apps
Whether you're debugging an LLM agent, trying to track regressions, or just tired of vibes-based dev cycles, this one’s for you.