

Evaluating AI Agents: From Demos to Dependability
In this free live workshop with Ravin Kumar (DeepMind, Google, ex-Tesla) and Hugo Bowne-Anderson (Vanishing Gradients), learn how to trace, test, and debug AI agents—so they actually work in the real world.
Most agent demos look impressive—until they break in practice 😭
In this hands-on session, you’ll build the tools to make AI agents reliable:
🧵 Trace tool use and model reasoning (a sketch follows this list)
🎭 Simulate real interactions and edge cases
📏 Define what success actually means
🚨 Catch silent failures and iterate effectively
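To give a flavour of the tracing step, here’s a minimal sketch, assuming a simple agent loop, that records every tool call so a run can be inspected afterwards. The Trace, ToolCall, and record names are placeholders for illustration, not the workshop’s actual code.

```python
# Minimal tracing sketch (illustrative, not the workshop's exact code):
# record every tool call an agent makes so the run can be inspected later.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolCall:
    tool: str      # which tool the model chose
    args: dict     # the arguments it passed
    result: Any    # what the tool returned

@dataclass
class Trace:
    question: str                                         # the user's original request
    calls: list[ToolCall] = field(default_factory=list)   # every tool invocation, in order
    final_answer: str = ""                                 # the agent's reply to the user

def record(trace: Trace, tool: str, args: dict, result: Any) -> None:
    """Append one tool invocation to the trace for later replay or evaluation."""
    trace.calls.append(ToolCall(tool=tool, args=args, result=result))
```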
We’ll work through a concrete use case: a lightweight data science agent (sketched after this list) that can:
🗃️ Query a SQL database
📊 Run Python-based data analysis
📈 Generate basic visualizations
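The tools themselves can be plain Python functions. Here’s a minimal sketch of what they might look like; the SQLite file name, the unsandboxed exec shortcut, and the chart helper are illustrative assumptions, not the workshop’s actual implementation.

```python
# Illustrative tool functions for a lightweight data science agent.
# The database path and the lack of sandboxing are simplifications for this sketch.
import contextlib
import io
import sqlite3

def query_sql(sql: str, db_path: str = "example.db") -> list[tuple]:
    """Run a SQL query against a local SQLite database and return the rows."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

def run_python(code: str) -> str:
    """Execute a short Python snippet and capture whatever it prints.
    A real agent would sandbox this; it is kept bare here for brevity."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()

def plot_bar(labels: list[str], values: list[float], path: str = "chart.png") -> str:
    """Save a simple bar chart to disk and return the file path."""
    import matplotlib.pyplot as plt
    plt.figure()
    plt.bar(labels, values)
    plt.savefig(path)
    plt.close()
    return path
```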
You’ll see how to evaluate whether it (a sample check is sketched below):
🧠 Chose the right tool
⚙️ Executed the right logic
🗣️ Explained the result correctly
And how to build this kind of iterative evaluation process into your AI agent development workflow—so reliability isn’t an afterthought.
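To make that concrete, here is one way a single check could look, reusing the Trace sketch above; the expected tool name, the GROUP BY heuristic, and the keyword in the answer are invented purely for illustration.

```python
# Illustrative scoring of one agent run against the three questions above.
# Assumes the Trace/ToolCall sketch from earlier; every criterion here is made up.
def evaluate_run(trace: Trace) -> dict[str, bool]:
    """Return a pass/fail flag for tool choice, logic, and explanation."""
    chose_right_tool = bool(trace.calls) and trace.calls[0].tool == "query_sql"
    executed_right_logic = any(
        "GROUP BY" in str(call.args.get("sql", "")).upper()
        for call in trace.calls
        if call.tool == "query_sql"
    )
    explained_correctly = "revenue" in trace.final_answer.lower()  # placeholder keyword check
    return {
        "chose_right_tool": chose_right_tool,
        "executed_right_logic": executed_right_logic,
        "explained_correctly": explained_correctly,
    }
```

In practice you’d run many checks like this over simulated questions and watch how the pass rates move as you iterate.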
All running locally using Gemma 3 models and Ollama.
No cloud dependencies. No frameworks required.
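If you’d like to check your setup beforehand, a quick smoke test along these lines should work once Ollama is running and a Gemma 3 model has been pulled; the exact model tag (for example, gemma3:4b) depends on which variant you pull, and the ollama Python package is just one convenient client.

```python
# Local smoke test: assumes Ollama is running, `ollama pull gemma3` has completed,
# and the `ollama` Python package is installed. Adjust the model tag to what you pulled.
import ollama

response = ollama.chat(
    model="gemma3",
    messages=[{"role": "user", "content": "In one sentence, what does GROUP BY do in SQL?"}],
)
print(response["message"]["content"])
```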
This is the third workshop in a series:
1️⃣ We started by building local LLM apps and adding evaluation harnesses to guide iteration.
2️⃣ Then we built agents that could call tools and adapt dynamically.
3️⃣ Now, we focus on making those agents reliable and testable.
Each session stands on its own—but together, they map the real-world development process of AI systems.
Bring your laptop. This is fully hands-on.