

From Images to Agents: Building and Evaluating Multimodal AI Workflows
In this free live workshop with Ravin Kumar (Google DeepMind, ex-SpaceX) and Hugo Bowne-Anderson (Vanishing Gradients), you'll build LLM-powered image workflows and end with a simple image-driven agent.
We'll start with hands-on tasks, like reading receipts and classifying images, then build up to a simple agent powered by LLMs that interpret images.
In this session, you’ll learn how to:
🧾 Extract structured information from images, e.g. OCR on receipts (see the sketch after this list)
🖼️ Classify and count objects using LLMs
⚙️ Route behavior based on image content
🚦 Prototype image-to-action workflows (e.g. “if dog, then…”)
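To give you a flavor of that first task, here's a minimal receipt-extraction sketch. It assumes the ollama Python package, a vision-capable Gemma model pulled locally, and a local image file; the model name, file name, and JSON field names are illustrative, not fixed workshop choices:

```python
# Minimal receipt-extraction sketch. Assumes: `pip install ollama`,
# `ollama pull gemma3`, and a receipt image saved as receipt.jpg.
import json
import ollama

PROMPT = (
    "Read this receipt and return JSON with exactly these keys: "
    '"merchant" (string), "date" (string), "total" (number).'
)

response = ollama.chat(
    model="gemma3",  # any vision-capable local model works here
    messages=[{"role": "user", "content": PROMPT, "images": ["receipt.jpg"]}],
    format="json",   # ask Ollama to constrain the reply to valid JSON
)

receipt = json.loads(response["message"]["content"])
print(receipt["merchant"], receipt["total"])
```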
We’ll also cover how to evaluate these systems:
🔍 Does the OCR extract the right text?
📏 How well do the models classify or count objects?
🧪 Can we compare outputs across models or against ground truth? (see the sketch after this list)
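As a taste of the evaluation side, here's a minimal sketch that scores extracted receipt fields against hand-labeled ground truth using per-field exact-match accuracy (the records and field names below are illustrative, not real workshop data):

```python
# Tiny eval sketch: per-field exact-match accuracy against ground truth.
ground_truth = [
    {"merchant": "Blue Bottle", "total": 12.50},
    {"merchant": "Trader Joe's", "total": 48.19},
]
predictions = [
    {"merchant": "Blue Bottle", "total": 12.50},
    {"merchant": "Trader Joes", "total": 48.19},  # OCR dropped the apostrophe
]

for field in ("merchant", "total"):
    correct = sum(
        pred.get(field) == truth[field]
        for pred, truth in zip(predictions, ground_truth)
    )
    print(f"{field}: {correct}/{len(ground_truth)} exact matches")
```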
You’ll see open-weight and hosted options in action:
🔹 Use Gemma models locally with Ollama for fully on-device workflows
🔹 Use Gemini via AI Studio for frontier model performance
🔹 Wrap everything in Gradio for a smooth interface (see the sketch after this list)
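Here's a minimal sketch of how those three pieces fit together: the same image question answered by either a local Gemma model or hosted Gemini, behind a Gradio UI. It assumes the ollama, google-genai, gradio, and Pillow packages, a GEMINI_API_KEY from AI Studio in your environment, and illustrative model names that change over time:

```python
# Sketch: local Gemma (Ollama) vs. hosted Gemini (AI Studio), via Gradio.
# Assumes `pip install ollama google-genai gradio pillow`,
# `ollama pull gemma3`, and GEMINI_API_KEY set in the environment.
import ollama
import gradio as gr
from google import genai
from PIL import Image

gemini = genai.Client()  # reads GEMINI_API_KEY from the environment

def describe(image_path: str, backend: str) -> str:
    prompt = "Describe this image in one sentence."
    if backend == "Gemma (local via Ollama)":
        resp = ollama.chat(
            model="gemma3",
            messages=[{"role": "user", "content": prompt, "images": [image_path]}],
        )
        return resp["message"]["content"]
    # Hosted path: Gemini via AI Studio
    resp = gemini.models.generate_content(
        model="gemini-2.0-flash",  # illustrative; pick a current model
        contents=[Image.open(image_path), prompt],
    )
    return resp.text

demo = gr.Interface(
    fn=describe,
    inputs=[
        gr.Image(type="filepath", label="Image"),
        gr.Radio(
            ["Gemma (local via Ollama)", "Gemini (hosted)"],
            value="Gemma (local via Ollama)",
            label="Backend",
        ),
    ],
    outputs=gr.Textbox(label="Description"),
)

demo.launch()
```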
We’ll end by wiring the logic together into a lightweight agent that reacts to what it sees.
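An illustrative skeleton of that final step: classify what's in the image, then dispatch to an action. The classifier reuses the local-model pattern above; the labels and handler bodies are placeholders for whatever actions you wire in:

```python
# Sketch of a lightweight "see, then act" agent. Labels and handlers
# are placeholders; swap in whatever your workflow needs.
import ollama

LABELS = ["dog", "cat", "receipt", "other"]

def classify(image_path: str) -> str:
    resp = ollama.chat(
        model="gemma3",  # illustrative vision-capable local model
        messages=[{
            "role": "user",
            "content": f"Answer with exactly one word from {LABELS}: what is this image?",
            "images": [image_path],
        }],
    )
    label = resp["message"]["content"].strip().lower()
    return label if label in LABELS else "other"

HANDLERS = {
    "dog": lambda path: print(f"{path}: dog detected, filing under pets"),
    "receipt": lambda path: print(f"{path}: receipt detected, extracting fields"),
}

def agent_step(image_path: str) -> None:
    label = classify(image_path)
    HANDLERS.get(label, lambda p: print(f"{p}: no action for '{label}'"))(image_path)

agent_step("photo.jpg")
```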
And we'll preview the next workshop in August, where we'll go deeper into evaluating AI agents and orchestrating more complex behaviors.