Avatar for Vanishing Gradients Livestreams
36 Going

From Images to Agents: Building and Evaluating Multimodal AI Workflows

About Event

In this free live workshop with Ravin Kumar (DeepMind, Google, ex-SpaceX) and Hugo Bowne-Anderson (Vanishing Gradients), you’ll build LLM-powered image workflows and end with a simple image-driven agent.

We’ll start with hands-on tasks, such as reading receipts and classifying images, and then combine them into a simple LLM-powered agent that interprets images.

In this session, you’ll learn how to:

🧾 Extract structured information from images (e.g. OCR on receipts)
🖼️ Classify and count objects using LLMs
⚙️ Route behavior based on image content
🚦 Prototype image-to-action workflows (e.g. “if dog, then…”)
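
As a rough illustration of the routing idea above, here is a minimal sketch that maps an image label to an action. It assumes a model has already returned a plain-text label for the image; the handler names (`notify_owner`, `ignore`) and the `ROUTES` table are hypothetical and not taken from the workshop materials.

```python
from typing import Callable, Dict


def notify_owner(label: str) -> str:
    # Hypothetical action taken when a label of interest is detected.
    return f"notify: saw a {label}"


def ignore(label: str) -> str:
    # Hypothetical fallback for labels with no registered action.
    return f"ignore: {label}"


# Hypothetical routing table: label -> action ("if dog, then ...").
ROUTES: Dict[str, Callable[[str], str]] = {"dog": notify_owner}


def route(label: str) -> str:
    # Normalize the model's label so " Dog " and "dog" route the same way.
    normalized = label.strip().lower()
    handler = ROUTES.get(normalized, ignore)
    return handler(normalized)
```

For example, `route(" Dog ")` calls `notify_owner`, while any unregistered label falls through to `ignore`.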

We’ll also cover how to evaluate these systems:
🔍 Does the OCR extract the right text?
📏 How well do the models classify or count objects?
🧪 Can we compare outputs across models or against ground truth?
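
One simple way to approach these questions is to score model outputs against hand-labeled ground truth. The sketch below is an assumption about what such a check could look like, not the workshop's actual evaluation code; the function names and the receipt field names in the usage note are illustrative.

```python
from typing import Dict, List


def accuracy(predicted: List[str], expected: List[str]) -> float:
    # Fraction of labels that match the ground truth, ignoring case and
    # surrounding whitespace.
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        p.strip().lower() == e.strip().lower()
        for p, e in zip(predicted, expected)
    )
    return hits / len(expected)


def field_matches(predicted: Dict[str, str], expected: Dict[str, str]) -> Dict[str, bool]:
    # Per-field exact match for structured extraction (e.g. receipt fields).
    # A field missing from the prediction counts as a mismatch.
    return {
        key: predicted.get(key, "").strip() == value.strip()
        for key, value in expected.items()
    }
```

Running `accuracy` over each model's predictions on the same labeled images gives a like-for-like comparison, and `field_matches({"total": "12.50"}, {"total": "12.50", "merchant": "Cafe"})` flags the missing `merchant` field as a mismatch.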

You’ll see open-weight and hosted options in action:

🔹 Use Gemma models locally with Ollama for fully on-device workflows
🔹 Use Gemini via AI Studio for frontier model performance
🔹 Wrap everything in Gradio for a smooth interface

We’ll end by wiring the logic together into a lightweight agent that reacts to what it sees.

And we’ll preview the next workshop in August, where we go deeper into evaluating AI agents and orchestrating more complex behaviors.
