

From Images to Agents: Building and Evaluating Multimodal AI Workflows
In this free live workshop with Ravin Kumar (Google DeepMind, ex-SpaceX) and Hugo Bowne-Anderson (Vanishing Gradients), you'll build LLM-powered image workflows and end with a simple image-driven agent.
We'll start with hands-on tasks, like reading receipts and classifying images, then build up to a simple agent powered by LLMs that interpret images.
In this session, you’ll learn how to:
🧾 Extract structured information from images, e.g. OCR on receipts (see the sketch after this list)
🖼️ Classify and count objects using LLMs
⚙️ Route behavior based on image content
🚦 Prototype image-to-action workflows (e.g. “if dog, then…”)
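To give you a flavor of that first task, here's a minimal receipt-extraction sketch. It assumes the ollama Python package, a vision-capable Gemma model pulled locally, and a local image file; the model name, file name, and JSON field names are illustrative, not fixed workshop choices:

```python
# Minimal receipt-extraction sketch. Assumes: `pip install ollama`,
# `ollama pull gemma3`, and a receipt image saved as receipt.jpg.
import json
import ollama

PROMPT = (
    "Read this receipt and return JSON with exactly these keys: "
    '"merchant" (string), "date" (string), "total" (number).'
)

response = ollama.chat(
    model="gemma3",  # any vision-capable local model works here
    messages=[{"role": "user", "content": PROMPT, "images": ["receipt.jpg"]}],
    format="json",   # ask Ollama to constrain the reply to valid JSON
)

receipt = json.loads(response["message"]["content"])
print(receipt["merchant"], receipt["total"])
```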
We’ll also cover how to evaluate these systems:
🔍 Does the OCR extract the right text?
📏 How well do the models classify or count objects?
🧪 Can we compare outputs across models or against ground truth? (see the sketch after this list)
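As a taste of the evaluation side, here's a minimal sketch that scores extracted receipt fields against hand-labeled ground truth using per-field exact-match accuracy (the records and field names below are illustrative, not real workshop data):

```python
# Tiny eval sketch: per-field exact-match accuracy against ground truth.
ground_truth = [
    {"merchant": "Blue Bottle", "total": 12.50},
    {"merchant": "Trader Joe's", "total": 48.19},
]
predictions = [
    {"merchant": "Blue Bottle", "total": 12.50},
    {"merchant": "Trader Joes", "total": 48.19},  # OCR dropped the apostrophe
]

for field in ("merchant", "total"):
    correct = sum(
        pred.get(field) == truth[field]
        for pred, truth in zip(predictions, ground_truth)
    )
    print(f"{field}: {correct}/{len(ground_truth)} exact matches")
```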
You’ll see open-weight and hosted options in action:
🔹 Use Gemma models locally with Ollama for fully on-device workflows
🔹 Use Gemini via AI Studio for frontier model performance
🔹 Wrap everything in Gradio for a smooth interface (see the sketch after this list)
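Here's a minimal sketch of how those three pieces fit together: the same image question answered by either a local Gemma model or hosted Gemini, behind a Gradio UI. It assumes the ollama, google-genai, gradio, and Pillow packages, a GEMINI_API_KEY from AI Studio in your environment, and illustrative model names that change over time:

```python
# Sketch: local Gemma (Ollama) vs. hosted Gemini (AI Studio), via Gradio.
# Assumes `pip install ollama google-genai gradio pillow`,
# `ollama pull gemma3`, and GEMINI_API_KEY set in the environment.
import ollama
import gradio as gr
from google import genai
from PIL import Image

gemini = genai.Client()  # reads GEMINI_API_KEY from the environment

def describe(image_path: str, backend: str) -> str:
    prompt = "Describe this image in one sentence."
    if backend == "Gemma (local via Ollama)":
        resp = ollama.chat(
            model="gemma3",
            messages=[{"role": "user", "content": prompt, "images": [image_path]}],
        )
        return resp["message"]["content"]
    # Hosted path: Gemini via AI Studio
    resp = gemini.models.generate_content(
        model="gemini-2.0-flash",  # illustrative; pick a current model
        contents=[Image.open(image_path), prompt],
    )
    return resp.text

demo = gr.Interface(
    fn=describe,
    inputs=[
        gr.Image(type="filepath", label="Image"),
        gr.Radio(
            ["Gemma (local via Ollama)", "Gemini (hosted)"],
            value="Gemma (local via Ollama)",
            label="Backend",
        ),
    ],
    outputs=gr.Textbox(label="Description"),
)

demo.launch()
```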
We’ll end by wiring the logic together into a lightweight agent that reacts to what it sees.
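An illustrative skeleton of that final step: classify what's in the image, then dispatch to an action. The classifier reuses the local-model pattern above; the labels and handler bodies are placeholders for whatever actions you wire in:

```python
# Sketch of a lightweight "see, then act" agent. Labels and handlers
# are placeholders; swap in whatever your workflow needs.
import ollama

LABELS = ["dog", "cat", "receipt", "other"]

def classify(image_path: str) -> str:
    resp = ollama.chat(
        model="gemma3",  # illustrative vision-capable local model
        messages=[{
            "role": "user",
            "content": f"Answer with exactly one word from {LABELS}: what is this image?",
            "images": [image_path],
        }],
    )
    label = resp["message"]["content"].strip().lower()
    return label if label in LABELS else "other"

HANDLERS = {
    "dog": lambda path: print(f"{path}: dog detected, filing under pets"),
    "receipt": lambda path: print(f"{path}: receipt detected, extracting fields"),
}

def agent_step(image_path: str) -> None:
    label = classify(image_path)
    HANDLERS.get(label, lambda p: print(f"{p}: no action for '{label}'"))(image_path)

agent_step("photo.jpg")
```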
And we'll preview the next workshop in August, where we'll go deeper into evaluating AI agents and orchestrating more complex behaviors.