Cover Image for Multimodality with Llama 3.2
Avatar for Public AIM Events!
Presented by
Public AIM Events!
Hosted By
120 Went

Multimodality with Llama 3.2

Past Event
About Event

Multimodal models are on the rise in 2025! AI-generated images and videos continue to improve rapidly.

The industry has come a long way! Llama 3.2 was the first multimodal Llama model, supporting both vision and text modalities.

We’re told by Meta that Llama 3.2 models “Support image reasoning use cases, such as document-level understanding including charts and graphs, captioning of images, and visual grounding tasks such as directionally pinpointing objects in images based on natural language descriptions.”
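In practice, prompting a vision-capable Llama 3.2 model means interleaving an image with text in a chat-style message list. A minimal sketch of that message structure, following the convention used by Hugging Face transformers' chat templates for Llama 3.2 Vision (the question and function name here are illustrative; the actual image pixels are passed to the processor separately):

```python
def build_vision_prompt(question: str) -> list[dict]:
    """Build a chat-style message list pairing one image with a text query.

    The {"type": "image"} entry is a placeholder marking where the image
    goes in the prompt; the raw image itself is supplied alongside this
    message list when the processor tokenizes the input.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]

# Example: a document-understanding query over a chart image
messages = build_vision_prompt("What trend does this chart show?")
```

A processor would then render this list into the model's prompt format (with special image tokens) before generation.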

When thinking about multimodal models, a few questions come to mind:

  • How can the model “see”?

  • What exactly is Llama 3.2 doing when you prompt it, and how does this differ from a text-only LLM?

  • Is it just the embedding representations that differ, or the generation process as well?

  • How exactly are multimodal models trained and adapted to vision, from pretraining to post-training?

With the rise of AI capabilities like computer use, it’s time we all get a handle on multimodality.

Start learning what you need to know live with Dr. Greg & The Wiz, from concepts to code.

📚 You’ll learn:

  • How multimodal LLMs differ from text-only models, from prompting to encoding to decoding

  • How to leverage Llama 3.2 in your workflows and applications

🤓 Who should attend the event:

  • Aspiring AI Engineers who want to build state-of-the-art multimodal LLM applications

  • AI Engineering leaders who want to build production applications with multimodal LLMs

Speakers:

  • “Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

  • Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn and YouTube to stay updated about workshops, new courses, and corporate training opportunities.
