


Featured in
London
ML Institute Open Lectures
Hosted by Izaak Sofer, Besart Shyti & Stathis
Registration
Past Event
About Event
How can large language models describe images, or even videos? Modern deep learning systems are no longer confined to text: they now process and generate multiple types of data, from speech to images and beyond.
In this talk, Besart Shyti will explore the architectures behind multimodal models, from vision-language transformers to generative AI systems that create text from images and audio. We'll discuss how these models integrate diverse information, their current limitations, and emerging breakthroughs. From improving search engines to enabling video understanding, multimodal AI is increasingly entering real-world applications.
Alternative dates
ML Institute Open Lectures
Thu, Mar 13, 6:00 PM GMT
ML Institute Open Lectures
Thu, Mar 20, 6:00 PM GMT