Multimodal Weekly 52: HCI for How-to Videos, Feeling & Building Multimodal Intelligence, and Visually-Grounded Video QA
In the 52nd session of Multimodal Weekly, we welcome three researchers working on Human-Computer Interaction for video understanding, large-scale multimodal models, and video question answering.
✅ Saelyne Yang, a Ph.D. candidate at KAIST, will present her work on enhancing how people learn procedural tasks through how-to videos.
✅ Bo Li and Yuanhan Zhang, Ph.D. students at NTU Singapore, will introduce recent work from LMMs-Lab, including LLaVA-NeXT, LongVA, and LMMs-Eval.
✅ Junbin Xiao, a Research Fellow at NUS Singapore, will present his work on visually-grounded video question answering.
Join the Multimodal Minds community to connect with the speakers!
Multimodal Weekly is organized by Twelve Labs, a startup building multimodal foundation models for video understanding. Learn more about Twelve Labs here: https://twelvelabs.io/