

Multimodal Weekly 80: Streaming Event Detection and Grounded Video Caption Generation
In the 80th session of Multimodal Weekly, we have two exciting presentations on streaming event detection and grounded video caption generation.
✅ Cristobal Eyzaguirre will present a novel task for multimodal video understanding: Streaming Detection of Queried Event Start (SDQES). The goal of SDQES is to identify the beginning of a complex event, as described by a natural language query, with high accuracy and low latency.
✅ Evangelos Kazakos will present a novel approach to captioning and object grounding in video, in which the objects named in the caption are grounded in the video via temporally dense bounding boxes.
Join the Multimodal Minds community to connect with the speakers!
Multimodal Weekly is organized by Twelve Labs, a startup building multimodal foundation models for video understanding. Learn more about Twelve Labs here: https://twelvelabs.io/