
Collov@CVPR'24: Long-form Video Understanding Workshop

Hosted by Collov AI
Seattle, Washington
About Event

Long-form Video Understanding Towards Multimodal AI Assistant and Copilot Workshop @ CVPR'24

📅 Date: June 17, 2024 (the schedule is subject to change, and all registered attendees will be notified promptly of any adjustments)
📍 Location: 2024 Conference on Computer Vision and Pattern Recognition
🏢 Industry Host: Collov AI, a leader in AI-powered interior design (website: collov.ai; Twitter: @collov_ai)

Collov AI invites the academic community to join our workshop at CVPR'24, organized by researchers from NUS, UTS, UC Berkeley, FAIR, ZJU, UT Austin, Meta AI, Dartmouth, NTU, Shanghai AI Lab, and UW. The workshop will explore advances and emerging challenges in long-form video understanding. This year's program features focused tracks on key areas of research and development:

  • Track 1: Long-Term Video Question Answering

    • This track addresses the challenges of interpreting and answering questions about content drawn from extended video footage. Presentations and discussions will focus on developing algorithms that better capture the context and details within long video sequences. Participants will examine case studies and current research that use AI to parse and respond to nuanced queries over lengthy durations, enhancing automated systems' capabilities in media analysis and interaction.

  • Track 2A: Text-Guided Video Editing

    • Track 2A takes an in-depth look at how textual metadata and scripts can dynamically guide the video editing process. The track will cover a range of approaches, including using AI to interpret text cues and apply them to video segmentation, scene recognition, and content-appropriate editing. It aims to bridge the gap between traditional video editing and automated, text-driven workflows, fostering discussion on integrating advanced language models with video editing tools.

  • Track 2B: Text-to-Video Generation

    • In Track 2B, participants will explore generating video content directly from text descriptions. The track will include overviews of current state-of-the-art technologies, practical applications, and the theoretical foundations of text-to-video synthesis. Discussions will focus on the challenges of ensuring fidelity and coherence in generated videos, as well as potential applications in entertainment, education, and content creation.

Speakers:

Dima Damen (University of Bristol / Google DeepMind)

Marc Pollefeys (ETH Zurich / Microsoft)

Chunyuan Li (ByteDance / TikTok)
