
Multimodal Weekly 83: A Framework for Enhancing Video Generation at Inference Time via a Teacher Model

About Event

In the 83rd session of Multimodal Weekly, Dohun Lee presents a framework for enhancing video generation at inference time via a teacher model.

Abstract
Text-to-image (T2I) diffusion models have revolutionized visual content creation, but extending these capabilities to text-to-video (T2V) generation remains a challenge, particularly in preserving temporal consistency. Existing methods that aim to improve consistency often introduce trade-offs such as reduced image quality and impractical computation time.

To address these issues, we introduce VideoGuide, a novel framework that enhances the temporal consistency of pretrained T2V models without any additional training or fine-tuning. Instead, VideoGuide leverages any pretrained video diffusion model (VDM), or the sampling model itself, as a guide during the early stages of inference, improving temporal quality by interpolating the guiding model's denoised samples into the sampling model's denoising process.

The proposed method yields significant improvements in temporal consistency and image fidelity, providing a cost-effective and practical solution that synergizes the strengths of various video diffusion models. Furthermore, we demonstrate prior distillation: through the proposed method, base models can achieve enhanced text coherence by drawing on the superior data prior of the guiding model.
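To make the interpolation idea concrete, below is a minimal sketch of one guided denoising step. It assumes DDIM-style noise predictors and a simple linear blend; the function names, signatures, and `interp_weight` are illustrative assumptions, not the authors' released implementation.

```python
import torch

def videoguide_step(x_t, t_now, t_next, sampler_eps, guide_eps,
                    alphas_cumprod, use_guide, interp_weight=0.5):
    """One DDIM-style denoising step with optional guidance.

    `sampler_eps` and `guide_eps` are hypothetical noise predictors
    eps(x_t, t); `alphas_cumprod` holds the diffusion schedule.
    All names and the blend weight are illustrative assumptions.
    """
    a_now, a_next = alphas_cumprod[t_now], alphas_cumprod[t_next]

    def predict_x0(eps_model):
        # Standard DDIM denoised estimate (x0 prediction).
        eps = eps_model(x_t, t_now)
        return (x_t - (1 - a_now).sqrt() * eps) / a_now.sqrt()

    x0 = predict_x0(sampler_eps)
    if use_guide:
        # Early steps only: blend the guiding VDM's denoised sample
        # into the sampling model's estimate to inject its temporal prior.
        x0 = interp_weight * predict_x0(guide_eps) + (1 - interp_weight) * x0

    # Re-derive the noise consistent with the (possibly blended) x0,
    # then take the deterministic DDIM step to t_next.
    eps_hat = (x_t - a_now.sqrt() * x0) / (1 - a_now).sqrt()
    return a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps_hat
```

In a full sampler, `use_guide` would be enabled only for the first few inference steps, after which the guiding model is dropped and sampling proceeds as usual.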

Resources on the Paper

Join the Multimodal Minds community to connect with the speaker!

Multimodal Weekly is organized by Twelve Labs, a startup building multimodal foundation models for video understanding. Learn more about Twelve Labs here: https://twelvelabs.io/
