Multimodal Weekly 45: Open-Source Models for LLM Evaluation and Multimodal Models for Audio Processing & Creation
In the 45th session of Multimodal Weekly, we welcome two graduate students working at the cutting edge of academic research in large language models and multimodal models.
✅ Seungone Kim, M.S. Student at KAIST AI & Incoming Ph.D. Student at Carnegie Mellon University (Language Technologies Institute), will dive into Prometheus - a series of open-source language models specialized in evaluating other language models.
Check out the following resources:
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
✅ Nikhil Singh, a Ph.D. Candidate at MIT Media Lab, will explore human-AI interaction and multimodal machine learning across a range of application areas, including creative, immersive, and informational media.
Check out the following resources:
Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
Creative Text-to-Audio Generation via Synthesizer Programming
Join the Multimodal Minds community to connect with the speakers!
Multimodal Weekly is organized by Twelve Labs, a startup building multimodal foundation models for video understanding. Learn more about Twelve Labs here: https://twelvelabs.io/