
Praxis #003 (ft. A*STAR)
About Lorong AI
Lorong AI is a co-working hub where AI practitioners connect, share knowledge, and grow through curated programming and a collaborative environment. Home to programmes like AI Wednesdays and AI ToolsDays, Lorong AI offers hands-on workshops, technical deep dives, and opportunities to solve real-world challenges with AI.

🌟 Join us on Friday for edition 3 of Praxis: dedicated sessions for technical deep dives and capabilities transfer.
This upcoming session features the team from A*STAR's Institute for Infocomm Research (I2R), who will share their work on audio-based multimodal LLMs.
Learn more about evaluation methods, innovative speech-text integration techniques, and a Singapore-specific model designed for our diverse linguistic landscape.
About the sessions
Bin Wang (Scientist, A*STAR) will share insights and initiatives for evaluating audio-based multimodal LLMs, highlighting the paradigm shifts and new efforts made to address the challenges of this fast-evolving field. He will also discuss the pros and cons of existing models and explore open challenges in model capabilities and future directions for evaluation. (Technical Level: 200)
Zhang Wenyu (Senior Scientist, I2R, A*STAR) will discuss recent advancements in AudioLLMs — models that integrate speech and audio processing with text understanding. She will highlight the challenges these models face in adapting to new tasks and introduce Mixtures of Weak Encoders (MoWE), a method that enhances learning by incorporating small, specialized audio processors that activate selectively based on input. (Technical Level: 200)
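For readers curious about the general idea behind MoWE, the routing mechanism described above resembles a mixture-of-experts layer: a lightweight router scores several small encoders and only the top-scoring ones process each input. The sketch below is purely illustrative and not from the MoWE paper or talk; all class names, dimensions, and the random-projection "encoders" are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

class WeakEncoder:
    """A small 'weak' encoder; here just a random linear projection (hypothetical)."""
    def __init__(self, in_dim, out_dim):
        self.w = rng.standard_normal((in_dim, out_dim)) * 0.1

    def __call__(self, x):
        return x @ self.w

class MoWELayer:
    """Route each input to its top-k weak encoders and mix their outputs."""
    def __init__(self, in_dim, out_dim, num_encoders=4, top_k=2):
        self.encoders = [WeakEncoder(in_dim, out_dim) for _ in range(num_encoders)]
        self.router_w = rng.standard_normal((in_dim, num_encoders)) * 0.1
        self.top_k = top_k

    def __call__(self, x):
        logits = x @ self.router_w              # one routing score per encoder
        top = np.argsort(logits)[-self.top_k:]  # indices of the selected encoders
        weights = np.exp(logits[top] - logits[top].max())
        weights /= weights.sum()                # softmax over the selected subset
        # Only the selected encoders run; the rest stay inactive for this input.
        return sum(w * self.encoders[i](x) for w, i in zip(weights, top))

layer = MoWELayer(in_dim=16, out_dim=8)
audio_feature = rng.standard_normal(16)  # stand-in for one frame of audio features
fused = layer(audio_feature)
```

The key design point is sparsity: because only `top_k` of the encoders activate per input, the model can hold many specialized processors while keeping per-input compute small.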
Yingxu He (Engineer, A*STAR) will introduce MERaLiON-AudioLLM, a pioneering speech-text model developed to navigate Singapore's rich linguistic diversity. This session will explore how MERaLiON-AudioLLM enhances accessibility by understanding local accents and dialects, providing developers and researchers with insights into building and refining multimodal models that are attuned to localized linguistic and cultural contexts. (Technical Level: 200)
Discover how these advancements are shaping the future of audio AI technologies!
More About the Speakers:
Bin Wang is a Scientist at the Institute for Infocomm Research (I2R), A*STAR, Singapore. He obtained his PhD from the University of Southern California, USA, in 2021 and was a Research Fellow at the National University of Singapore from 2021 to 2023. His research focuses on multimodal LLMs and conversational AI systems.
Zhang Wenyu is a Senior Scientist at I2R, A*STAR, specializing in multimodal LLMs, particularly audio-text LLMs designed for Singapore’s multilingual and multicultural landscape. Her research spans computer vision, model robustness, time series prediction, and anomaly detection. She earned her Ph.D. in Statistics from Cornell University in 2020.
Yingxu He is a Senior Research Engineer at I2R, A*STAR, specializing in LLM training and deployment. He holds a Master's degree in AI from the National University of Singapore.
-----------
Please note that the views expressed by the speakers during these talks and events are their personal views only, and do not necessarily represent the official views of the Singapore Government. The Singapore Government neither endorses nor assumes responsibility for any content presented.