Presented by
Tokyo AI (TAI)

TAI AAI #10 - AI x Speech

Tokyo
About Event

This Tokyo AI (TAI) Advanced AI (AAI) group session will feature speakers on AI in Speech.

Schedule

18:30 - 19:00 Doors open
19:00 - 19:10 Introduction
19:10 - 19:40 From Words to Wisdom: Japanese ASR as the Entry Point to Knowledge Transformation (Qi Chen)
19:40 - 20:10 Exploring Disentanglement in Speech (Nathania Nah)
20:10 - 20:40 Speech-to-Speech Technology: Recent Advances and Challenges (Meishu Song)
20:40 - 21:30 Networking

Speakers

Qi Chen (https://www.linkedin.com/in/qi-chen-5aa30b67/)

Title: From Words to Wisdom: Japanese ASR as the Entry Point to Knowledge Transformation
Abstract: This talk provides a practical guide to building Japanese ASR models, based on my experience developing ReazonSpeech-k2-v2, an open-source model that outperformed Whisper in 2024. I'll walk through the complete pipeline, from dataset preparation to deployment optimization, addressing the unique challenges of Japanese speech recognition, including accent variation and real-time performance requirements. While my current work at Paraparas focuses on knowledge transformation through our Paralogue platform (offering world-leading ASR capabilities with our partner Gladia), this session offers insights for researchers, engineers, and entrepreneurs looking to bridge the gap between academic speech research and real-world applications.
Bio: Qi Chen is the CEO and co-founder of Paraparas, developing Paralogue, a platform that transforms dialogue and monologue into personalized knowledge. He completed his PhD in Cognitive Science under Douglas Hofstadter at Indiana University Bloomington, focusing on computational models of analogical thinking. His interdisciplinary background spans cognitive science, computational linguistics, and software engineering, with a focus on creating technology that enhances rather than replaces human capabilities.
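
For attendees who want to experiment before the session, here is a minimal sketch of Japanese ASR inference using the openai-whisper package, i.e. the baseline the abstract compares against. This is an illustration only, not the ReazonSpeech stack: ReazonSpeech-k2-v2 is distributed through its own toolkit with a different API, and the audio filename below is a placeholder.

```python
# Minimal Japanese ASR inference with the openai-whisper package (pip install
# openai-whisper). This is NOT the ReazonSpeech-k2-v2 stack, which ships via
# its own toolkit; "meeting_ja.wav" is a placeholder for any audio file.
import whisper

model = whisper.load_model("small")  # larger checkpoints: "medium", "large"
result = model.transcribe("meeting_ja.wav", language="ja")

print(result["text"])                # full transcript
for seg in result["segments"]:       # per-segment timestamps for downstream use
    print(f'{seg["start"]:7.2f}s - {seg["end"]:7.2f}s  {seg["text"]}')
```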

Nathania Nah (https://www.linkedin.com/in/nathanianah)

Title: Exploring Disentanglement in Speech
Abstract: Disentanglement aims to identify and separate the distinct generative factors underlying data, removing a representation's sensitivity to variation that is uninformative for the task at hand. Traditionally, speech disentanglement has been applied to speaker-relevant classification and generation tasks such as speaker verification, voice conversion, and speech synthesis, but these works often focus only on a model's overall performance on a given task. We will discuss our work on disentangling features in pretrained speech representations, both to identify how those features are used in downstream tasks and to better understand what self-supervised methods capture, with the ultimate goal of building more explainable models.
Bio: Nathania is a PhD student at Science Tokyo (formerly Tokyo Tech) studying machine learning in speech. Originally from the United States, she received her master's degree from Georgia Tech and now focuses on affective computing in speech at Shinoda Lab. Her work centers on multimodal recognition of personality and emotion, and she is passionate about using new technologies to improve personal and mental well-being.
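
As a rough companion to the talk, the sketch below illustrates one common precursor to disentanglement analysis: extracting layer-wise features from a pretrained self-supervised model (wav2vec 2.0 here, an assumed stand-in; the talk does not specify a model) and fitting a linear probe for an attribute such as speaker identity. Where a probe can read an attribute out hints at where that factor lives in the representation; disentanglement methods go further and explicitly separate such factors.

```python
# Simplified probing sketch: embed utterances with a pretrained self-supervised
# speech model, then fit a linear probe for some attribute (e.g. speaker ID).
# Waveforms and labels are assumed to come from your own dataset.
import torch
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor
from sklearn.linear_model import LogisticRegression

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

def utterance_embedding(waveform, sample_rate=16000, layer=6):
    """Mean-pool one layer's hidden states into a fixed-size utterance vector."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1).squeeze(0).numpy()

# waveforms: list of 1-D float arrays; speaker_ids: parallel list of labels.
# X = [utterance_embedding(w) for w in waveforms]
# probe = LogisticRegression(max_iter=1000).fit(X, speaker_ids)
# print("probe accuracy:", probe.score(X, speaker_ids))
```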

Meishu Song (https://www.linkedin.com/in/meishu-song/)

Title: Speech-to-Speech Technology: Recent Advances and Challenges
Abstract: This talk will outline the core architecture and technical principles of speech-to-speech systems, tracing the evolution from traditional pipeline approaches to today's end-to-end neural network models. The presentation will examine breakthrough advances enabled by large language models, with a focus on innovations in low-resource languages and emotion preservation.
Bio: Meishu Song holds a Ph.D. in Affective Computing AI from the University of Tokyo, where she currently serves as a researcher while also leading a startup focused on emotional companionship products.
Meishu has extensive expertise in emotion recognition and deep learning, with applications spanning education, mental health, and the automotive industry. Her work bridges cutting-edge technology with human-centered design to create AI systems that better understand and respond to human emotions.
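
To make the contrast in the abstract concrete, here is a hedged sketch of the traditional cascade (ASR, then machine translation, then TTS) using public Hugging Face checkpoints chosen purely for illustration; they are not from the talk. The text bottleneck between stages is exactly where errors accumulate and emotion/prosody are lost, which end-to-end speech-to-speech models aim to avoid.

```python
# The "traditional pipeline approach": three independent models chained
# through text. Model names are illustrative public checkpoints.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")

text_ja = asr("input_ja.wav")["text"]         # stage 1: speech -> Japanese text
text_en = mt(text_ja)[0]["translation_text"]  # stage 2: Japanese -> English text
print(text_en)
# Stage 3 would feed text_en to a TTS model to synthesize output speech.
# An end-to-end system instead maps source speech to target speech directly,
# so paralinguistic cues never have to survive a text-only intermediate.
```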

Our Community

Tokyo AI (TAI) is a community composed of people based in Tokyo and working with, studying, or investing in AI. We are engineers, product managers, entrepreneurs, academics, and investors intending to build a strong “AI core” in Tokyo. Find more in our overview: https://bit.ly/tai_overview

Organizers

Kai Arulkumaran: Research Team Lead at Araya, working on brain-controlled robots as part of the JST Moonshot R&D program. He completed his PhD in Bioengineering at Imperial College London and has previously worked at DeepMind, FAIR, Microsoft Research, Twitter Cortex, and NNAISENSE. His research areas are deep learning, reinforcement learning, evolutionary computation, and computational neuroscience.

Craig Sherstan: Research Scientist at Sony AI Tokyo. His current research is on the application of RL to create AI opponents for the video game Gran Turismo. Previously, he completed his PhD in Reinforcement Learning at the University of Alberta, Canada, as part of the Bionic Limbs for Improved Natural Control Lab. Craig has past experience with human-computer interfaces, robotics, and various software industries.

Ilya Kulyatin: Fintech and AI entrepreneur with work and academic experience in the US, the Netherlands, Singapore, the UK, and Japan; he holds an MSc in Machine Learning from UCL.

Sponsor

TBD

Location
Please register to see the exact location of this event.
Tokyo