
TAI AAI #03 - Natural Language Processing (NLP)

Hosted by Ilya Kulyatin & 3 others
About Event

NOTE: registration must be completed at least 24 hours before the event (so we can generate the entrance QR codes).

Our Community

Tokyo AI (TAI) is a community composed of people based in Tokyo and working with, studying, or investing in AI. We are engineers, product managers, entrepreneurs, academics, and investors intending to build a strong “AI core” in Tokyo. Find more in our overview: https://bit.ly/tai_overview

Topic

One of the biggest areas in AI is natural language processing (NLP), which has become even more relevant in the age of LLMs. Japan has a strong community of NLP researchers, and at Tokyo AI we will be hosting several of them, from both academia and industry, to present their work. The first half of the event covers training Japanese LLMs; the second half covers prompt engineering and interactive feedback for improving LLM performance.

Target Audience

People with a technical background and some knowledge of AI, but not necessarily in-depth knowledge of NLP.

Schedule

17:30 - 18:00 Doors open

18:00 - 18:30 Lessons Learned Training Open Source Japanese LLMs (Leonard Lin)

18:30 - 19:00 Building Effective Pre-training Corpus for Japanese LLM (Kakeru Hattori)

[19:00 - 19:45 Break + Food (if sponsor available)]

19:45 - 20:15 A Quick Overview to Unlock the Potential of LLMs through Prompt Engineering (Ayana Niwa)

20:15 - 20:45 Enhancing LLMs with Interactive Feedback: Advancing Learning and Reasoning (Mengsay Loem)

21:00 End

Speakers

Leonard Lin (profile)

Title: Lessons Learned Training Open Source Japanese LLMs

Abstract: This presentation explores practical challenges and insights gleaned from the past year of training open-source Japanese LLMs. Besides examining specific findings on tokenizers, training data quality, and the current state of Japanese evals, we will also look at how the Japanese LLM landscape has rapidly evolved, including improvements in overall multilingual model quality, new options for synthetic data generation, dramatic drops in training costs, and the growing strength of the Japanese LLM hobbyist community. This presentation should be of particular interest to researchers and developers actively training Japanese LLMs.

Bio: Leonard is the founder and CTO of Shisa.AI, a new Tokyo-based startup building Japanese AI infrastructure focused on production and deployment. A veteran open source contributor, techie, and builder, he started with HTML 1.0 websites, sold one of the first Web2 startups to Yahoo!, and subsequently ran their Hack innovation program. He also built and scaled tech for the Obama’08 campaign, and was the founding CTO of Code for America. He’s worked on large-scale data and production infrastructure, imaging hardware and software, deep crypto/fintech, VR/XR, VTuber tech, and biomedical testing/metabolic research. Over the past few years, his attention has turned increasingly towards AI.

Kakeru Hattori (profile)

Title: Building Effective Pre-training Corpus for Japanese LLM

Abstract: “Swallow” is a Large Language Model (LLM) developed by research teams at the Tokyo Institute of Technology and the National Institute of Advanced Industrial Science and Technology (AIST). We are working on building better Japanese LLMs by continually pre-training English LLMs on Japanese data. So far, we have successfully enhanced Japanese performance through continual pre-training of models such as Llama 2, Mistral, Mixtral, and Llama 3. A high-quality pre-training corpus has been a crucial factor in this achievement. In this presentation, I will mainly introduce the construction procedure of the “Swallow Corpus”, our proprietary large Japanese web corpus, and how it is combined with other corpora. Additionally, time permitting, I will discuss future work in light of Swallow's current challenges and recent research trends.

Bio: Kakeru Hattori is a second-year master’s student at the Tokyo Institute of Technology, specializing in Natural Language Processing (NLP) at Okazaki Lab. He has a strong interest in building pre-training corpora for LLMs, and in the “Swallow” project he focuses on constructing an efficient Japanese pre-training corpus. He also enjoys software engineering and has about two years of experience as an intern at several companies, working in areas such as frontend development, SRE, and data analysis.

Ayana Niwa (profile)

Title: A Quick Overview to Unlock the Potential of LLMs through Prompt Engineering

Abstract: Prompts (natural language instructions) function as an effective interface for communication between humans and Large Language Models (LLMs), serving as a key to unlocking the models' extensive capabilities. In this presentation, I will provide an overview of prompt engineering, highlighting recent trends such as the ability of LLMs to follow instructions and techniques for optimizing prompts. I will also explore the challenges in this field and discuss potential future directions for the development of prompt engineering.

Bio: Ayana Niwa is a researcher at the Okazaki Lab, Tokyo Institute of Technology, and a research scientist at a web company in Tokyo. She completed her Ph.D. at the School of Computing, Tokyo Institute of Technology, in 2023. Her research specializes in natural language processing, with a focus on the controllability and interpretability of natural language generation, as well as the uncertainty in instructions for large language models.

Mengsay Loem (profile)

Title: Enhancing LLMs with Interactive Feedback: Advancing Learning and Reasoning

Abstract: Large Language Models (LLMs) like ChatGPT can justify their predictions through interactive discussions, significantly enriching their understanding of various instances. This talk delves into incorporating interactive feedback to enhance LLMs' learning and reasoning capabilities. We will examine various methods where models engage in dynamic discussions with partner models or humans, demonstrating how continuous feedback and iterative updates can refine the models' reasoning and verbal expression abilities. This presentation aims to uncover the transformative impact of interactive feedback on LLM training and inference, providing insights into emerging trends and future directions in this promising research area.

Bio: Mengsay Loem is a researcher specializing in Natural Language Processing (NLP), with a keen interest in large language models (LLMs), multimodal information processing (language and visual data), low-resource languages, and educational applications. He is currently part of the R&D division at Sansan, Inc. Mengsay earned a Master of Engineering in Artificial Intelligence from the Tokyo Institute of Technology. He has published several papers at top international conferences and workshops in the NLP and AI fields. His recent work includes developing LLM-based multi-agent frameworks, applying LLMs in educational contexts, data augmentation for automatic summarization, and advancing Japanese language models.

Organizers - alphabetical order

Kai Arulkumaran: Research Team Lead at Araya, working on brain-controlled robots as part of the JST Moonshot R&D program. Previously, he completed his PhD in Bioengineering at Imperial College London and had work experience at DeepMind, FAIR, Microsoft Research, Twitter Cortex, and NNAISENSE. His research areas are deep learning, reinforcement learning, evolutionary computation, and computational neuroscience.

Ilya Kulyatin: Fintech and AI entrepreneur with work and academic experience in the US, Netherlands, Singapore, UK, and Japan, with an MSc in Machine Learning from UCL.

Sam Passaglia: Lead Machine Learning Engineer at Elyza, where he develops and trains foundation models tailored for Japanese business usage. Previously a research scientist at the University of Tokyo, he holds a PhD in Astrophysics from the University of Chicago.

Location
artience Co., Ltd.
Kyobashi Edogran, 2-2-1 Kyobashi, Chuo City, Tokyo 104-0031, Japan
When you arrive at Kyobashi Edogran, take the high-speed elevator from the 3rd floor to the 22nd floor, then scan the QR code at the gate on the 22nd floor to get to the 29th floor. NOTE: this QR code is not the Luma QR code, but the one you will receive by email if you register by the end of Monday (if you don't register by then, it will be hard for us to register you in the venue's system).