

TAI AMA #09 - Exploring LLMs: Compliance, Context Protocols & Efficient Knowledge Retrieval
Summary
Four talks focused on the practical limits, applications, and infrastructure of large language models. Topics include prompt reliability at scale, an intuitive mental model for the Model Context Protocol (MCP), techniques for improving LLM knowledge retrieval with cache-augmented generation, and the development of a decentralized GPU-sharing platform.
Schedule
18:30 Doors open
19:00 - 20:00 Speaker sessions
20:00 - 20:05 Vector Inc. intro (venue supporter)
20:05 - 21:00 Networking
21:00 Event ends
Talks
Talk 1: LLMs Follow Rules — Until They Don’t: Prompting Isn't Enough at Scale
Speaker: Alan Roth (Head of Product, Amazon Business Japan)
Abstract: Large language models typically follow instructions, but they deviate significantly when prompts become longer or more complex. Alan will share insights from recent experiments revealing exactly where LLM rule adherence sharply declines as constraints scale beyond a surprisingly low threshold. Key insights include: 1) how prompting, although fragile, can significantly improve performance, 2) practical implications for prompt engineering at scale, and 3) how inconsistent rule adherence impacts the reliability of agentic AI systems.
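To make "rule adherence as constraints scale" concrete, here is a minimal, illustrative harness, not the speaker's experimental setup: it builds prompts with a growing number of simple, checkable rules, asks a model to follow all of them, and reports the fraction actually obeyed. The call_llm function is a placeholder for whatever LLM client you use, and the rules are invented examples.

```python
import re

# Placeholder for your own LLM client; this sketch is illustrative, not the speaker's setup.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

# Each rule is (instruction text, checker over the model's output).
RULES = [
    ("Respond in exactly three sentences.", lambda out: len(re.findall(r"[.!?]", out)) == 3),
    ("Do not use the word 'very'.", lambda out: "very" not in out.lower()),
    ("End with the word 'DONE'.", lambda out: out.strip().endswith("DONE")),
    ("Use no more than 60 words.", lambda out: len(out.split()) <= 60),
    ("Mention the city Tokyo at least once.", lambda out: "tokyo" in out.lower()),
    # ... extend to dozens of rules to see where compliance drops off
]

def adherence_at(n_rules: int, task: str = "Describe a typical weekday morning.") -> float:
    """Fraction of the first n_rules the model actually follows for one task."""
    rules = RULES[:n_rules]
    prompt = task + "\nFollow ALL of these rules:\n" + "\n".join(
        f"{i + 1}. {text}" for i, (text, _) in enumerate(rules)
    )
    output = call_llm(prompt)
    followed = sum(check(output) for _, check in rules)
    return followed / n_rules

if __name__ == "__main__":
    for n in range(1, len(RULES) + 1):
        print(f"{n} rules -> adherence {adherence_at(n):.0%}")
```

Plotting adherence against the number of rules makes it easy to look for the kind of threshold the talk describes with your own model and prompts.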
Bio: Alan Roth is Head of Product for Amazon Business Japan, building products that simplify buying for Japanese businesses of all sizes. Previously, Alan led product for Alexa Japan, developing Japanese experiences across Echo, FireTV, and partner devices. He is passionate about experimentation and using data-driven insights to deliver outstanding customer experiences. Alan holds an MBA from Carnegie Mellon University and a B.S. in Electrical Engineering from Cornell University.
Talk 2: Intuitive Jargon-Free Mental Model for MCP
Speaker: Prashant Anand (Staff ML Engineer, Mercari)
Abstract: You've probably heard the buzz about MCP (Model Context Protocol). You might have read the docs and blogs claiming it's the "USB-C port for AI applications," and perhaps even used it. But deep down, you're still asking: "Why can't we just use HTTP calls?" or "Why can't we just tell the LLM what commands to run?" These questions used to trouble me too, until I had my own "aha" moment with MCP. We all have those moments when learning something new, like grasping conditionals and loops in programming or finally "getting" Kubernetes, when everything clicks and falls into place. In this talk, I'll share the two "aha" moments that made MCP click for me: first, a simple browser analogy that will make you wonder why nobody explained it this way before; second, pulling back the curtain on what MCP actually does to your prompts. My promise is that you will reach your own "aha" moment and leave with a clear, jargon-free mental model of MCP.
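As a preview of the "what MCP actually does to your prompts" angle, here is a rough sketch of the message flow: an MCP client asks a server to list its tools, the host application translates those tool definitions into whatever tool-calling format its LLM expects (so they end up in the model's context), and when the model decides to use a tool, the host issues a tools/call request. The message shapes follow the MCP JSON-RPC methods; the get_weather tool and its fields are made up for illustration.

```python
# Sketch of the JSON-RPC messages behind an MCP tool call (illustrative only;
# the "get_weather" tool and its schema are invented for this example).

# 1) The client discovers what the server offers.
tools_list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "inputSchema": {  # JSON Schema describing the arguments
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ]
    },
}

# 2) The host app converts each tool's name/description/inputSchema into its
#    LLM's native tool-calling format and places it in the model's context.
#    This is the part that changes your prompt without you writing it.

# 3) When the model asks to use the tool, the host sends a tools/call request.
tools_call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Tokyo"}},
}

# 4) The server's result is handed back to the model as a tool result.
tools_call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "Tokyo: 18°C, clear"}]},
}
```

The protocol standardizes this exchange so any host can talk to any tool server; step 2 is where your prompt quietly grows.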
Bio: Prashant Anand is a Staff ML Engineer at Mercari in Tokyo (5+ years), where he builds production ML systems and applies ML/NLP/LLMs to transform customer support. He holds a B.Tech from IIT Delhi (2019) and has spoken at PyCon JP 2024 and PyCon APAC 2023.
Talk 3: Cache Augmented Generation for Optimized LLM Knowledge Retrieval
Speaker: Ronan Takizawa (Tech Content Creator)
Abstract: Cache-Augmented Generation (CAG) is a technique for injecting external knowledge into LLMs. CAG preloads relevant knowledge into a language model's context as a precomputed key-value (KV) cache, which can make question-answering faster and more efficient than retrieval-augmented generation (RAG). This presentation will cover how CAG works, when to use it, and how to implement it.
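As a rough illustration of the idea, not the speaker's implementation, the sketch below runs a knowledge document through a Hugging Face causal LM once, keeps the resulting past_key_values, and then answers each question by processing only the question tokens against a copy of that cache. The model name, knowledge text, and decoding loop are simplifying assumptions, and exact cache handling varies across transformers versions.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any small decoder-only causal LM works; the model name is illustrative.
MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).to(device).eval()

knowledge = (
    "Internal FAQ: The office is in Shibuya. Support hours are 9:00-18:00 JST. "
    "Refunds are processed within 5 business days."
)

# 1) Preload: a single forward pass over the knowledge; keep the KV cache.
knowledge_ids = tokenizer(knowledge, return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    prefill = model(knowledge_ids, use_cache=True)
base_cache = prefill.past_key_values  # the "precomputed" part of CAG

def answer(question: str, max_new_tokens: int = 64) -> str:
    # Copy the cache so every question starts from the clean preloaded state
    # (recent transformers Cache objects are mutated in place by forward passes).
    past = copy.deepcopy(base_cache)
    input_ids = tokenizer("\nQ: " + question + "\nA:", return_tensors="pt").input_ids.to(device)
    generated = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(input_ids, past_key_values=past, use_cache=True)
            past = out.past_key_values
            next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy decoding
            if next_id.item() == tokenizer.eos_token_id:
                break
            generated.append(next_id.item())
            input_ids = next_id  # only the newly generated token is fed next step
    return tokenizer.decode(generated, skip_special_tokens=True)

print(answer("What are the support hours?"))
```

Unlike RAG, nothing is retrieved or re-encoded at query time; the trade-off is that the whole knowledge base must fit in the model's context window, which is exactly the "when to use it" question the talk addresses.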
Bio: Ronan Takizawa is a Japanese-Irish undergraduate computer science student and an active tech content creator (@ronantech) with over 100k followers across social media. His achievements include winning the 2023 Harvard hackathon and building boxing performance analytics software that was acquired by POWA Boxing LLC. He also previously worked as a research assistant at NYU's Secure Systems Lab and contributed to ZKSecurity's open-source zero-knowledge proof projects.
Talk 4: Building a Decentralized GPU Pool
Speaker: Bernd Hollerit (CTO, Jasmy Lab)
Abstract: Have a GPU you are not using? Need more GPUs to power your projects? Jasmy Lab's JANCTION project is a decentralized GPU pool: connect your own GPU to earn rewards, or rent GPUs to train your AI models. We also offer services such as video rendering and audio stem separation, with many more applications on the way.
Bio: Bernd Hollerit received his B.Sc. and M.Sc. degrees in software development and business from the Graz University of Technology, Austria, and his Ph.D. degree from the School of Engineering, the University of Tokyo, Japan. Dr. Hollerit is currently employed as Chief Technology Officer at Jasmy Lab. His interests include artificial intelligence, machine learning, cryptocurrencies, cybersecurity, gamification, usability, and natural language processing.
Supporters
Vector Inc. is a leading Japanese PR agency, founded 32 years ago and listed on the Tokyo Stock Exchange Prime Market (TYO: 6058). As Asia’s No. 1 Total Communication Group and Japan’s top PR agency, it was ranked 6th globally in the PRovoke 2024 ranking. With offices in 9 countries and 13 locations across Asia, Vector provides comprehensive services, including PR, D2C, HRTech, AI solutions, and investment. The company has over 50 subsidiaries and a portfolio of 250 companies.
Our Community
Tokyo AI (TAI) is the biggest AI community in Japan, with 2,400+ members mainly based in Tokyo (engineers, researchers, investors, product managers, and corporate innovation managers).