Hosted by Yusi Chen & 4 others
Sunnyvale, California
Registration
Event Full
If you’d like, you can join the waitlist. You will be notified if additional spots become available.
About Event

Join top engineers and researchers to explore the latest breakthroughs in AI infrastructure! Hosted by LMSYS Org, SGLang, FlashInfer, and XGrammar, this event is your gateway to cutting-edge insights in the field.

We’re excited to feature three outstanding speakers:

  • Zihao Ye (NVIDIA) – Lead of the FlashInfer project and a PhD student at the University of Washington, advised by Luis Ceze. He also works closely with Tianqi Chen on machine learning compilers and is a research intern at NVIDIA, working with Vinod Grover. He is a recipient of the NVIDIA Graduate Fellowship (2024–2025).

  • Yixin Dong – PhD student at Carnegie Mellon University, advised by Prof. Tianqi Chen. He received his bachelor's degree in computer science from Shanghai Jiao Tong University and was a research intern at the University of Washington, advised by Prof. Luis Ceze. His research interests are machine learning systems and LLM agent systems.

  • Zhiqiang Xie – PhD student at Stanford University, advised by Prof. Christos Kozyrakis, and a recipient of the NVIDIA Graduate Fellowship (2025–2026). He works on infrastructure to enable more efficient, scalable, and complex compound AI systems while enhancing the observability and reliability of such systems.

About LMSYS Org:
Large Model Systems (LMSYS Corp.) is a 501(c)(3) non-profit focused on incubating open-source projects and research. Our mission is to make large AI models accessible to everyone by co-developing open models, datasets, systems, and evaluation tools. We conduct cutting-edge machine learning research, develop open-source software, train large language models for broad accessibility, and build distributed systems to optimize their training and inference.

About SGLang:
SGLang is a high-performance serving framework for large language models and vision-language models. It has been deployed in large-scale production, generating trillions of tokens every day.

About FlashInfer: FlashInfer is a library and kernel generator for large language models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, sampling, and more. FlashInfer focuses on LLM serving and inference and delivers state-of-the-art performance across diverse scenarios.

About XGrammar: XGrammar is an open-source library for efficient, flexible, and portable structured generation. It supports general context-free grammars to enable a broad range of structures while applying careful system optimizations for fast execution. XGrammar features a minimal and portable C++ backend that can be easily integrated into multiple environments and frameworks, and it is co-designed with the LLM inference engine to enable zero-overhead structured generation.

Welcome to the party—see you there!

Location
Please register to see the exact location of this event.
Sunnyvale, California