The Fourth vLLM Meetup x AGI Builders Meetup
Looking forward to seeing everyone! If you are on the waitlist, feel free to still join us!
👋 We're thrilled to invite you to the 4th vLLM meetup, as part of the monthly AGI Builders meetup on June 11th.
❤️ It's a gathering where AI builders, researchers, and enthusiasts share ideas, inspire peers, and transform the future. In particular, this event will connect vLLM users and developers to share and learn together.
💡 At this event, engineers from the BentoML and vLLM teams will share recent updates.
🍕 Light refreshments will be available.
Agenda:
5:30 pm - 6:00 pm: Doors open and check-in.
6:00 pm - 6:10 pm: Opening
6:10 pm - 7:00 pm: Tech Talks
(20min) Scaling LLMs like you mean it
By Sean Sheng, Head of Engineering, BentoML
(20min) vLLM Project Update and Spec Decode Deep Dive
By Woosuk Kwon, Kaichao You, Lily Liu, UC Berkeley
(10min) Q&A
7:00 pm - 8:00 pm: Networking
About the talks:
Talk 1: Scaling LLMs like you mean it
Speaker: Sean Sheng, Head of Engineering, BentoML
Abstract: Although vLLM has significantly enhanced the efficiency of serving open-source LLMs, deploying these models in production environments still presents considerable scaling challenges. While serverless architecture promises flexible resource allocation and cost efficiency, deploying LLMs on serverless GPUs faces specific hurdles such as cold starts, elastic scaling, and inference orchestration. This talk will explore these challenges and discuss the solutions we have implemented at BentoML to build a robust AI model inference platform.
Talk 2: vLLM Project Update
Speaker: Zhuohan Li, Woosuk Kwon, Simon Mo; vLLM maintainers, UC Berkeley
Abstract: In this talk, vLLM maintainers will share updates about the project, dive into recent feature additions, and unveil the upcoming project roadmap.
About the hosts:
Cloudflare helps organizations make employees, applications, and networks faster & more secure.
BentoML empowers developers to run any AI models in the cloud and scale with confidence.
vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. It is an open-source project with contributions from many and adoption across the industry.
Note:
This event will be held in person, and due to limited capacity, registration is required for entry. Registration will close 2 days before the event.
We host monthly meetups in San Francisco. Have an idea you'd like to present at a future event? Please apply here.