GenAI Applications: Development to Production

Name: GenAI Applications: Development to Production
Start: 2025-07-10T17:30:00.000-07:00
End: 2025-07-10T20:00:00.000-07:00
Location: 1740 Technology Dr (Room: 120-Cranium)

Hosted by Ray Paik & 3 others

About Event

Join us for an opportunity to network with AI application developers in the Bay Area! Hear insights from speakers representing Nutanix and PingCAP as they discuss the latest advancements in building and deploying scalable GenAI solutions.

Agenda

5:30 - 6:00 pm: Check-in/networking (food & drinks)
6:00 - 6:30 pm: Talk #1 - Lessons learned from managing GPU deployments on Kubernetes
6:30 - 7:00 pm: Talk #2 - Ship a Lightning-Fast FAQ & Stop Keyword Fails
7:00 - 7:30 pm: Talk #3 - Setting the BAR: Balancing Budget, Authenticity & Reasoning in GenAI
7:30 - 8:00 pm: Wrap-up/networking

Talk #1 - Lessons learned from managing GPU deployments on Kubernetes

Speakers: Sonali Mishra & Shalin Patel (Nutanix)
Abstract: Handling workloads that require GPUs on Kubernetes has become easier with tools like NVIDIA’s GPU Operator, but deploying them in production across real-world environments brings hidden challenges. In this session, we will share lessons learned from managing GPUs across different infrastructures, including on-prem, public clouds, and air-gapped environments, and across different operating systems. We will cover compatibility issues with drivers and runtimes, discovering GPU-attached nodes, and scheduling GPU workloads on Kubernetes clusters. We will also discuss our experience working with vGPUs and the challenges of enabling multi-tenancy, dynamic resource allocation, and monitoring. Finally, we will talk about how we addressed some of these challenges using the NVIDIA GPU Operator and our Kubernetes operator for vGPU tokens and license management. This talk is ideal for platform engineers and architects who are bringing AI/ML to Kubernetes and looking to scale GPU use efficiently, securely, and with better observability.

Talk #2 - Ship a Lightning-Fast FAQ & Stop Keyword Fails

Speaker: Chris Dabatos (PingCAP)
Abstract: In this session, learn how anyone can build and ship a lightning-fast FAQ using TiDB's built-in vector search, AWS Bedrock, and a few plain-English Python scripts. There's no need for developers to deal with microservices or cluster babysitting, and since TiDB is MySQL-compatible, your existing ORM will just work.

Talk #3 - Setting the BAR: Balancing Budget, Authenticity & Reasoning in GenAI

Speaker: Jinan Zhou (Nutanix)
Abstract: There is no perfect GenAI cocktail—every system has to mix Budget, Authenticity, and Reasoning. Raise the BAR on two, and you’ll have to water down the third.

In this session, we’ll introduce the BAR Triangle, a framework for understanding why it’s impossible to fully optimize every dimension of a GenAI system at once. Through practical examples and case studies, you’ll learn how to map your own system onto the triangle, identify the hidden costs of maximizing certain aspects, and develop strategies for choosing the right BAR for your product. Whether you’re building enterprise AI, consumer chatbots, or mission-critical GenAI, this talk will help you make smarter, more transparent tradeoffs—and “set the BAR” that matters most for your goals.
