Ray + vLLM in Action: Lessons from Pinterest and Deepseek Deployments

Name: Ray + vLLM in Action: Lessons from Pinterest and Deepseek Deployments
Start: 2025-06-10T17:00:00.000-07:00
End: 2025-06-10T20:00:00.000-07:00
Location: Anyscale

San Francisco, California

Past Event

About Event

Join us for our next Ray Meetup, where we'll explore batch inference at scale with Ray and vLLM! Learn how Pinterest scales batch inference using Ray, and get a first look at Anyscale's latest tools for orchestrating large-scale LLM inference: Ray Serve LLM and Ray Data LLM. We'll cover topics like prefill-decode disaggregation, data-parallel and expert-parallel (DP/EP) execution, and custom request routing.

📆 Tuesday, June 10th, 2025

🕔 5:00pm

📌 55 Hawthorne St, San Francisco

Speakers:
Chia-Wei Chen, Software Engineer, ML Training Infra, Pinterest
Kourosh Hakhamaneshi, AI Lead, Anyscale

Agenda:

5:00pm: Doors open, check-in, networking
6:00pm: 📈 From Struggle to Scale: Lessons from Scaling Ray Batch Inference on Hundreds of Kubernetes Nodes, Chia-Wei Chen, Pinterest
- Scaling Ray batch inference from a single node to hundreds exposed challenges, and best practices, that only appear at scale. Using an internal text + image embedding model based on the CLIP architecture as an example, we discuss key lessons learned, including establishing reliable checkpointing, robust node failure recovery, and autoscaling with KubeRay to ensure incremental progress. By fusing Ray Data operations and optimizing for data locality, we substantially reduced memory footprint and data transfer costs. Fine-tuning memory configurations and upgrading Ray and PyTorch were also critical to achieving stability at scale. With these optimizations, we met our target throughput, transforming a workflow that stalled at 2% completion into one that delivers consistent, reliable processing at 300x the original scale.
6:30pm: 💥 Deploying Deepseek Inference Stack with vLLM and Ray, Kourosh Hakhamaneshi, Anyscale
7:00 - 8:00pm: 🤝 networking & 🍕

About Anyscale

Anyscale, the company behind Ray open source, is a fully-managed, enterprise-ready unified AI platform. With Anyscale, companies can build, deploy, and manage all their AI use cases, bringing transformational AI products to market faster.

Join the Ray Community

Join the Ray Community
Take the Ray Community Pulse Survey
Follow Anyscale on LinkedIn & Twitter / X
We are hiring! Check out the job openings

Location

Anyscale

55 Hawthorne St 9th Floor, San Francisco, CA 94105, USA

Presented by

Anyscale

Advance Your AI Platform with Anyscale.
