Presented by
Anyscale
Advance Your AI Platform with Anyscale.

Ray + vLLM in Action: Lessons from Pinterest and Deepseek Deployments

About Event

Join us for our next Ray Meetup where we’ll explore batch inference at scale with Ray and vLLM! Learn how Pinterest scales batch inference using Ray, and get a first look at Anyscale’s latest tools—Ray Serve and Data LLM—for orchestrating large-scale LLM inference. We’ll cover topics like batch inference, prefill-decode disaggregation, DP/EP parallelism, and custom request routing.

📆 Tuesday, June 10th, 2025

🕔 5:00pm

📌 55 Hawthorne St, San Francisco

Speakers:
Chia-Wei Chen, Software Engineer, ML Training Infra, Pinterest
Kourosh Hakhamaneshi, AI Lead, Anyscale

Agenda:

  • ​5:00pm: Doors open, check-in, networking

  • ​6:00pm: 📈 From Struggle to Scale: Lessons from Scaling Ray Batch Inference on Hundreds of Kubernetes Nodes, Chia-Wei Chen, Pinterest

    • Scaling Ray batch inference from a single node to hundreds of nodes exposed challenges and best practices that only surface at scale. Using an internal CLIP-based text and image embedding model as an example, we discuss key lessons learned, including reliable checkpointing, robust recovery from node failures, and autoscaling with KubeRay to ensure incremental progress. By fusing Ray Data operations and optimizing for data locality, we substantially reduced memory footprint and data transfer costs. Tuning memory configurations and upgrading Ray and PyTorch were also critical to achieving stability at scale. With these optimizations, we met our target throughput, taking the workflow from stalling at 2% completion to consistent, reliable processing at 300x the original scale.

  • ​6:30pm: 💥 Deploying Deepseek Inference Stack with vLLM and Ray, Kourosh Hakhamaneshi, Anyscale

  • ​7:00 - 8:00pm: 🤝 networking & 🍕 

About Anyscale

Anyscale, the company behind Ray open source, is a fully-managed, enterprise-ready unified AI platform. With Anyscale, companies can build, deploy, and manage all their AI use cases, bringing transformational AI products to market faster.

​Join the Ray Community

Location
Anyscale
55 Hawthorne St 9th Floor, San Francisco, CA 94105, USA