Office Hours with Netflix
Discuss, showcase, learn, and share feedback with the Metaflow community!
In this edition, Michael Bao at Netflix will show how his team is building a model training service on their GPU farm with Metaflow.
In this talk, we share how we built a scalable model training service with Metaflow that runs our jobs on GPU-enabled machines at Netflix.
We walk through the migration of a PyTorch Lightning CLI script to a Metaflow flow, a process that required zero changes to our core model and data logic. To further streamline our workflow, we developed custom decorators that make our Metaflow jobs highly configurable from user input.
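To make the decorator idea concrete ahead of the talk, here is a minimal sketch of how a decorator can merge user input into a default training configuration before a step runs. It uses only the Python standard library and is not the team's implementation or Metaflow's API. `TrainConfig`, `configurable`, and `train` are hypothetical names chosen for illustration.

```python
import functools
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training settings; the field names are illustrative only."""
    max_epochs: int = 10
    learning_rate: float = 1e-3
    gpus: int = 1


def configurable(defaults):
    """Decorator factory: merge user overrides into `defaults` before the step runs."""
    allowed = {f.name for f in fields(defaults)}

    def decorator(step_fn):
        @functools.wraps(step_fn)
        def wrapper(**overrides):
            # Reject keys that are not part of the config, so typos fail early.
            unknown = set(overrides) - allowed
            if unknown:
                raise ValueError(f"unknown config keys: {sorted(unknown)}")
            # Build a new config with the overrides applied; defaults stay untouched.
            return step_fn(replace(defaults, **overrides))
        return wrapper
    return decorator


@configurable(TrainConfig())
def train(config):
    # Placeholder for the unchanged model and data logic (assumption: the real
    # step would hand `config` to a trainer instead of returning it).
    return config
```

Calling `train(max_epochs=3)` runs the step with three epochs and the remaining defaults, while the wrapped function itself never parses user input.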
We conclude by demonstrating how we deploy our training flows from a Python monorepo through our CI/CD pipeline.
We meet fortnightly on Tuesdays on Google Meet.