Making Open Source and Local LLMs Work in Practice x MLOps Community

Hosted by Rahul Parundekar & 4 others
About Event

Using open source LLMs? Join fellow tech enthusiasts and practitioners for an exciting meetup to share notes, horror stories, and successes about what works and what doesn't when using open source LLMs for application patterns like RAG, function/API calling, and fine-tuning.

Our talks and discussions will also cover how running small, specialized (e.g., fine-tuned) LLMs locally or on-prem enables new and interesting patterns, like pushing information processing to where the data is.

Speakers will be joining from around the world, representing companies including Helix.ml, Expanso, Weaviate, Acorn Labs, Sailplane, Continue, and more to be added! Whether you're a seasoned ML engineer or just getting started with LLMOps, this meetup offers valuable insights and connections in the rapidly evolving world of AI and distributed systems.

Join us for this meetup in collaboration with MLOps Community SF Chapter, Heavybit, GenLab, and True Ventures!

(Format = lightning talks + group Q&A. Order of talks TBD)

MC: Tamao Nakahara, Ainekko COO

Confirmed talks:

KubeAI Creator Sam Stoelinga:

“Live demo: Local LLMs and RAG with Weaviate and KubeAI”

Learn how to deploy generative LLMs and embedding models locally on Kubernetes with KubeAI. Spin up a Kind cluster on your laptop and follow along. No GPUs needed.

Once done, spin up a RAG application that uses your local deployment.

Open Source AI Researcher Aditya Advani:

“Learnings from Self-hosting Llama 3.1 405B (FP8)”

This talk covers how to self-host Llama 3.1 405B (FP8) and the lessons learned from doing it.

Sailplane CEO Sam Ramji:

“Fine Tuning in Practice: Building The Data Engine”

This talk will share notes from our journey at Sailplane from concept to fine-tuning: prototyping the algorithm, the evals, and the data engine.

Continue CEO Ty Dunn:

“How to own your AI code assistant”

In this talk, Ty will share how you can use Continue to offer custom autocomplete and chat experiences to your developers entirely within your own environment. Topics covered will include running open weight models on-premise, creating dashboards to understand usage data, setting up a code RAG system on a server, fine-tuning models on your development data, and more.

Acorn Labs Lead AI Engineer Sanjay Nadhavajhala:

“Fine-Tuning LLMs for Multi-Turn Function Calling”

GPTScript is an LLM app development framework that leverages complex, multi-turn function calling to let LLMs operate and interact with various systems. Multi-turn function calling is a capability where the model maintains a conversation over multiple exchanges while executing a series of tool or function calls. In each "turn," the model may perform a specific task, retrieve information, or make a decision based on the context provided so far. The state or output from each turn is carried forward, allowing the model to handle complex tasks that require several steps or interactions, much like a human conducting a conversation or interacting with a system over multiple exchanges. In this talk, Sanjay will walk through the intricacies and challenges of multi-turn function calling, fine-tuning open source models like Llama 3, Mistral, and Phi via block expansion, and lessons learned along the way.
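The turn-by-turn mechanics described above can be sketched as a simple tool-calling loop. In this toy example, a scripted function stands in for the fine-tuned LLM, and the tool registry, message format, and function names are all illustrative assumptions, not GPTScript's actual API:

```python
# Toy multi-turn function-calling loop. A scripted "model" stands in for a
# real LLM; the tools, message schema, and dispatch are illustrative only.

def get_weather(city: str) -> str:
    """Pretend external API call."""
    return f"Sunny in {city}"

def convert_f_to_c(f: float) -> float:
    """Pretend unit-conversion tool."""
    return round((f - 32) * 5 / 9, 1)

TOOLS = {"get_weather": get_weather, "convert_f_to_c": convert_f_to_c}

def scripted_model(messages):
    """Stand-in for an LLM: picks the next step from conversation state.

    Turn 1: call get_weather; turn 2: call convert_f_to_c; then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if len(tool_results) == 0:
        return {"tool_call": {"name": "get_weather", "args": {"city": "SF"}}}
    if len(tool_results) == 1:
        return {"tool_call": {"name": "convert_f_to_c", "args": {"f": 68.0}}}
    # Enough context gathered: answer using state carried over from earlier turns.
    first, second = tool_results[0]["content"], tool_results[1]["content"]
    return {"content": f"{first}, about {second}°C"}

def run(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:  # each iteration is one "turn"
        reply = scripted_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model chose to answer; loop ends
        result = TOOLS[call["name"]](**call["args"])  # execute the requested tool
        # Carry the tool output forward so later turns can build on it.
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})

print(run("What's the weather in SF, in Celsius?"))
```

The key point the talk addresses is the middle of that loop: open models must reliably emit well-formed tool calls across many turns, which is where fine-tuning comes in.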

Weaviate Co-Founder & CTO Etienne Dilocker:

“A Story of Two Extremes: Large-Scale Vector Search in E-Commerce and Email RAG”

Join me for an insightful exploration into the contrasting worlds of large-scale vector search across two distinct domains: e-commerce product search and Email Retrieval-Augmented Generation (RAG). This talk will delve into the unique challenges posed by a single, billion-scale dataset in e-commerce versus managing numerous partitions of smaller datasets in email inboxes. We’ll discuss how to choose the right infrastructure and configurations for each scenario, balancing the demands of constant updates in e-commerce with the relatively static nature of email data. Attendees will gain actionable insights to identify their own use case and learn how to set up for success with cost-effective, tailored solutions.
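As a toy illustration of the two setups the talk contrasts, one big shared index versus many small per-tenant partitions, here is a brute-force nearest-neighbor sketch in plain Python. It uses no Weaviate API; the class names, tenants, and vectors are made up for illustration:

```python
# Toy contrast: one shared collection vs. per-tenant partitions.
# Brute-force cosine similarity stands in for a real vector index.
import math
from collections import defaultdict

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

class SharedIndex:
    """E-commerce style: one large collection, every query scans it all."""
    def __init__(self):
        self.items = []  # (doc_id, vector)

    def add(self, doc_id, vec):
        self.items.append((doc_id, vec))

    def search(self, query, k=1):
        return sorted(self.items, key=lambda it: -cosine(it[1], query))[:k]

class PartitionedIndex:
    """Email-RAG style: many small per-tenant partitions, searched in isolation."""
    def __init__(self):
        self.partitions = defaultdict(list)  # tenant -> [(doc_id, vector)]

    def add(self, tenant, doc_id, vec):
        self.partitions[tenant].append((doc_id, vec))

    def search(self, tenant, query, k=1):
        # Only this tenant's (much smaller) partition is scanned.
        return sorted(self.partitions[tenant], key=lambda it: -cosine(it[1], query))[:k]
```

The design trade-off follows directly: the shared index pays for scale and constant updates on every query, while the partitioned index keeps each search tiny and isolated, at the cost of managing many partitions.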

Expanso CEO David Aronchick:

“Managing Distributed Data and Workloads in our Increasingly Global, Multi-Cloud World”

Join us for an exciting deep dive into Expanso's groundbreaking "compute over data" platform, Bacalhau, addressing the critical challenges of managing distributed data and workloads in our increasingly global, multi-cloud world. David Aronchick, drawing on his experience building distributed systems, will showcase how Expanso and Bacalhau enable universal and reproducible compute, solving key issues around data gravity, regulatory compliance, and cross-organizational collaboration. Don't miss this opportunity to learn about a technology that could revolutionize how we approach ML workloads, edge computing, and data processing at scale, potentially changing the game for any organization grappling with distributed data challenges.

Helix.ml CEO Luke Marsden:

“Notes from the trenches: making LLM application patterns work with open source LLMs”

We’ll share our experience making API calling, RAG, and synthetic data generation from source documents for fine-tuning work with open source LLMs from Mistral-7B to Llama 3.1. We’ll cover the prompting that worked, what didn't work, what was flaky, and how we're measuring it with eval loops and customizability for real customer applications. We’ll also touch on function calling in open source LLMs, Ollama’s recent support for it, GPTScript on open source LLMs, and what this enables. Finally, is it too early to create a Kubernetes-objects-style abstraction over LLM application patterns? Yes, but we're going to do it anyway...

Take Helix's challenge and try to earn a shirt at the meetup!

Location
Digital Garage US, Inc.
717 Market St #100, San Francisco, CA 94103, USA