Llama-2 or OpenAI? How to compare LLMs using A/B testing
Should you use an open-source model like Llama-2 or the OpenAI APIs? The best way to decide which one works better in production is through A/B testing. In this workshop, we'll build a use case with both Llama-2 and the OpenAI APIs, then show you how to set up A/B tests for LLMs and analyze the results.
We'll cover:
Experimentation for AI: Discover why continuous experimentation is a non-negotiable aspect of AI, and how it impacts your model choices.
Demystify Llama-2: Uncover how this LLM operates and its unique characteristics.
A/B Testing Setup for LLMs: How to use Eppo's feature flagging to run experiments on an LLM.
Case Study - Text Extraction: See Llama-2 in action, extracting information from the unstructured text of a resume as a practical application.
Analyze A/B Testing Results: Learn which metrics matter most when comparing LLMs, so you can make a well-grounded decision.
Speakers
Daliana Liu is a senior data scientist at Predibase. Previously, she worked on A/B testing for 3 years at Amazon and developed machine learning models for AWS customers. Daliana has 200k followers on LinkedIn, where she talks about A/B testing and machine learning.
Sven Schmit is the head of statistics engineering at Eppo, the next-gen A/B testing platform. Sven has a PhD in Computational and Mathematical Engineering from Stanford.
This is a great opportunity to learn about both LLMs and A/B testing, and it's free to sign up.