LLM Evaluation Essentials
Step into the world of LLM evaluations with a three-part series on getting LLM applications ready for production. We'll unpack advanced evaluation techniques and best practices developed through rigorous testing across retrieval, summarization, and hallucination tasks. A must-attend for AI and ML engineers and data scientists.
This series will cover:
Binary LLM performance evaluation and its benefits
Golden datasets and how to use them
Statistical analysis of the performance of GPT-4, GPT-3.5, and more
Best practices for LLM evals
Session 1 (10/3): Benchmarking and Analyzing Retrieval Approaches
Session 2 (10/10): Statistical Analysis of Summarization LLM Evaluations
Session 3 (10/16): Statistical Analysis of Hallucination LLM Evaluations