A Gentle Introduction to LLM Evaluations
How do you evaluate the quality of your LLM-powered systems? Whether it is a RAG-powered chatbot or a simple summarization feature, you cannot skip the evals. In this webinar, we'll discuss the topic from first principles. The good news is that we can learn a lot from a decade of working on non-deterministic ML products. The bad news: there is still work to do.
Outline:
- When you need evals: from initial experiments to production monitoring
- What we can learn from ML evaluations and apply to LLMs
- Different LLM evaluation methods (from LLM-as-a-judge to regular expressions) and how to approach them (see the sketch after this outline)
- Evals are hard - where can you realistically start?
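To make the spectrum of evaluation methods mentioned above concrete, here is a minimal sketch of its two extremes: a deterministic regular-expression check and an LLM-as-a-judge check. The prompt wording and the `call_llm` helper are illustrative assumptions, not part of any specific library or of the webinar material itself.

```python
import re

def regex_check(answer: str) -> bool:
    """Deterministic check: pass if the answer does not leak an email address."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", answer) is None

# Hypothetical judge prompt, shown only to illustrate the idea.
JUDGE_PROMPT = """You are evaluating a chatbot answer.
Question: {question}
Answer: {answer}
Is the answer polite and on-topic? Reply with a single word: PASS or FAIL."""

def llm_judge_check(question: str, answer: str, call_llm) -> bool:
    """LLM-as-a-judge check: delegate a subjective criterion to another model.

    `call_llm` is assumed to be any function that takes a prompt string
    and returns the model's text reply.
    """
    verdict = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("PASS")
```

The regex check is cheap and fully reproducible but only covers criteria you can express as a pattern; the judge check handles open-ended criteria at the cost of extra calls and its own evaluation questions. How to choose and combine such methods is part of what the webinar covers.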
About the speaker
Elena is the CEO and co-founder of Evidently AI, a startup that builds an AI observability platform to evaluate, test, and monitor AI-powered systems. The company is the creator of Evidently, a popular open-source tool for ML evaluation and monitoring with over 20 million downloads.
Elena has been active in applied machine learning since 2014. Previously, she co-founded and served as CPO of an industrial AI startup, working with global metal and chemical companies on machine learning for production optimization. Before that, she led business development at Yandex Data Factory, delivering ML-based solutions across retail, banking, telecom, and other industries.