The Art of RAG Evaluation
“How do I know if what comes out of my LLM application is correct?”
“What does a good output look like?”
“How can I avoid hallucinations and wrong answers?”
Everyone working to develop production LLM applications is asking these questions, and rightly so!
Until recently, these were nearly impossible questions to answer quantitatively. With the advent of open-source evaluation tools like RAG Assessment (RAGAS) and the arrival of built-in evaluation tooling within emerging LLM Ops platforms like LangSmith, it’s getting easier to measure how good, right, or correct LLM application outputs are. Further, with Metrics-Driven Development (MDD), we can use these quantitative measures to systematically improve our applications.
While it remains a bit of a black art today, we are beginning to get real clarity on best practices for building production LLM applications.
In this event, we’ll do a deep dive into the art of evaluating complex LLM applications that leverage Retrieval Augmented Generation (RAG), the technique that aims to ground the outputs of LLMs in fact-checkable information.
We will begin by building a simple RAG system using the latest from LangChain v0.1.0, then baselining the performance of our system with the RAGAS framework and its metrics. We will explore each calculation used to estimate performance during Retrieval (Context Recall, Context Precision, and Context Relevancy), Generation (Faithfulness, Answer Relevancy), and across the entire RAG pipeline (Answer Semantic Similarity, Answer Correctness).
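To make the baselining step concrete, here is a minimal sketch (not code from the event itself) of scoring RAG outputs with RAGAS. The question, contexts, answer, and reference strings are placeholder data, and exact dataset column names (e.g. ground_truth) can vary slightly between RAGAS versions.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_recall,
    context_precision,
    faithfulness,
    answer_relevancy,
    answer_similarity,
    answer_correctness,
)

# Each row pairs a question with its retrieved contexts, the generated
# answer, and a reference answer (placeholder data shown here).
eval_dataset = Dataset.from_dict({
    "question": ["What does RAG do?"],
    "contexts": [["Retrieval Augmented Generation grounds LLM outputs in retrieved documents."]],
    "answer": ["RAG grounds an LLM's answers in documents retrieved from a knowledge base."],
    "ground_truth": ["RAG grounds LLM outputs in fact-checkable, retrieved information."],
})

# Retrieval, generation, and end-to-end metrics discussed above.
# RAGAS uses an LLM under the hood, so an API key (e.g. OPENAI_API_KEY)
# must be configured before running this.
results = evaluate(
    eval_dataset,
    metrics=[
        context_recall,
        context_precision,
        faithfulness,
        answer_relevancy,
        answer_similarity,
        answer_correctness,
    ],
)
print(results)
```

Each metric returns a score between 0 and 1, which gives us the quantitative baseline that Metrics-Driven Development builds on.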
We will then focus on improving key retrieval metrics by making advanced retrieval upgrades to our system. Finally, we’ll discuss important tradeoffs that come with improvements during any production AI product development process and the limitations of using quantitative metrics and AI evaluation approaches!
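As an illustration of what such an upgrade can look like (the specific techniques covered in the event may differ), here is a minimal sketch in LangChain v0.1.x that wraps a basic vector-store retriever in a MultiQueryRetriever, so the LLM generates several rephrasings of each question before retrieving. The model name and toy documents are assumptions for the example.

```python
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever

# Toy corpus standing in for your chunked documents.
documents = [
    Document(page_content="RAGAS estimates retrieval and generation quality with LLM-based metrics."),
    Document(page_content="Retrieval Augmented Generation grounds LLM outputs in retrieved documents."),
]

vectorstore = FAISS.from_documents(documents, OpenAIEmbeddings())

# Naive baseline: plain similarity search over the vector store.
naive_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Upgrade: let an LLM generate multiple phrasings of the question and
# merge the retrieved documents, which tends to lift context recall.
advanced_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever,
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
)

docs = advanced_retriever.get_relevant_documents("What does RAGAS measure?")
```

Re-running the same RAGAS evaluation before and after a swap like this is what lets us say, quantitatively, whether the retrieval metrics actually improved.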
Special thanks to LangChain and RAGAS for partnering with us on this event!
In this event, you'll learn:
How to build and improve a RAG system in LangChain v0.1.0
How to leverage the RAGAS framework for Metrics-Driven Development
The limitations of current RAG evaluation techniques and what to watch out for!
Speakers:
Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Previously, he’s held roles as a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.