Presented by
AIFoundry.org
In-person and virtual community events of AIFoundry.org

Evaluating LLMs - recent practice and new approaches - AIFoundry.org Podcast

Virtual
Past Event
About Event

Providing AI developers with more choices through open models & platforms is a nice idea, but we need ways to evaluate which choices best map to our application requirements.

Yulia Yakovleva will join us with her review of the foundational paper “Evaluating Large Language Models: A Comprehensive Survey” by Zishan Guo and colleagues. This paper categorizes the evaluation of LLMs into knowledge and capability, alignment, and safety. It underscores the necessity of thorough evaluation to harness the benefits of LLMs while mitigating risks such as bias and misinformation. The most direct example of this approach is embodied in the Open LLM Leaderboard on Hugging Face.
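
For illustration only, here is a minimal sketch (not from the paper) of how per-benchmark scores might be grouped into the survey’s three evaluation dimensions. The benchmark names, dimension assignments, and scores below are hypothetical placeholders, not the survey’s taxonomy or leaderboard data.

```python
# Hypothetical sketch: grouping benchmark results into the three evaluation
# dimensions named in the survey (knowledge & capability, alignment, safety).
# Benchmark names and scores are illustrative placeholders only.

DIMENSIONS = {
    "knowledge_and_capability": ["mmlu", "hellaswag", "gsm8k"],
    "alignment": ["truthfulqa"],
    "safety": ["toxigen"],
}

def summarize(results: dict[str, float]) -> dict[str, float]:
    """Average per-benchmark scores within each evaluation dimension."""
    summary = {}
    for dimension, benchmarks in DIMENSIONS.items():
        scores = [results[b] for b in benchmarks if b in results]
        summary[dimension] = sum(scores) / len(scores) if scores else float("nan")
    return summary

if __name__ == "__main__":
    # Placeholder scores; in practice these would come from an eval harness.
    example = {"mmlu": 0.64, "hellaswag": 0.81, "gsm8k": 0.42,
               "truthfulqa": 0.55, "toxigen": 0.73}
    print(summarize(example))
```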

More recently, the paper “LiveCodeBench: Holistic and Contamination-Free Evaluation of Large Language Models for Code” by Naman Jain and colleagues provides a new approach for augmenting the evaluation of LLMs. While this work is specific to code generation, we’ll discuss how the approach might extend to other evaluation settings.
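
As a rough illustration of the contamination-free idea (a sketch under assumptions, not LiveCodeBench’s actual code or schema), one can restrict evaluation to problems published after a model’s training cutoff, so the problems cannot have appeared in its training data:

```python
# Sketch of the contamination-avoidance idea behind LiveCodeBench-style
# evaluation: keep only problems released after the model's training cutoff.
# Field names, titles, and dates are illustrative, not LiveCodeBench's schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class Problem:
    title: str
    release_date: date

def contamination_free(problems: list[Problem], model_cutoff: date) -> list[Problem]:
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.release_date > model_cutoff]

if __name__ == "__main__":
    pool = [
        Problem("two-sum-variant", date(2023, 1, 10)),
        Problem("interval-scheduling", date(2024, 2, 5)),
    ]
    # Hypothetical training-data cutoff for the model under evaluation.
    eval_set = contamination_free(pool, model_cutoff=date(2023, 9, 1))
    print([p.title for p in eval_set])
```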

AIFoundry.org podcasts are held live in front of a virtual studio audience in the AIFoundry.org Discord community: https://discord.gg/WNKvkefkUs
