Presented by

We are a group of applied AI practitioners and enthusiasts who have formed a collective learning community. Every Wednesday evening (Pacific Time), we hold our research paper reading seminar.

[Paper Reading] LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

Name: [Paper Reading] LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
Start: 2025-04-28T19:00:00.000-07:00
End: 2025-04-28T20:30:00.000-07:00
Location: 46540 Fremont Blvd

SupportVectors AI Events & Meetings

46540 Fremont Blvd

Fremont, California

About Event

This week, we will walk through and discuss the paper:
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods [https://arxiv.org/pdf/2412.05579]

Abstract of the paper:
The rapid advancement of Large Language Models (LLMs) has driven their expanding application across various fields. One of the most promising applications is their role as evaluators based on natural language responses, referred to as "LLMs-as-judges". This framework has attracted growing attention from both academia and industry due to their excellent effectiveness, ability to generalize across tasks, and interpretability in the form of natural language. This paper presents a comprehensive survey of the LLMs-as-judges paradigm from five key perspectives: Functionality, Methodology, Applications, Meta-evaluation, and Limitations. We begin by providing a systematic definition of LLMs-as-Judges and introduce their functionality (Why use LLM judges?). Then we address methodology to construct an evaluation system with LLMs (How to use LLM judges?). Additionally, we investigate the potential domains for their application (Where to use LLM judges?) and discuss methods for evaluating them in various contexts (How to evaluate LLM judges?). Finally, we provide a detailed analysis of the limitations of LLM judges and discuss potential future directions. Through a structured and comprehensive analysis, we aim to provide insights on the development and application of LLMs-as-judges in both research and practice.

-------

Speaker: Krishnan Ramaswamy

Gen AI Product Development & Principal Architect @ Cisco for AI, ML, and Gen AI-enabled computer networking products & solutions.

-------
We are a group of applied AI practitioners and enthusiasts who have formed a collective learning community. Every Wednesday evening, we hold our research paper reading seminar covering an AI topic. One member carefully explains the paper, making it more accessible to a broader audience. Then, we follow this reading with a more informal discussion and socializing.

You are welcome to join in person or over Zoom (https://us02web.zoom.us/meeting/register/tZUvf-uvrTwvHdP9B-vE03j3BapgRypn64CS). SupportVectors is an AI training lab located in Fremont, CA, close to Tesla and easily accessible by road and BART. We follow the weekly sessions with snacks, soft drinks, and informal discussions.
