

LLM-as-a-Judge Evals: Comparing Kimi, Qwen, and GLM
Join us for a practical walkthrough of how LLM evaluators, also known as “LLM-as-a-Judge”, are transforming model evaluation. We’ll also explore key use cases and prompting techniques, and see how this approach can be used to choose among the newest state-of-the-art open models from Kimi, Qwen, and GLM!🔥
What You'll Learn
This session is designed to help you scale evaluation workflows, improve consistency in model assessment, and reduce reliance on manual human annotation.
Why Attend?
Applied Guidance: Get a condensed, practical overview of LLM evaluation best practices.
Research-Backed: Insights drawn from cutting-edge papers and current community tools.
Live Q&A: Ask questions and get answers from the Together AI team.
This online talk will cover:
Why LLM-as-a-Judge is emerging now
Direct scoring vs. pairwise comparison (see the brief sketch after this list)
Prompting strategies for consistent judgments
Practical use cases in the wild
Critiques, limitations, and safeguards
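
To give a flavor of the "direct scoring vs. pairwise comparison" topic, here is a minimal sketch of the two judging modes, assuming an OpenAI-compatible chat-completions endpoint such as Together AI's. The judge model name, rubric wording, and environment variable below are illustrative placeholders, not prescribed choices from the talk.

```python
import os
from openai import OpenAI

# Placeholder endpoint and credentials; any OpenAI-compatible provider works here.
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

JUDGE_MODEL = "example-judge-model"  # placeholder; swap in your judge of choice


def direct_score(question: str, answer: str) -> str:
    """Direct scoring: the judge rates a single answer against a rubric."""
    prompt = (
        "You are an impartial evaluator. Rate the answer to the question below "
        "on a 1-5 scale for correctness and helpfulness. Reply with a brief "
        "justification followed by 'Score: <n>'.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judgments help consistency
    )
    return resp.choices[0].message.content


def pairwise_compare(question: str, answer_a: str, answer_b: str) -> str:
    """Pairwise comparison: the judge picks the better of two candidate answers."""
    prompt = (
        "You are an impartial evaluator. Compare the two answers to the question "
        "below and reply with 'A', 'B', or 'Tie', followed by a one-sentence reason. "
        "Ignore answer order and length.\n\n"
        f"Question: {question}\nAnswer A: {answer_a}\nAnswer B: {answer_b}"
    )
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```

Direct scoring is convenient for absolute quality tracking over time, while pairwise comparison tends to be more reliable when the goal is simply to pick the stronger of two models or prompts.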
We look forward to seeing you there!