

AI Real Talk Series: Benchmarking AI Tools for Newsrooms: Measuring LLMs the Journalist Way
As more newsrooms adopt large language models (LLMs) to support reporting, most evaluation still relies on tech-centric benchmarks—missing the mark on what journalists actually need: accuracy, transparent sourcing, and editorial integrity. In this Hacks/Hackers Real Talk conversation, Charlotte Li, Jeremy Gilbert, and Nicholas Diakopoulos from Northwestern University’s Generative AI in the Newsroom (GAIN) initiative will share insights from their May 2025 workshop with newsroom practitioners on how to build meaningful benchmarks grounded in real-world newsroom tasks. We’ll cover:
Why rubric-based benchmarks that reflect editorial values and task-specific context matter for journalism (a simplified illustration follows this list)
How to inform and design your evaluation and benchmarking approach
Details on how we developed an example benchmark for a set of information extraction use cases
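To make the idea of a rubric-based benchmark concrete, here is a minimal sketch in Python of how an information-extraction output might be scored against weighted editorial criteria. The criterion names, weights, and scores are hypothetical illustrations, not the rubric developed at the GAIN workshop.

```python
# Minimal sketch (hypothetical, not the GAIN benchmark): scoring one model output
# on an information-extraction task against a weighted editorial rubric.
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str      # e.g. "accuracy", "sourcing", "completeness"
    weight: float  # relative editorial importance of this criterion
    score: float   # 0.0-1.0, assigned by a human rater or an automated check


def rubric_score(criteria: list[Criterion]) -> float:
    """Weighted average of criterion scores for a single model output."""
    total_weight = sum(c.weight for c in criteria)
    return sum(c.weight * c.score for c in criteria) / total_weight


# Example: one extraction from a press release, rated on three illustrative criteria.
example = [
    Criterion("accuracy", weight=0.5, score=0.9),      # extracted facts match the source
    Criterion("sourcing", weight=0.3, score=0.7),      # figures and quotes attributed
    Criterion("completeness", weight=0.2, score=0.8),  # key details not omitted
]
print(f"Rubric score: {rubric_score(example):.2f}")  # 0.82
```

The point of the sketch is that the weights encode editorial priorities for a specific task, so the same model output can score differently depending on the newsroom use case being benchmarked.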
This session is designed for newsroom leaders, data journalists, audience and product teams, innovation units, and anyone assessing or implementing AI tools in journalism who wants to ensure alignment with core editorial principles.