Community Paper Reading: Evaluating LLMs as Agents

 
 
Zoom
Past Event
About Event

Join Arize Co-Founder Jason Lopatecki and ML Growth Lead Amber Roberts as they discuss “AgentBench: Evaluating LLMs as Agents”. This paper introduces AgentBench, the first benchmark designed to evaluate LLMs' ability to operate as autonomous agents in various scenarios. We'll talk through the paper's finding of a significant performance gap between leading commercial API-based LLMs and open-source alternatives, and the impact that disparity will have on the future of the industry.

Link to paper: https://arxiv.org/abs/2308.03688