Community Paper Reading: Evaluating LLMs as Agents
Past Event
About Event
Join Arize Co-Founder Jason Lopatecki and ML Growth Lead Amber Roberts as they discuss “AgentBench: Evaluating LLMs as Agents”. The paper introduces AgentBench, the first benchmark designed to evaluate LLMs' ability to operate as autonomous agents in various scenarios. We'll talk through the paper's finding of a significant performance gap between leading commercial API-based LLMs and open-source alternatives, and the impact that disparity will have on the future of the industry.
Link to paper: https://arxiv.org/abs/2308.03688