Presented by
Arize AI

Evaluating LLMs: Needle in a Haystack

San Francisco, California
About Event

LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework.

Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research on how major foundation models, from OpenAI’s GPT-4 to Mistral and Anthropic’s Claude, stack up against each other on important tasks and emerging LLM use cases. They cover the results of Needle in a Haystack tests and explain why those results matter, along with other eval results on hallucination detection on private data, question answering, code functionality, and more.

Curious which foundation models your company should be using for a specific use case – and which to avoid? You won’t want to miss this meetup!

-------

Agenda:

  • 5:30 PM - 6:00 PM: Arrival & Networking

  • 6:00 PM - 6:30 PM: Fine-tuning for Context Length Extension + Q&A w/ Kourosh Hakhamaneshi

  • 6:30 PM - 7:15 PM: Evaluating LLMs: Needle in a Haystack Fireside Chat + Q&A

  • 7:15 PM - 8:00 PM: Networking & Drinks
