

Animal Benchmark Building Session
Just before the AI for Animals conference, join us for a 4-hour coworking session to create a new, light benchmark!
Description: The Animal Harm Assessment (AHA) project has given us data: over 100,000 answers from 10 chatbots to 4,350 curated questions. The answers have been scored on whether they increase (or decrease) the risk of harm to animals. Some of the QA pairs could plausibly form the "gold standard" for a light QA benchmark (as opposed to an open-ended one). Key unresolved questions to address:
Which questions and which answers to choose?
How to set the benchmark up technically?
How to increase the use of this (and other) animal-related benchmarks?
Come if you are interested; it is especially great if you have some familiarity with:
benchmarks and benchmark development
statistical methods
Python, ideally also the Inspect Evals framework (see the sketch after this list)
people, labs, and institutions who could use and promote this and other animal benchmarks
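If you are new to Inspect, here is a minimal sketch of what an AHA-style light QA task could look like in the framework. The file name aha_gold.csv and the choice of model_graded_qa as scorer are illustrative assumptions, not the project's settled setup:

```python
# Minimal sketch of an Inspect task for a light QA benchmark.
# Assumes a CSV with "input" (question) and "target" (gold answer) columns,
# which is csv_dataset's default; the file name and scorer are illustrative.
from inspect_ai import Task, task
from inspect_ai.dataset import csv_dataset
from inspect_ai.scorer import model_graded_qa
from inspect_ai.solver import generate

@task
def aha_light():
    return Task(
        dataset=csv_dataset("aha_gold.csv"),  # hypothetical gold-standard QA pairs
        solver=[generate()],                  # simply ask the model each question
        scorer=model_graded_qa(),             # grade the answer against the target
    )
```

With Inspect installed (pip install inspect-ai), a task file like this can be run against any supported model, e.g. inspect eval aha_light.py --model openai/gpt-4o.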
RSVP to this event page & share it with others who might be interested!
For questions, please reach out to:
Arturs Kanepajs, AI for Animals Benchmarking Lead akanepajs@gmail.com
Constance Li, AI for Animals Founder constance@aiforanimals.org
There will be snacks and light refreshments.
Expected agenda:
12:00-12:30 - Introductions and overview of goals
12:30-14:00 - Working session on benchmark development
14:00-14:15 - Break for refreshments
14:15-15:45 - Continue working session
15:45-16:00 - Wrap-up and next steps
Some more materials to review before the session, if you can:
A very short presentation on the AHA Benchmark
CSVs with public-split results:
questions (~3k) with answers from 11 models
23 runs (~70k answers in total)
each answer assessed and scored by 3 LLM judges
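One starting point for the first key question above (which questions and answers to choose) is to keep only QA pairs where the three judges agree. The sketch below is a rough illustration only: the file name and column names (question_id, model, answer, score) are guesses and will need to be adjusted to the actual public-split CSVs:

```python
# Sketch: pick gold-standard candidates where all three LLM judges agree.
# Column names (question_id, model, answer, score) are assumptions;
# adjust them to match the actual public-split CSVs.
import pandas as pd

# One row per (question, model, run, judge) with that judge's harm score.
answers = pd.read_csv("answers_public.csv")

agreement = (
    answers
    .groupby(["question_id", "model", "answer"])["score"]
    .agg(["mean", "nunique", "count"])
    .reset_index()
)

# Unanimous judgments (all three judges gave the same score) are the
# safest candidates for a light benchmark's answer key.
gold_candidates = agreement[(agreement["nunique"] == 1) & (agreement["count"] == 3)]
print(f"{len(gold_candidates)} unanimous QA pairs out of {len(agreement)}")
```

From there, the remaining choices (how many pairs, how to balance topics and models) are exactly what the working session is for.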
To get updates about the outcomes and next steps, join the Hive Slack (www.joinhive.org), channel #s-llm-benchmarking.