Weights & Biases Judgement Day Hackathon
Human annotation can be slow and expensive, and using LLMs as judges promises to solve this. However, aligning an LLM Judge with human judgements is often hard, with many implementation details to consider.
During this in-person hackathon, let's build LLM Judges together and move the field forward a little by:
Productionizing the latest LLM-evaluator research
Improving on your existing judge
Building annotation UIs
Designing wireframes for collaborative annotation between humans and AI
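On that alignment point, one common sanity check is to measure how often a judge's labels match human annotations, and to correct for chance agreement. Below is a minimal sketch, assuming binary pass/fail labels and scikit-learn; the labels are made up for illustration:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels on the same 10 examples: 1 = pass, 0 = fail.
human_labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
judge_labels = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

# Raw agreement rate between the LLM judge and human annotators.
agreement = sum(h == j for h, j in zip(human_labels, judge_labels)) / len(human_labels)

# Chance-corrected agreement (Cohen's kappa) is usually lower than raw agreement.
kappa = cohen_kappa_score(human_labels, judge_labels)

print(f"Raw agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```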
Is this for me?
This hackathon is for you if you are an AI Engineer who:
Runs LLMs in production, or is planning to soon
Has built LLM Judges and found them to be unreliable
Wants to learn more about using LLMs as a judge
Is an LLM Judge skeptic ;)
Format
We're here for 2 days to hack on LLM evaluators. Hack submissions will close at 2:30pm on Sunday.
Saturday: 10am registration - 10pm office closed
Sunday: 9:30am doors open - 5pm office closed
Coffee and lunch will be provided to registrants on both days, with dinner on Saturday only. Visitors are welcome from 3pm on Sunday for the final demos.
Credits?
LLM API credits will be provided.
Prizes
$5,000 in cash-equivalent prizes will be awarded to the top 3 overall projects, with a bonus category for the most on-theme projects.
Judges
Greg Kamradt, Founder, Data Independent
Eugene Yan, Senior Applied Scientist, Amazon
Charles Frye, AI Engineer, Modal Labs
Shreya Shankar, ML Engineer, PhD at UC Berkeley
Shawn Lewis, CTO and Co-founder, W&B
Anish Shah, Growth ML Engineer, W&B
Tim Sweeney, Staff Software Engineer, W&B
Rules
New projects only
Maximum team size: 4
Make friends
Prize eligibility:
Project is open sourced on GitHub
Use W&B Weave where applicable - zero prior knowledge of Weave is needed, and our onsite team is there to help with any questions (a minimal example is sketched below).
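For reference, here is a minimal sketch of what tracing an LLM judge with Weave might look like. It assumes an OpenAI-backed judge with an OPENAI_API_KEY in the environment; the project name, prompt, and model choice are placeholders:

```python
import weave
from openai import OpenAI

# Initialize Weave tracing under a hypothetical project name.
weave.init("judgement-day-hackathon")

client = OpenAI()

@weave.op()
def judge_response(question: str, answer: str) -> str:
    """Ask an LLM to rate an answer; inputs and outputs are traced by Weave."""
    prompt = (
        "You are a strict evaluator. Rate the answer to the question "
        "as 'pass' or 'fail', and give a one-sentence reason.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(judge_response("What is 2 + 2?", "4"))
```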
Timing - Please note this is an in-person event
Saturday, Sept 21: 10am-10pm
Sunday, Sept 22: 9:30am-5pm