

EvalOps Unfiltered: Evaluating LLM-based applications
LLMs behave differently with the slightest prompt tweak, context change, or input variation. If you're building anything real with GenAI, you already know the outputs can surprise you, and not in a good way. That's why testing isn't optional; it's essential.
EvalOps Unfiltered is a practical event series for GenAI teams tackling the real-world challenges of evaluating LLM applications. Focused on the emerging field of EvalOps, it goes beyond benchmarks to address unpredictable model behavior, adversarial risks, and production readiness.
Each session features live experiments, tool deep-dives, breakout discussions, and honest conversations about what truly works when deploying LLMs.
What to expect (on 17 September):
🔧 Lightning talks from three teams presenting their real testing challenges — the kind that don't show up in research papers
🧠 Breakout sessions where you'll dig deep into one challenge, discuss solutions, share experiences, and test ideas with fellow builders
🍺 Drinks while the conversations continue
No panels, no pitches: just builders sharing what's actually broken and collaborating on what might work. This isn't about theory. It's about the unglamorous, critical work of making GenAI systems reliable enough for the real world.
Location: Berlin, Germany; more details upon registration.
Target Audience:
GenAI engineers wrestling with pre-release evaluation
Technical leads managing LLM-powered products
Data scientists designing and fine-tuning LLM-based applications
Product owners responsible for delivering reliable AI-driven features
Please note: attendance is only possible with a confirmed registration.
Rhesis AI (www.rhesis.ai) proudly hosts this event in collaboration with K.I.E.Z (www.kiez.ai).