Presented by

Lorong AI is a co-working hub for AI practitioners to connect and grow through curated programming and a collaborative environment. Register your interest here: go.gov.sg/lai-interest

The One About AI Testing

Name: The One About AI Testing
Start: 2025-07-16T15:00:00.000+08:00
End: 2025-07-16T17:00:00.000+08:00
Location: Lorong AI (WeWork@22 Cross St.)

Past Event

About Event

How do we ensure AI systems are both reliable and safe for our context? From enterprise deployments to local language safety, join us to explore critical aspects of making AI work in our region.

More About the Sharings:

Gabriel (Data Scientist, GovTech's AI Practice) will share more about RabakBench, a safety benchmark for Singapore’s context. Large language models (LLMs) and their safety classifiers often perform poorly on low-resource languages due to limited training data and evaluation benchmarks. Evaluations of 11 popular open-source and closed-source guardrail classifiers on RabakBench reveal significant performance degradation. In this talk, they will share how they built this new multilingual safety benchmark localized to Singapore’s unique linguistic context, covering Singlish, Chinese, Malay, and Tamil. In particular, they will share how they leveraged LLMs to scale human supervision for both annotation and translation. (Technical Level: 200)

Shameek (Executive Director, AI Verify Foundation) will share more about "GenAI Accuracy, Reliability in Real-World Scenarios". Hear insights from their Global AI Assurance Pilot, which tested GenAI applications across 17 use cases and 30 global companies. While most efforts focus on model safety, real-world deployment demands attention to end-to-end system reliability. Through practical examples spanning 10 industries, discover how context and complexity shape AI performance at scale. Learn about testing frameworks and risk assessment methods that bridge the gap between lab performance and real-world success, helping organizations build GenAI solutions that deliver value. (Technical Level: 100-200)

More About the Speakers:

Gabriel Chua is a Data Scientist at GovTech, focusing on MLOps, LLM solutions, and Responsible AI. He co-organizes AI Wednesdays and various community tech events, bringing together AI practitioners across Singapore. Previously a policy analyst at the Ministry of Health working on Healthcare Finance, he holds degrees from LSE (Economics) and MIT (Business Analytics). Outside of tech, you'll find him at pilates, hiking trails, or enjoying craft beer.

Shameek Kundu (Executive Director, AI Verify Foundation) is a senior Data and AI professional with 25+ years of experience across AI safety and testing (AI Verify, TruEra), Financial Services (Group CDO at Standard Chartered) and Consulting (McKinsey). Before joining AI Verify, Shameek helped build and scale an AI testing software business at Silicon Valley startup TruEra. He serves or has served on multiple consultative forums on AI governance, including those of the Bank of England, the Monetary Authority of Singapore and the OECD/Global Partnership on AI.
