Safe-by-Design Open-Source AI w/ Himanshu Tyagi (alphaXiv x Sentient)
Join our second Community Meetup, featuring Himanshu Tyagi from Sentient.
He’ll guide us through the principles of building open-source AI that’s safe-by-design, followed by a live Q&A.
Abstract: People were worried about aligning AI with humans even before we had AI, but the alignment problem was articulated only in vague, science-fiction terms. Now that we have seen what AI looks like, and know how it will start using software tools to take actions in the digital and physical worlds, the exact nature of the attack vectors is becoming clear. The challenges are threefold: rogue models may not be identified; understanding what models truly believe is difficult; and complex agentic frameworks make many different attacks possible. In this talk, we present our research towards addressing these challenges. First, we present a fingerprinting primitive that lets us add an identity to open models that is difficult to forge or remove. Next, we discuss our experiments on training a community-governed model with a distinct personality to be aligned with values selected by the community, highlighting the challenges we encountered. Finally, we discuss attack vectors that we discovered in a popular open-source agentic framework by exploiting memory injection through multiple channels. The overall goal is to advocate an approach of open models, subjected to open audit and red-teaming, towards designing AI that is loyal to the community using it.
🔒 AI Security on αlphaXiv: 2nd Community Meetup
📅 Monday, July 14, 2025 • 10 AM PT
🎤 Featuring: Himanshu Tyagi (Sentient)
💬 Format: Talk + Live Q&A