Safe-by-Design Open-Source AI w/ Himanshu Tyagi (alphaXiv x Sentient)

Start: 2025-07-14T10:00:00.000-07:00
End: 2025-07-14T11:00:00.000-07:00
Location: Online Event

Hosted by Jian Cui & alphaXiv

Zoom

About Event

Join our second Community Meetup, featuring Himanshu Tyagi from Sentient.

He’ll guide us through the principles of building open-source AI that’s safe-by-design, followed by a live Q&A.

Abstract: People worried about aligning AI with humans even before AI existed, but the alignment problem was then articulated only in vague, science-fiction terms. Now that we have seen what AI looks like, and know how it will use software tools to take actions in the digital and physical worlds, the exact nature of the attack vectors is becoming clear. The challenges are threefold: rogue models may not be identified; understanding what models truly believe is difficult; and complex agentic frameworks make many different attacks possible. In this talk, we present our research towards addressing these challenges. First, we present a fingerprinting primitive that adds an identity to open models, one that is difficult to forge or remove. Next, we discuss our experiments on training a community-governed model with a distinct personality to align with values selected by the community, highlighting the challenges we encountered. Finally, we discuss attack vectors that we discovered in a popular open-source agentic framework by exploiting memory injection through multiple channels. The overall goal is to advocate an approach of open models, subjected to open audit and red-teaming, towards designing AI that is loyal to the community using it.

🔒 AI Security on αlphaXiv — 2nd Community Meetup
📅 Monday, July 14, 2025 • 10 AM PT
🎤 Featuring: Himanshu Tyagi (Sentient)
💬 Format: Talk + Live Q&A
