Demo Day
Demo Days are informal get-togethers where our team at Noema Research celebrates fresh advances across the "tech tree" with the broader builder community. The work we share focuses heavily on eliciting dual-use capabilities from AI systems, as well as on developing techniques for securing them. No slide decks are allowed: live demos are at the heart of the series. Alongside the demos, we premiere communications that recap previous progress.
During the upcoming Demo Day, we plan to showcase the following:
Cybersecurity Simulator (Video by Beatrice). To pave the way for later updates, we first premiere upcoming materials documenting our procedural environment, which lets users engage with an endless stream of pentesting scenarios. This video recaps the broader context of our work, the architecture of the cyber range, and our experience field-testing it at DefCamp.
Cybersecurity Agent (Demo by Bogdan). Using the previously developed simulator, we elicited offensive cybersecurity capabilities in frontier models with a custom post-training pipeline that relies almost entirely on synthetic data. We showcase a live attempt by our autonomous pentester to navigate an end-to-end kill chain in an unseen scenario.
Engineering Simulator (Demo by Răzvan). Analogous to the cybersecurity simulator, we developed a simulator where players can tackle engineering challenges. Instead of exercises in breaking things, these are exercises in building things, such as a fully deployed product or a piece of research infrastructure. We showcase how a (human) player can use the simulator to solve a challenge that involves bringing a full deployment to life.
Structural Interpretability (Demo by Paul). Both autonomous hacking and autonomous R&D are dual-use capabilities. We are pursuing infrastructure for facilitating institutional oversight of their usage at the hardware level. As an early step, we explored a new flavor of interpretability focused on reverse engineering the structure of an inference deployment using memory profiling.
We are grateful to How To Web and Launch Romania for making this event possible.
Important Note: By registering for this event, you acknowledge that photographs and videos that may include you will be taken during the event for promotional and documentation purposes.