
SF Systems Meetup: Data Storage and Science
Summer may be coming to an end, but the SF Systems Meetup is back for another year of deep technical talks! We’re kicking off the next year of meetups with two talks spanning topics from data storage at scale to modern interfaces for interacting with data.
Scott Andreas (Apple): When Faulty Hardware Happens to Flawless Databases: Checksumming in Modern Storage
Akshay Agrawal (Marimo): Representing Python Notebooks as Dataflow Graphs
This meetup is generously hosted by Eventual at their SF office, with food and drinks sponsored by Amplify Partners.
Agenda
5:30 PM: Doors Open, Food and Socializing
6:30 - 7:30 PM: Introductions and Talks
7:30 PM onward: Community Socializing!
Talks
When Faulty Hardware Happens to Flawless Databases: Checksumming in Modern Storage (Scott Andreas)
Advancements in software validation, simulation, and test infrastructure have helped us build increasingly reliable systems. But computing itself is still a stochastic project. In a world where any bit can flip in systems that are more complicated than ever, how can any database operate safely?
We’ll start by surveying potential sites of data corruption and miscomputation in modern systems, then go deep on techniques that enable rapid detection and recovery. We’ll close by looking ahead to future directions: can we project and assert a “clean room for bit flips” throughout the boundary of a distributed system? Designing systems for transactional integrity requires care at every level of the stack.
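For readers new to the topic, here is a minimal sketch of the basic idea behind block checksumming: store a checksum next to the data and verify it on every read. It is not taken from the talk. The function names (`write_block`, `read_block`, `flip_bit`) and the choice of CRC32 with a 4-byte trailer are assumptions made for illustration only.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Append a 4-byte CRC32 of the payload (hypothetical on-disk layout)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def read_block(block: bytes) -> bytes:
    """Verify the trailing CRC32 and return the payload, or raise on mismatch."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: block is corrupt")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate faulty hardware by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


block = write_block(b"hello world")
assert read_block(block) == b"hello world"  # a clean read passes verification

try:
    read_block(flip_bit(block, 3))
except ValueError as err:
    print(err)  # the single flipped bit is detected on read
```

Detection is only half of the story. Recovery needs a second copy to fall back on, such as a replica, which this sketch leaves out.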
Representing Python Notebooks as Dataflow Graphs (Akshay Agrawal)
marimo is a new kind of (open-source) Python notebook that can be used as a reactive programming environment, as a Python script, and as an interactive web app. In this talk, we discuss the design and implementation of marimo, with an emphasis on marimo's intermediate representation of notebooks as dataflow graphs on cells. This graph is inferred using static analysis, analyzing each cell to determine the variables it defines and the variables it references, with edges encoding data dependencies. We discuss the benefits of modeling notebooks in this way, including reproducibility in execution and callback-less interactive elements, in addition to reusability as regular Python programs. We also discuss the affordances we provide to make interactive dataflow programming work in practice, and conclude with a sequence of examples illustrating how we leverage the graph for a variety of applications, such as notebook composition, top-level function extraction, and caching.
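To make the static-analysis idea concrete, here is a simplified sketch that uses Python's standard `ast` module to find the names each cell defines and references, then draws an edge from the defining cell to every cell that uses the name. This is not marimo's implementation. The helper names `cell_defs_and_refs` and `build_graph` are hypothetical, and the sketch assumes each variable is defined in exactly one cell. It also ignores imports, function and class definitions, and nested scopes.

```python
import ast


def cell_defs_and_refs(source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names read but not assigned) for one cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                refs.add(node.id)
    return defs, refs - defs


def build_graph(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell to the set of cells that depend on it."""
    info = {name: cell_defs_and_refs(src) for name, src in cells.items()}
    # Assumption: each variable is defined by a single cell.
    definer = {var: name for name, (defs, _) in info.items() for var in defs}
    edges = {name: set() for name in cells}
    for name, (_, refs) in info.items():
        for var in refs:
            parent = definer.get(var)
            if parent is not None and parent != name:
                edges[parent].add(name)
    return edges


cells = {
    "a": "x = 1",
    "b": "y = x + 1",
    "c": "print(x, y)",
}
graph = build_graph(cells)
# Cell "a" defines x, so "b" and "c" depend on it; "b" defines y, so "c" depends on it.
```

With a graph like this, a reactive runtime could rerun only the descendants of an edited cell. In the example, changing `a` would invalidate both `b` and `c`.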