
South Bay Systems: Speedrunning the Lakehouse
The South Bay Systems meetup is back! This time we’re excited to have Jacopo Tagliabue, co-founder and CTO of Bauplan, dive into “Speedrunning the Lakehouse: shipping a FaaS that looks like a database (and vice versa).”
This meetup is generously sponsored by AI+, The Multiverse, and DBOS, Inc. Food and drinks will be provided!
Agenda
6:00 PM: Doors Open, Food and Socializing
6:30-7:30 PM: Talk
7:30 PM onward: Community Socializing!
Abstract:
The lakehouse architecture has become a foundational design for modern data and AI workloads. But its flexibility comes at a cost: users and system developers must navigate multiple APIs, conflicting abstractions, and overlapping execution models. What if we started from scratch, with simplicity in mind? In this talk, we discuss the technical challenges of building a “Function-as-a-Service” (FaaS) lakehouse: if workloads were “just” chained functions, users and developers could easily reason about the full data lifecycle!
We argue that existing FaaS platforms were never designed for data-intensive workflows. To address this, we built a new system from the ground up on object storage and open formats. Repurposing lessons from OpenLambda, we deploy functions up to 15× faster than AWS Lambda. By extending Apache Iceberg’s isolation with Git-like primitives, we support multi-language transactions with formal correctness proofs. Finally, we show how ephemeral functions, Arrow-native caching, and decoupled catalogs can simulate a full warehouse.
We conclude by emphasizing the role of user-facing APIs for adoption in real-world settings, and sharing late-breaking results from our ongoing research.
Speaker Bio: Jacopo Tagliabue is the co-founder and CTO of Bauplan. Educated in several acronyms across the globe (UNISR, SFI, MIT), Jacopo previously co-founded Tooso, an AI startup acquired by TSX:CVO in 2019. He led AI efforts at Coveo from scale-up to IPO and built Coveo Labs, a prolific R&D practice whose open libraries, models, and datasets have been downloaded millions of times.
Throughout his career, he has been fortunate to collaborate with remarkable folks in both industry and academia (e.g., Netflix, NVIDIA, Stanford, University of Wisconsin-Madison), and contribute to diverse fields including Information Retrieval (RecSys, SIGIR), Data Science (KDD), Artificial Intelligence and NLP (ICML, NAACL), Data Management (SIGMOD, VLDB), and Computer Systems (Middleware). While building his new company, he teaches ML Systems at NYU, which is notable (mostly) because it is the only job he ever had that his parents understand.
