Presented by
Open Lakehouse
46 Going

Open Lakehouse Meetup | Amsterdam

Register to See Address
Amsterdam, Noord-Holland
Registration
Approval Required
Your registration is subject to approval by the host.
About Event

Open Lakehouse Meetup - Wednesday, August 27, 5:00 PM – 9:00 PM GMT+2 | Amsterdam, Netherlands

We're bringing together the open source and data engineering community for an evening focused on the latest in open lakehouse and AI architectures! 🚀 Whether you work on data infrastructure, contribute to open source, or want to dive into the future of AI and interoperable lakehouse systems, you’ll fit right in.

Don't miss this opportunity to accelerate your data journey and contribute to shaping the future of data and AI! 🌟

5:00 - 6:00PM: Registration & Mingling 

6:00 - 6:05PM: Welcome Remarks

6:05 - 6:40PM: Session #1: Scaling Multimodal AI Lakehouse with Lance & LanceDB

  • Chang She, Co-founder & CEO of LanceDB, Co-author of pandas

6:40 - 7:15PM: Session #2: DuckLake - The SQL-Powered Lakehouse Format

7:15 - 7:50PM: Session #3: Composable Open Table Formats - Integrating Open Table Formats with the Composable Data Stack

7:50PM: Closing Remarks

8:00 - 9:00PM: Reception with bites and beverages

9:00 PM: Goodnight

_________________

Session Abstracts

Scaling Multimodal AI Lakehouse with Lance & LanceDB

LanceDB’s Multimodal Lakehouse (MMLH) is the next-generation lakehouse built from day one to treat documents, video, audio, images, and sensor streams as first-class data. These multimodal workloads, which power innovators like Midjourney, World Labs, and Runway, unlock massive value, yet scaling AI-driven multimodal apps remains painful on traditional lakehouses.

MMLH provides a unified foundation optimized across the multimodal AI lifecycle:

  • AI application serving: low-latency random-access reads and search APIs for vectors, text, and binaries

  • Feature engineering + data curation: schema primitives that evolve seamlessly across blobs and metadata for model-driven inference and bulk backfills

  • Training & fine-tuning: high-throughput petabyte-scale data loading with efficient vector and full-text search

We’ll dive into the key capabilities (fast random access at scale, vector + full-text search, and optimized schema primitives) so you can iterate rapidly without blowing your budget. By the end, you’ll have a concrete blueprint for running production-grade, petabyte-scale multimodal pipelines with LanceDB’s MMLH, freeing your team to focus on innovation instead of data plumbing.

DuckLake - The SQL-Powered Lakehouse Format

Managing changes to tables in data lakes has been very challenging in the past. The formats and systems involved did not exactly cooperate, and as a result sketchy workarounds were all too common. This is ostensibly solved by the advent of Lakehouse formats, which attempt to sanitize changes by specifying formats, processes, and conventions for making changes to tables.

However, common Lakehouse formats like Iceberg only appear majestic until one starts looking under the surface. Beneath it lurks a huge amount of complexity, along with engineering decisions whose trade-offs no longer hold. And even after all that, the hard problems like transactional consistency are delegated to an opaque catalog server, e.g. Polaris or Unity Catalog.

DuckLake re-imagines the Lakehouse design by putting a SQL database in charge of managing metadata. This allows a very elegant design that still scales arbitrarily and greatly reduces complexity, with the actual table data still stored on object stores in open formats. For the first time, DuckLake allows a “multi-player” experience with DuckDB, where computation can happen anywhere and in parallel, but with centralized transactional safety.

Composable Open Table Formats - Integrating Open Table Formats with the Composable Data Stack

Lakehouse architecture and composable data systems are shaping the modern data landscape, driven by the need for interoperability between an increasing number of compute engines and formats. Due to the fast-paced adoption of technologies and standards, infrastructure has grown to support the seamless exchange of data and logic, but key gaps remain. By making open table formats fully composable, we can build more extensible and reliable systems and help these formats succeed in the way Apache Arrow and similar technologies have. In this talk we will explore a novel set of APIs implementing open table formats like Apache Iceberg, Delta Lake, and more, with a strong focus on composability and interoperability across query engines.

_________________

Speaker Bios

Chang She is the CEO and cofounder of LanceDB, the developer-friendly, open-source database for multimodal AI. A serial entrepreneur, Chang has been building DS/ML tooling for nearly two decades and is one of the original contributors to the pandas library. Prior to founding LanceDB, Chang was VP of Engineering at TubiTV, where he focused on personalized recommendations and ML experimentation.

Hannes Mühleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs. He is a senior researcher at the Centrum Wiskunde & Informatica (CWI) in Amsterdam. He is also Professor of Data Engineering at Radboud University Nijmegen.

Robert Pack has extensive experience in designing and implementing Data & AI platforms within large multinational organizations. Through this work he has been an avid contributor to the open lakehouse ecosystem, specifically Delta Lake. Now at Databricks, his focus is entirely on facilitating and contributing to the open source ecosystem for building lakehouse architectures.

Ion Koutsouris is a maintainer of the delta-rs project, with a strong background in business IT and data science. A “recovering data scientist,” Ion has shifted his focus from pure data science to engineering roles in the data and machine learning space.

Location
Please register to see the exact location of this event.
Amsterdam, Noord-Holland