Presented by

Daft

High-performance data engine providing simple and reliable data processing for any modality and scale. www.getdaft.io

Hosted By

231 Went

Featured in

Generative AI San Francisco and Bay Area

Apache Iceberg Bay Area Community Meetup

Name: Apache Iceberg Bay Area Community Meetup
Start: 2024-11-04T17:00:00.000-08:00
End: 2024-11-04T20:00:00.000-08:00
Location: San Francisco, California

About Event

Ice{berg} over Drinks ❄️

Join from the webinar link

https://amazon.webex.com/amazon/j.php?MTID=m1a9b14dfa4466b21a654e54d06e7f7cc

Join by the webinar number

Webinar number (access code): 2661 450 7102

Webinar password: NJuUHxVT948 (65884988 when dialing from a phone or video system)

We’re partnering with AWS, Snowflake, and the Apache Iceberg Community to co-host the next Bay Area Apache Iceberg Community Meetup!

Connect with fellow enthusiasts, share insights, and dive into the latest developments in the Apache Iceberg ecosystem! Whether you're a seasoned pro or new to Apache Iceberg, this meetup is the perfect place to exchange ideas and spark innovation.

Agenda

5:00p - 6:00p: Doors Open & Networking 💃

6:00p - 7:45p: Welcome Remarks & Presentations!

7:45p - 8:30p: More Networking 🕺

About Daft

Daft is an open-source framework that powers ETL, analytics, and ML/AI at scale. Its familiar DataFrame API is built to outperform Spark in both speed and ease of use.

💬 Join Distributed Data Community Slack

📚 Check out Daft Engineering Blog

📲 Follow Daft on LinkedIn & Twitter

🖥️ Subscribe to Daft YouTube

💜 We’re hiring, join our team

About AWS

Apache Iceberg is an open-source table format that simplifies table management while improving performance. AWS analytics services such as Amazon EMR, AWS Glue, Amazon Athena, and Amazon Redshift include native support for Apache Iceberg, so you can easily build transactional data lakes on top of Amazon Simple Storage Service (Amazon S3) on AWS.

Additional Resources and Information:

📚 Workshop: Running Apache Iceberg on AWS

📚 Blogs: Apache Iceberg on AWS

📚 AWS Prescriptive Guidance: Using Apache Iceberg on AWS

🖥️ Subscribe to AWS Events and AWS Developers

💜 We’re hiring, join our team

Presentations

🌟 Lessons From Building Iceberg Capabilities In Daft, A Distributed Query Engine

In this talk, we will share our experience building distributed Iceberg operations in Daft. We will walk through how we adapted PyIceberg for distributed workloads, including how we built features such as partitioned writes into Daft. We will also discuss the challenges we faced using existing Python/Rust Iceberg tooling and the workarounds we implemented. Finally, we will talk about what it means for an Iceberg library to provide useful abstractions while giving the query engine proper control over execution, and propose API interfaces that may enable that.

Kevin Wang is a founding engineer at Eventual and a primary contributor to the Daft open-source project. Prior to Eventual, he completed an undergraduate degree at UC Berkeley where he did research in AI and LLM systems and worked in quantitative finance at Arrowstreet and Akuna.

🌟 Accelerate Your Iceberg Workloads on S3

This talk discusses recent improvements the Amazon S3 team has made to Iceberg FileIO and LocationProvider to improve the Iceberg user experience on S3. These include better retries and fault-tolerant execution (#10433 & #11052), a better hashing scheme to reduce throttling (#11112), and integration with the S3 Data Acceleration Toolkit and the AWS CRT client to improve read performance.

Jack Ye is a Sr. Software Engineer at AWS Open Data Analytics. His team focuses on the integration of open source storage layer solutions including Iceberg, Hudi, Delta, Parquet, Avro, etc. with AWS analytics products. Jack is also a PMC member of the Iceberg project.

Roni Burd is a Director of Product Engineering at AWS who builds platform and developer tools. Roni brings 15+ years of experience working on query engines, storage engines, and compute platforms for database systems and ML processing.

🌟 How We Implemented the Iceberg Connector in Rust!

In this talk, we will discuss how we implemented the Iceberg connector in Rust, replacing the original Java-wrapped version to address performance bottlenecks in serialization and memory usage. By following the Apache Iceberg specification, we built a native Rust connector that supports Iceberg’s advanced features, such as multi-catalog compatibility and streaming updates. We’ve contributed this new version to the apache/iceberg-rust repository, and will share insights into the architectural improvements and best practices for leveraging Iceberg in streaming environments.

Yingjun Wu is the founder of RisingWave Labs, a database company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. He has been working in the field of stream processing and database systems for over a decade.

🌟 Iceberg at Netflix

Netflix's Iceberg past, present, and future, with a call-out to the community on where they see the technology's challenges. Netflix will briefly cover its journey from Hive to Iceberg, its current systems for catalog, compaction, and replication, and the improvements it is making.

Snehal Chennuru is an engineering manager for the Big Data Warehouse team at Netflix, with over a decade of experience building distributed systems at Netflix, Skyhigh Networks, and Clearwell Systems.

Bryan Keller is a software engineer on the Big Data Warehouse team at Netflix, with over a decade of experience building big data systems. He is also an early Iceberg advocate and Iceberg committer.

Tim Jiang is a software engineer on the Big Data Warehouse team at Netflix. Over the past few years, he has focused on strengthening data security for Iceberg and query engines.

🌟 Lakekeeper: Rust-based Iceberg Catalog

The Rust ecosystem in the data space is evolving quickly. With Lakekeeper, we are filling the gap of a Rust-native, modular Iceberg REST Catalog designed for decentralized deployments.

Christian Thiel is the CTO of HANSETAG GmbH and a data enthusiast building the future of Data Collaboration with Iceberg.

