Future of DataFrames and Data Systems with Wes McKinney
I'm really excited to host this talk as Wes is both a really thoughtful person and a great engineer!
We'll also host a discussion on Discord. Please post your questions there: https://discord.gg/QEWwCmNPbR
About the speaker
If you do anything with data today, you're likely using something created by Wes McKinney. Wes is the creator of some of the most popular data engineering tools, including pandas, Apache Arrow, and Ibis. He’s also a core contributor to Apache Parquet.
Wes McKinney’s work is all about making data systems composable. Composability requires everyone to follow the same open standards. He believes that decomposing a data system into modular and reusable components makes it much easier for companies to plug and play these components to build complex data systems.
pandas introduced a DataFrame API that has been adopted as the de facto API for data science.
Apache Arrow is a standard in-memory data format that simplifies data movement across platforms.
Ibis is a unified DataFrame API that allows you to work with over 20 data backends (DuckDB, Polars, pandas, Snowflake, Spark, Flink, etc.) without rewriting code. It’s the backbone of BigQuery DataFrames.
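To make the "one query, many backends" idea concrete, here is a toy sketch in plain Python. It is not Ibis's API and does not use Arrow: `Filter`, `run_in_memory`, and `run_sqlite` are hypothetical names invented for this illustration, and SQLite simply stands in for a real backend. The point is only that a query described once, independently of any engine, can be handed to different engines unchanged.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """Hypothetical backend-neutral query: keep rows where column > threshold."""
    column: str
    threshold: float


def run_in_memory(query, rows):
    # Backend 1: evaluate the query directly over a list of dicts.
    return [r for r in rows if r[query.column] > query.threshold]


def run_sqlite(query, conn, table):
    # Backend 2: compile the same query to SQL and let SQLite execute it.
    # Identifiers are interpolated only because this is a toy; the value is bound.
    sql = f'SELECT * FROM "{table}" WHERE "{query.column}" > ?'
    cur = conn.execute(sql, (query.threshold,))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


rows = [
    {"name": "a", "score": 1.0},
    {"name": "b", "score": 3.5},
    {"name": "c", "score": 2.0},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score REAL)")
conn.executemany("INSERT INTO scores VALUES (:name, :score)", rows)

# The query is written once and reused against both backends.
query = Filter(column="score", threshold=1.5)
in_memory_result = run_in_memory(query, rows)
sqlite_result = run_sqlite(query, conn, "scores")
```

Both calls receive the same `query` object; only the execution engine differs. Real systems handle far richer expressions, but the separation between describing a query and executing it is the same idea.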
After 10 years, Wes’s idea of composable data systems is catching on. Key data players have been rushing to adopt open data standards. Earlier this month, Snowflake launched Polaris Catalog, built on top of Iceberg, which made Databricks rush to acquire Tabular for $1B+. Apache Arrow has been adopted by Snowflake, BigQuery, Databricks, and HuggingFace.
About this talk
Wes’s work has shaped the design of data systems over the last 10 years. In this talk, he will share what he thinks data systems will look like in the next 10 years.
This event starts with a presentation on why Wes McKinney created pandas, Apache Arrow, and Ibis. It’s followed by a casual chat between Wes and Chip, and Q&A from the audience.