Presented by
Apache Spark

Apache Spark™ and Lance Spark Connector

YouTube
About Event

📅 Date: September 25, 2025

Time: 9:30 AM - 10:30 AM PST

📍 Location: online

Agenda:

  • Welcome and Introductions

  • Talk 1: Scalable Multimodal AI Data Processing on Apache Spark™ with Lance Spark Connector, Jack Ye, LanceDB

  • Q&A

Abstract: 

In this talk, we’ll introduce the Lance Spark Connector and show how it brings Lance’s high-performance, AI-native multimodal storage to Apache Spark™ for large-scale data processing. You’ll learn how Spark can leverage Lance’s unique capabilities—random access, built-in indexing, and native support for vector and blob data types—to work seamlessly with embeddings, images, videos, documents, and more. 

We’ll explore how the connector integrates with any Spark-compatible catalog, from Hive Metastore to Unity Catalog, enabling unified governance and discovery. Through real-world examples with Spark, we’ll demonstrate running ingestion, analytics, feature engineering, and retrieval-augmented generation workflows directly on the same multimodal Lance dataset, without costly format conversions, which makes Lance an ideal fit for a modern multimodal lakehouse.

Bio: Jack Ye is a software engineer at LanceDB. He is a PMC member of Apache Iceberg and a contributor to various open source projects, including Apache Spark and Trino. Prior to joining LanceDB, Jack was a tech lead at AWS for initiatives including SageMaker Lakehouse, S3 Tables, and EMR and Athena integration with open table formats.
