Presented by Alluxio
149 Going

Meetup: Low-Latency Data Systems for Real-Time AI (Mountain View, CA)

Registration
Welcome! To join the event, please register below.
About Event

Join us for a night of tech talks on the systems and data infrastructure that enable low-latency, real-time AI and analytics, including fast querying for object stores, MCP, RAG, unifying streaming & analytics, and LLM serving & inference.

🎤 Speakers

  • 🌟 Tim Berglund, VP Developer Relations @ Confluent

  • 🌟 Bin Fan, VP of Technology @ Alluxio

  • 🌟 Songqiao Su, Staff Software Engineer @ StarTree

  • More speakers TBA

📅 Agenda

  • 5:00 PM: Doors open, arrival and check-in

  • 5:30 PM: Presentations

    • 💡 Meet You in the Middle: 1000× Performance for Parquet Queries on PB-Scale Data Lakes by Bin Fan

    • 💡 Introduction to Apache Iceberg™ & Tableflow by Tim Berglund

    • 💡 Optimizing Tiered Storage for Low-Latency Real-Time Analytics at AI Scale by Songqiao Su

    • More talks TBA

  • 7:15 PM: Networking and pizza

  • Doors close at 8:00 PM

📖 Presentation Details

💡 Meet You in the Middle: 1000× Performance for Parquet Queries on PB-Scale Data Lakes

Parquet files on S3 increasingly serve not only as a data lake but also as a lightweight feature store for ML training/inference or a document store for RAG. However, querying petabyte- to exabyte-scale data lakes directly from cloud object storage remains notoriously slow (e.g., latencies ranging from hundreds of milliseconds to several seconds on AWS S3).

In this talk, we show how architecture co-design, system-level optimizations, and workload-aware engineering can deliver over 1000× performance improvements for these workloads—without changing file formats, rewriting data paths, or provisioning expensive hardware.

We introduce a high-performance, low-latency S3 proxy layer powered by Alluxio, deployed atop hyperscale data lakes. This proxy delivers sub-millisecond Time-to-First-Byte (TTFB)—on par with Amazon S3 Express—while preserving compatibility with standard S3 APIs. In real-world benchmarks, a 50-node Alluxio cluster sustains over 1 million S3 queries per second, offering 50× the throughput of S3 Express for a single account, with no compromise in latency.
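The latency win of such a proxy comes largely from serving hot objects from a cache near the query engine instead of hitting object storage on every request. A minimal read-through-cache sketch of that idea in pure Python (illustrative only, not Alluxio's implementation; `fetch_from_s3` is a stand-in for a real object-store read):

```python
# Read-through cache: first access fetches from the backing store;
# repeat accesses are served locally with much lower time-to-first-byte.
class ReadThroughCache:
    def __init__(self, backing_fetch):
        self._fetch = backing_fetch   # e.g. a call out to S3
        self._cache = {}              # key -> bytes
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]

def fetch_from_s3(key):
    # Stand-in for a slow object-store read.
    return f"data-for-{key}".encode()

cache = ReadThroughCache(fetch_from_s3)
cache.get("parquet/part-0")   # miss: goes to the backing store
cache.get("parquet/part-0")   # hit: served from the cache
print(cache.misses)           # 1
```

The real system also keeps the standard S3 API on the front of the cache, so clients need no code changes beyond pointing at a different endpoint.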

Beyond accelerating byte-for-byte access to Parquet files, we also offload partial Parquet processing from query engines into Alluxio via a pluggable interface. This eliminates the need for costly index scans and file parsing, enabling point queries with 0.3 ms latency and up to 3,000 QPS per instance (measured with a single thread)—a 100× improvement over traditional query paths.
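The offload pattern described above, answering a point query inside the storage layer rather than shipping file bytes to the engine for parsing and filtering, can be sketched as a toy in pure Python (`ROWS`, `INDEX`, and both functions are illustrative stand-ins, not Alluxio's pluggable interface):

```python
# Toy "storage layer" holding decoded rows; in the real system this
# would be Parquet data cached inside the proxy.
ROWS = [{"id": i, "score": i * 0.1} for i in range(1000)]

# Index built once inside the storage layer, reused across queries.
INDEX = {r["id"]: r for r in ROWS}

def full_scan(key):
    # Traditional path: ship every row to the engine, filter there.
    return [r for r in ROWS if r["id"] == key]

def pushdown_point_query(key):
    # Offloaded path: the storage layer answers the point query via its
    # index, so only the matching row crosses the wire.
    return [INDEX[key]] if key in INDEX else []

assert full_scan(42) == pushdown_point_query(42)
```

Both paths return the same result; the difference is where the work happens and how many bytes move.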

💡 Introduction to Apache Iceberg™ & Tableflow

The data lake is a fantastic, low-cost place to put data at rest for offline analytics, but we've built it under the terms of a terrible bargain: all that cheap storage at scale was a great thing, but we gave up schema management and transactions along the way. Apache Iceberg has emerged as king of the Open Table Formats to fix this very problem.

Built on the foundation of Parquet files, Iceberg adds a simple yet flexible metadata layer and integration with standard data catalogs to provide robust schema support and ACID transactions to the once ungoverned data lake. In this talk, we'll build Iceberg up from the basics, see how the read and write paths work, and explore how it supports streaming data sources like Apache Kafka™. Then we'll see how Confluent's Tableflow brings Kafka together with open table formats like Iceberg and Delta Lake to make operational data in Kafka topics instantly visible to the data lake without the usual ETL—unifying the operational/analytical divide that has been with us for decades.
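The metadata-layer idea can be illustrated with a toy model: data files are immutable, a table is just a pointer to a snapshot listing its files, and a commit is an atomic pointer swap. This is a conceptual sketch only, not Iceberg's actual metadata format:

```python
# Toy table-format model: immutable data files + snapshot metadata.
class ToyTable:
    def __init__(self):
        self.snapshots = [[]]   # each snapshot is a list of data files
        self.current = 0        # pointer to the live snapshot

    def commit(self, added_files):
        # Writers build a new snapshot from the old one; the commit
        # itself is a single atomic pointer update.
        new = self.snapshots[self.current] + added_files
        self.snapshots.append(new)
        self.current = len(self.snapshots) - 1

    def scan(self, snapshot=None):
        # Readers pin a snapshot id for a consistent view (time travel).
        return self.snapshots[self.current if snapshot is None else snapshot]

t = ToyTable()
t.commit(["part-000.parquet"])
t.commit(["part-001.parquet"])
print(t.scan())     # ['part-000.parquet', 'part-001.parquet']
print(t.scan(1))    # ['part-000.parquet']
```

Because old snapshots are never mutated, readers get isolation for free, which is the essence of how a metadata layer retrofits transactions onto a data lake.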

💡 Optimizing Tiered Storage for Low-Latency Real-Time Analytics at AI Scale

Real-time OLAP databases are optimized for speed and often rely on tightly coupled storage-compute architectures using disks or SSDs. Decoupled architectures, which use cloud object storage, introduce an unavoidable tradeoff: cost efficiency at the expense of performance. This makes them unsuitable for databases that need to provide low-latency, real-time analytics, especially the new wave of LLM-powered dashboards, retrieval-augmented generation (RAG), and vector-embedding searches that thrive only when fresh data is milliseconds away. Can we achieve both cost efficiency and performance?

In this talk, we’ll explore the engineering challenges of extending Apache Pinot—a real-time OLAP system—onto cloud object storage while still maintaining sub-second P99 latencies.

We’ll dive into how we built an abstraction in Apache Pinot to make it agnostic to the location of data. We’ll explain how we can query data directly from the cloud (without needing to download the entire dataset, as with lazy-loading) while achieving sub-second latencies. We’ll cover the data fetch and optimization strategies we implemented, such as pipelining fetch and compute, prefetching, selective block fetches, index pinning, and more. We’ll also share our latest work on integrating with open table formats like Iceberg, and how we will continue to deliver fast analytics directly on Parquet files by applying the same techniques used for tiered storage.
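One of the techniques listed above, pipelining fetch and compute, can be sketched in a few lines: start fetching block i+1 on a background thread while block i is being processed, so network wait overlaps with CPU work (illustrative Python, not Pinot's implementation; `fetch` stands in for a remote read):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(block_id):
    # Stand-in for a remote read from object storage.
    return list(range(block_id * 4, block_id * 4 + 4))

def compute(block):
    return sum(block)

def pipelined(block_ids):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch, block_ids[0])
        for i, bid in enumerate(block_ids):
            block = future.result()           # wait for the current block
            if i + 1 < len(block_ids):        # kick off the next fetch
                future = pool.submit(fetch, block_ids[i + 1])
            results.append(compute(block))    # overlaps with that fetch
    return results

print(pipelined([0, 1, 2]))  # [6, 22, 38]
```

When fetch and compute take similar time, this roughly halves end-to-end latency compared to fetching and computing strictly in sequence.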

Location
StarTree Inc
100 View St UNIT 204, Mountain View, CA 94041, USA