
Presented by

Unstructured Data Meetup

Meetups for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup is sponsored by Zilliz.

188 attended

Featured in

Generative AI San Francisco and Bay Area

Unstructured Data Meetup South Bay Edition

Name: Unstructured Data Meetup South Bay Edition
Start: 2024-11-13T17:30:00.000-08:00
End: 2024-11-13T20:00:00.000-08:00
Location: Sunnyvale, California


About Event

This is an in-person event! Registration is required for entry.

Topic: Connecting your unstructured data with Generative LLMs

What we’ll do:
Have some food and refreshments. Hear three exciting talks about unstructured data and generative AI.

5:30 - 6:00 - Welcome/Networking/Registration
6:05 - 6:30 - Dinesh Chandrasekhar, Challenges in Structured Document Data Extraction at Scale with LLMs
6:35 - 7:00 - James Luan, Dense Embeddings != Complete Search - a sneak peek of Milvus 2.5
7:05 - 7:30 - Rob Quiros, Beyond RAG Partitions: Per-User, Per-Chunk Access Policy
7:30 - 8:00 - Networking

Tech Talk 1: Challenges in Structured Document Data Extraction at Scale with LLMs
Speaker: Dinesh Chandrasekhar, Unstract
Abstract: All businesses have to deal with unstructured documents at some level, and some have to deal with them at scale. While an LLM-powered approach to this problem is head and shoulders above traditional machine learning-based approaches, it is not without its challenges. The top concerns are accuracy and cost, both of which can really begin to hurt at scale.

In this talk, we will look at how Unstract, an open source platform purpose-built for structured document data extraction, solves these challenges. Unstract handles structured data extraction for 5M+ pages per month and uses various techniques to achieve both accuracy and cost efficiency.

Topics Covered:
- Introduction to Unstructured Data Processing
- Processing Document Data
- Extraction Difficulties
- Unstract to the rescue
- Demo

Tech Talk 2: Dense Embeddings != Complete Search - a sneak peek of Milvus 2.5
Speaker: James Luan, VP of Engineering, Zilliz
Abstract:
Dense embeddings miss exact matches. Keyword search misses semantic meaning. Running two separate systems is a maintenance nightmare. We'll show how Milvus 2.5's hybrid search tackles this with a unified solution, preview its BM25 implementation built on sparse vectors, and share performance numbers against current Elasticsearch-based architectures.

Key Points:

- Where dense embeddings fall short, and how a unified system architecture addresses the search needs
- Sneak Peek of Milvus 2.5 - A quick look at our BM25 implementation and sparse vector optimizations
- Benchmark results comparing hybrid search latency and throughput vs. Elasticsearch
- What's Next - A brief overview of upcoming features in our technical roadmap
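The abstract contrasts dense embeddings, which miss exact matches, with keyword search, which misses semantic meaning. As a rough illustration of why combining the two helps, the sketch below scores a toy corpus with Okapi BM25 (the sparse, keyword signal) and with cosine similarity over hand-made dense vectors, then merges the two rankings with reciprocal rank fusion (RRF). This is a plain-Python sketch under stated assumptions, not Milvus 2.5's implementation or API: the documents, the two-dimensional "embeddings", and the parameter values (`k1=1.2`, `b=0.75`, RRF `k=60`) are illustrative choices, and RRF is just one common way to fuse rankings.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for the query (the sparse signal)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors (the dense signal)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranking(scores):
    """Doc indices ordered best-first (ties keep their original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: each doc earns 1 / (k + position) from every ranking it appears in."""
    fused = Counter()
    for order in rankings:
        for pos, doc in enumerate(order, start=1):
            fused[doc] += 1.0 / (k + pos)
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical toy corpus and hand-made 2-D "embeddings", for illustration only.
texts = [
    "reset the password for a user account",
    "error code E1234 when connecting to the server",
    "how to recover a forgotten login",
]
doc_vectors = [[0.9, 0.1], [0.2, 0.9], [0.95, 0.05]]

query_text = "E1234"
query_vector = [1.0, 0.0]  # assume the query embeds near the generic "login help" docs

tokenized = [t.lower().split() for t in texts]
sparse_scores = bm25_scores(query_text.lower().split(), tokenized)
# Keyword search only returns documents that actually contain a query term.
sparse_rank = [i for i in ranking(sparse_scores) if sparse_scores[i] > 0]
dense_rank = ranking([cosine(query_vector, v) for v in doc_vectors])
fused = reciprocal_rank_fusion([sparse_rank, dense_rank])

print("BM25: ", sparse_rank)
print("dense:", dense_rank)
print("fused:", fused)
```

In this toy run, BM25 finds only document 1 (the exact error code) while dense similarity ranks it last; fusion puts the exact match first and still keeps the semantically close documents in the result list.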

Tech Talk 3: Beyond RAG Partitions: Per-User, Per-Chunk Access Policy
Speaker: Rob Quiros, CEO & Co-Founder, Caber Systems, Inc.
Abstract: Partitioning vector databases has proven to be a useful tool for privacy and per-tenant isolation. Recent releases of vector database software, including Milvus, have continued to improve partitioning capabilities, such as pushing the number of partitions into the millions and improving how partitions are selected per tenant.

Despite these advances, management overhead grows with the number of partitions. Vector databases still fall short of the capabilities enterprises require, and have come to expect, from their existing storage systems and databases. What is needed are new capabilities tailored to how vector databases store data and how RAG applications use them.

Topics Covered:

- Origins of enterprise requirements for granular access control and policy in storage systems.
- Sensitive data identification: data classification versus access control.
- The problem that data duplication in enterprise datasets creates when copying permissions from documents to chunks.
- How enterprise access requirements can be met with per-user, per-chunk access control.
- Case study and example implementation.
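The abstract does not describe Caber's implementation. As a rough illustration of the duplication problem in the topics above (identical text appearing in documents with different permissions), the sketch below deduplicates chunks by a SHA-256 content hash, attaches an access list to each chunk, and filters retrieved chunk IDs per user. The data model, the function names (`build_index`, `authorized_results`), and especially the policy choice (a user may see a duplicated chunk if they can read any source document that contains it) are hypothetical assumptions for this example, not the speaker's method.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    source_docs: set = field(default_factory=set)    # documents containing this exact text
    allowed_users: set = field(default_factory=set)  # per-chunk access list


def chunk_id(text):
    """Content hash, so duplicated text across documents maps to a single chunk."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_index(documents, doc_acls):
    """documents: doc_id -> list of chunk texts; doc_acls: doc_id -> set of user ids."""
    index = {}
    for doc_id, chunks in documents.items():
        for text in chunks:
            cid = chunk_id(text)
            if cid not in index:
                index[cid] = Chunk(text=text)
            index[cid].source_docs.add(doc_id)
            # Assumed policy: union of the ACLs of every document containing the chunk.
            index[cid].allowed_users |= doc_acls[doc_id]
    return index


def authorized_results(user, candidate_ids, index):
    """Drop retrieved chunks the user may not see, preserving the retrieval order."""
    return [cid for cid in candidate_ids if user in index[cid].allowed_users]


# Hypothetical example data.
documents = {
    "handbook": ["Office hours are 9 to 5.", "Holiday schedule is posted in January."],
    "board_memo": ["Office hours are 9 to 5.", "Q3 layoffs planned for the sales team."],
}
doc_acls = {"handbook": {"alice", "bob"}, "board_memo": {"carol"}}

index = build_index(documents, doc_acls)
# Pretend the vector search returned these two chunks for some query.
ranked = [chunk_id("Office hours are 9 to 5."), chunk_id("Q3 layoffs planned for the sales team.")]
bob_view = authorized_results("bob", ranked, index)
carol_view = authorized_results("carol", ranked, index)
print(len(bob_view), len(carol_view))
```

Copying the permissions of whichever document was ingested first would wrongly deny the shared chunk to readers of the other document; taking the union is one defensible alternative because the text is already readable through a document the user can open. Either way, the policy is attached to each chunk rather than to a partition.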

When:
Nov 13, 2024
5:30PM

Where:
This is an in-person event, and registration is required for entry. Advance registration closes 2 days before the event.

This event is sponsored by Zilliz, the maintainers of Milvus.

