Unstructured Data Meetup New York
This is an in-person event! Registration is required to get in.
Topic: Connecting your unstructured data with Generative LLMs
What we’ll do:
Have some food and refreshments. Hear three exciting talks about unstructured data, vector databases and generative AI.
5:30 - 6:00 - Welcome/Networking/Registration
6:00 - 6:20 - Tim Spann, Principal DevRel, Zilliz
6:20 - 6:45 - Uri Goren, Urimax
7:00 - 7:30 - Lisa N Cao, Product Manager, Datastrato
7:30 - 8:00 - Naren, Unstract
8:00 - 8:30 - Networking
Intro Talk:
Hiring?
Need a Job?
Cool project?
Meetup Logistics
Trick-Or-Treat
Using Milvus as a Ghost Trap
Tech talk 1: Introduction to Vector search
Uri Goren, Argmax CEO
Deep learning has been a game-changer for modern AI, but deploying it in production environments poses significant challenges. Vector databases (VDBs) have become the go-to solution for real-time, embedding-based queries. In this talk, we’ll explore the problems VDBs address, the trade-offs between accuracy and performance, and what the future holds for this evolving technology.
Tech talk 2: Metadata Lakes for Next-Gen AI/ML
Lisa N Cao, Product Manager, Datastrato
As data catalogs evolve to meet the growing and new demands of high-velocity, unstructured data, we see them taking a new shape as an emergent and flexible way to activate metadata for multiple uses. This talk discusses modern uses of metadata at the infrastructure level for AI-enablement in RAG pipelines in response to the new demands of the ecosystem. We will also discuss Apache (incubating) Gravitino and its open source-first approach to data cataloging across multi-cloud and geo-distributed architectures.
Tech talk 3: Unstructured Document Data Extraction at Scale with LLMs: Challenges and Solutions
Unstructured documents present a significant challenge for businesses, particularly those managing them at scale. Traditional Intelligent Document Processing (IDP) systems—let's call them IDP 1.0—rely heavily on machine learning and NLP techniques. These systems require extensive manual annotation, making them time-consuming and less effective as document complexity and variability increase.
The advent of Large Language Models (LLMs) is ushering in a new era: IDP 2.0. However, while LLMs offer significant advancements, they also come with their own set of challenges, particularly around accuracy and cost, which can become prohibitive at scale. In this talk, we will look at how Unstract, an open source IDP 2.0 platform purpose-built for structured document data extraction, solves these challenges. Processing over 5 million pages of unstructured documents per month, Unstract uses various techniques to extract structured data accurately and cost-efficiently, chief among them the use of vector databases.
Naren H - Co-founder/COO, Unstract
Naren H is the co-founder at Unstract, an open source startup building an LLM-powered platform that extracts data from unstructured documents, helping automate critical business processes. Before Unstract, Naren founded Mediavak, a digital marketing agency, and co-founded Social Animal and Tweeple Search, building tools that made social media analytics and content marketing a breeze. He holds a Master’s in Computer Science from the State University of New York at Buffalo. He has a knack for turning data chaos into order — occasionally, he even manages to keep his emails under control.
Speaker LinkedIn Profile: https://www.linkedin.com/in/naren87/
Who should attend:
Anyone interested in talking and learning about Unstructured Data and Generative AI Apps.
When:
October 23, 2024
5:30 PM
Where:
159 West 25th Street, 3rd Floor, Mohammad Ali Room
Registration will close 2 days before the event. Sponsored by Zilliz, maintainers of Milvus.