Presented by
Unstract
22 attended
Private Event

LLM Data Prep Workshop: Dealing with real-world documents

Chennai, Tamil Nadu
Past Event
About Event

Building LLM applications? One of the biggest problems you'll face is supplying the LLM with good input data.
Good LLM responses need good input data. The clean, native-text PDFs used in explainer articles and example code are rarely what you'll encounter in production. Real-world data is wild, to say the least!

Here are some challenges you’ll face:
- Scanned PDFs
- Scans with non-standard orientations
- PDF forms with checkboxes and radio buttons
- Handwritten forms
- Documents photographed with smartphones
- Complex tables
- Tables that span pages

What will you be learning?

In this hands-on workshop, we'll compare the libraries and techniques available for this task and examine their strengths and limitations.

This talk aims to equip you to extract raw text from real-world documents so that it can be sent to Large Language Models (LLMs), which then structure the data for easy downstream processing.

Who is speaking?

Your speaker, Shuveb Hussain, is the co-founder and CEO of Unstract, an open-source startup building an LLM-powered platform that extracts data from unstructured documents to help automate critical business processes.

Unstract currently extracts and structures millions of pages of real-world data every month. It offers two products: LLMWhisperer, a raw text extraction API, and Unstract, an LLM-powered data structuring platform.

Location
Chennai, Tamil Nadu