
Open source data ingestion for RAGs with dlt


Creating scalable data pipelines - Akela Drissner

About the event

In this hands-on workshop, we’ll learn how to build a data ingestion pipeline with dlt that loads data from a REST API into LanceDB, so that your RAG application always works with up-to-date data.

We’ll cover the following steps:

  • Extracting data from REST APIs

  • Loading and vectorizing the data into LanceDB, which, unlike many other vector DBs, stores the original data alongside the embeddings

  • Keeping your data up to date with incremental loading
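
To make the three steps concrete before the workshop, here is a minimal, standard-library-only sketch of the idea. It does not use dlt or LanceDB, and it is not how either library is implemented. Everything in it is a hypothetical stand-in invented for illustration: `FAKE_API_RECORDS` and `fetch_records` play the role of a REST API, `toy_embed` replaces a real embedding model, and `ToyVectorTable` imitates a table that keeps each record next to its vector and remembers a cursor for incremental loading.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for the records a REST API would return.
FAKE_API_RECORDS = [
    {"id": 1, "text": "dlt loads data", "updated_at": "2024-01-01"},
    {"id": 2, "text": "LanceDB stores vectors", "updated_at": "2024-01-02"},
]


def fetch_records(since: Optional[str]) -> list:
    """Step 1 (extract): return records updated after `since` (all if None)."""
    return [r for r in FAKE_API_RECORDS if since is None or r["updated_at"] > since]


def toy_embed(text: str, dims: int = 4) -> list:
    """Step 2 (vectorize): toy embedding that sums character codes into
    `dims` buckets. A real pipeline would call an embedding model here."""
    vector = [0.0] * dims
    for position, char in enumerate(text):
        vector[position % dims] += ord(char)
    return vector


@dataclass
class ToyVectorTable:
    """Assumed in-memory table: each row holds the data AND its embedding."""
    rows: dict = field(default_factory=dict)
    cursor: Optional[str] = None  # highest `updated_at` loaded so far

    def load_incrementally(self) -> int:
        """Step 3 (incremental loading): fetch only records newer than the
        stored cursor, upsert them by id, then advance the cursor."""
        new_records = fetch_records(self.cursor)
        for record in new_records:
            self.rows[record["id"]] = {**record, "vector": toy_embed(record["text"])}
        if new_records:
            # ISO dates compare correctly as strings.
            self.cursor = max(r["updated_at"] for r in new_records)
        return len(new_records)


table = ToyVectorTable()
first_run = table.load_incrementally()   # initial load picks up both records
second_run = table.load_incrementally()  # nothing new, so nothing is loaded

# A new record appears upstream; only that record is loaded on the next run.
FAKE_API_RECORDS.append(
    {"id": 3, "text": "incremental loading", "updated_at": "2024-01-03"}
)
third_run = table.load_incrementally()
```

The point of the cursor is that each run only processes what changed since the previous run, so repeated runs stay cheap while the stored data and embeddings remain current.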

By the end of this workshop, you’ll be able to write a portable, OSS data pipeline for your RAG that you can deploy anywhere, such as Python notebooks, virtual machines, or orchestrators like Airflow, Dagster, or Mage.

About the speaker:

Akela is the Head of Solutions Engineering at dltHub, a company building open-source tooling for data ingestion. She has a background in machine learning with a focus on NLP, having previously worked in conversational AI at Rasa. 

This event is sponsored by dltHub. Thank you for supporting our community!

DataTalks.Club is the place to talk about data. Join our Slack community!

Location
https://www.youtube.com/@DataTalksClub