Synthetic data generation with Distilabel for embedding and LLM finetuning
About Event
We're excited to announce a presentation by Daniel Auras from the German startup ellamind, who will introduce us to the capabilities of Distilabel for managing the lifecycle of embedding models and LLMs.
Daniel will walk through several aspects of synthetic data generation, using a customer case as the running example. He will cover:
- Introduction to ellamind & DiscoResearch
- Overview of the architecture of the customer app
- Synthetic data generation with Distilabel for embedding and LLM finetuning
- distilabel: Query generation from a website crawl
- distilabel: Synthetic customer emails grounded in the website crawl
- distilabel: Evol-instruct of a limited set of seed customer emails
Hope to see you all there!