The Progression in Multimodal Document RAG
In this talk, Shi will present his recent work, VisRAG, a fully visual retrieval-augmented generation (RAG) pipeline that eliminates the need for document parsing. He will discuss the motivations behind VisRAG, how it was built and evaluated, and potential directions for its future development.
👉 Presented paper: https://arxiv.org/pdf/2410.10594
About the Speaker:
Shi Yu is a Ph.D. student in the Department of Computer Science and Technology at Tsinghua University, supervised by Prof. Zhiyuan Liu and affiliated with the THUNLP Lab. His current research focuses on retrieval-augmented generation (RAG) and the data science of training language models. He has published several papers at top conferences, including SIGIR and COLING, and is the maintainer of OpenMatch, an information retrieval toolkit. In collaboration with ModelBest Inc., he led the development of a bilingual text embedding model, MiniCPM-Embedding, and a reranker model, MiniCPM-Reranker, which together have reached 300k downloads on Hugging Face.