Paper Reading Session - Knowledge Distillation of Large Language Models
Past Event
About Event
In this session, we're excited to welcome Yuxian Gu, who will be presenting his work on Knowledge Distillation of Large Language Models. Yuxian is a third-year PhD student in the conversational AI group at Tsinghua University, advised by Prof. Minlie Huang. In this work, they replace the forward Kullback-Leibler divergence (KLD) objective used in standard KD approaches with the reverse KLD. This prevents the student model from overestimating low-probability regions of the teacher distribution, and the resulting models (MiniLLMs) generate more precise responses with higher overall quality than KD baselines. Exciting stuff, see you there!
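If you'd like a quick feel for the forward- vs. reverse-KLD distinction before the session, here is a minimal token-level sketch in PyTorch. The function names and setup are purely illustrative assumptions, not taken from the paper or its codebase, and the paper's actual training procedure goes beyond this simple comparison.

```python
import torch
import torch.nn.functional as F

def forward_kld(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # Standard KD objective: KL(p || q). The student is pushed to spread
    # probability mass over everything the teacher assigns mass to,
    # including the teacher's low-probability regions.
    p = F.softmax(teacher_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    return (p * (log_p - log_q)).sum(dim=-1).mean()

def reverse_kld(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # Reverse objective: KL(q || p). This is mode-seeking, so the student
    # concentrates on the teacher's high-probability regions rather than
    # overestimating the long tail.
    q = F.softmax(student_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    return (q * (log_q - log_p)).sum(dim=-1).mean()

# Toy usage: per-token logits over a small vocabulary.
teacher = torch.randn(4, 32000)
student = torch.randn(4, 32000)
print(forward_kld(teacher, student), reverse_kld(teacher, student))
```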