Paper Reading Session - Knowledge Distillation of Large Language Models
Past Event
About Event
In this session, we're excited to welcome Yuxian Gu, who will be presenting his work on Knowledge Distillation of Large Language Models. Yuxian is a third-year PhD student in the conversational AI group at Tsinghua University, advised by Prof. Minlie Huang. In this work, they replace the forward Kullback-Leibler divergence (KLD) objective used in standard KD approaches with the reverse KLD. This prevents the student model from overestimating low-probability regions of the teacher distribution, and the resulting models (MiniLLMs) generate more precise responses with higher overall quality than KD baselines. Exciting stuff, see you there!
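If you'd like a quick feel for the forward- vs. reverse-KLD distinction before the session, here is a minimal token-level sketch in PyTorch. The function names and setup are purely illustrative assumptions, not taken from the paper or its codebase, and the paper's actual training procedure goes beyond this simple comparison.

```python
import torch
import torch.nn.functional as F

def forward_kld(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # Standard KD objective: KL(p || q). The student is pushed to spread
    # probability mass over everything the teacher assigns mass to,
    # including the teacher's low-probability regions.
    p = F.softmax(teacher_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    return (p * (log_p - log_q)).sum(dim=-1).mean()

def reverse_kld(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # Reverse objective: KL(q || p). This is mode-seeking, so the student
    # concentrates on the teacher's high-probability regions rather than
    # overestimating the long tail.
    q = F.softmax(student_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    return (q * (log_q - log_p)).sum(dim=-1).mean()

# Toy usage: per-token logits over a small vocabulary.
teacher = torch.randn(4, 32000)
student = torch.randn(4, 32000)
print(forward_kld(teacher, student), reverse_kld(teacher, student))
```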