Arize AI

Accurate KV Cache Quantization with Outlier Tokens Tracing

This community paper reading is a deep dive into improving the efficiency of LLM inference. The authors improve KV cache quantization, a technique for reducing memory and compute costs during inference, by introducing a method that identifies outlier tokens that hurt quantization accuracy and excludes them from quantization. The result is a better balance between efficiency and model performance.
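To make the core idea concrete, here is a minimal sketch, not the authors' method: it quantizes a toy KV cache per channel to a low bit width and keeps a few "outlier" tokens in full precision so they do not stretch each channel's quantization range. The outlier score (largest absolute value in a token's vector) and the helper names `pick_outlier_tokens` and `quantize_kv` are hypothetical stand-ins; the paper's actual criterion for tracing outlier tokens is not described here.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]  # rows = tokens, columns = channels


def quantize_channel(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Asymmetric uniform quantization of one channel: integer codes, scale, zero point."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_channel(codes: List[int], scale: float, lo: float) -> List[float]:
    return [c * scale + lo for c in codes]


def pick_outlier_tokens(kv: Matrix, k: int) -> Set[int]:
    # Hypothetical outlier score: the largest absolute value in each token's vector.
    scores = [max(abs(x) for x in row) for row in kv]
    ranked = sorted(range(len(kv)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def quantize_kv(kv: Matrix, bits: int = 2, num_outliers: int = 1) -> Tuple[Matrix, Set[int]]:
    """Quantize and dequantize the cache, keeping the chosen outlier tokens in full precision."""
    outliers = pick_outlier_tokens(kv, num_outliers)
    normal = [i for i in range(len(kv)) if i not in outliers]
    recon = [row[:] for row in kv]  # outlier rows are left untouched
    if not normal:
        return recon, outliers
    for c in range(len(kv[0])):
        # Each channel's range is computed over non-outlier tokens only.
        column = [kv[i][c] for i in normal]
        codes, scale, lo = quantize_channel(column, bits)
        for i, v in zip(normal, dequantize_channel(codes, scale, lo)):
            recon[i][c] = v
    return recon, outliers


def max_abs_error(a: Matrix, b: Matrix) -> float:
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


kv = [[0.1, -0.2], [0.3, 0.0], [12.0, 9.0], [-0.1, 0.2]]  # token 2 is an outlier
plain, _ = quantize_kv(kv, bits=2, num_outliers=0)
traced, outliers = quantize_kv(kv, bits=2, num_outliers=1)
print(outliers, max_abs_error(kv, plain), max_abs_error(kv, traced))
```

With 2-bit codes, the single large token forces every channel's step size to about 3 to 4 units, so small values collapse to the range minimum. Excluding that token shrinks the step to about 0.13 and cuts the worst-case error by roughly a factor of six, at the cost of storing one token in full precision.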

Community Paper Reading: Accurate KV Cache Quantization with Outlier Tokens Tracing
