Community Paper Reading: Extending the Context Window of LLaMA Models
During this week’s paper reading event, we are thrilled to announce that we will be joined by Frank Liu of Zilliz, who will be sharing valuable insights with us. The paper examines Position Interpolation (PI), a method that extends the context window of LLaMA models to up to 32,768 positions with minimal fine-tuning. Rather than extrapolating position indices beyond the trained range, which can produce catastrophically high attention scores, PI linearly down-scales input position indices so they stay within the original context window. The extended models showed strong results on tasks requiring long context while retaining their quality within the original context window, the method proved stable, and the extended models can reuse existing optimizations and infrastructure.
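To make the down-scaling concrete, here is a minimal sketch of how PI-style position scaling could be applied before computing rotary (RoPE) angles. This is an illustrative PyTorch example under our own assumptions, not the paper’s code: the helper names (`rope_frequencies`, `interpolated_positions`) and the 2,048-token original window are ours.

```python
import torch

def rope_frequencies(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Compute rotary embedding angles for the given (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

def interpolated_positions(seq_len: int, original_max_len: int = 2048) -> torch.Tensor:
    """Position Interpolation sketch: linearly down-scale position indices
    so they fall within the original trained range [0, original_max_len)."""
    positions = torch.arange(seq_len, dtype=torch.float32)
    if seq_len > original_max_len:
        positions = positions * (original_max_len / seq_len)  # scale factor < 1
    return positions

# Example: an 8,192-token sequence is mapped back into the 2,048-position range,
# so attention scores stay in the regime the model was trained on.
angles = rope_frequencies(interpolated_positions(8192), dim=128)
```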
During the event, we will also discuss the write-up “Extending Context is Hard… But Not Impossible”, available at https://kaiokendev.github.io/context.
Link to Paper: https://arxiv.org/pdf/2306.15595.pdf