Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

Authors: Ofir Press, Noah A. Smith, Mike Lewis (2021)

Domains

Architecture, Long Context

TLDR (English)

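ALiBi (Attention with Linear Biases) encodes position as a fixed linear bias on attention scores: the farther a key is from the query, the larger the penalty. It adds no learned parameters and lets a model extrapolate to inputs several times longer than its training length. It is a representative early long-context method and, alongside RoPE, one of two competing approaches to positional information.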
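
The idea of a linear bias on attention can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`alibi_slopes`, `alibi_bias`, `attention_with_alibi`) are hypothetical, the slope schedule is the geometric sequence starting at 2^(-8/n) for n heads as described in the paper, and a power-of-two head count and causal (left-to-right) attention are assumed.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Assumed schedule: geometric sequence starting at 2^(-8/num_heads),
    # e.g. 1/2, 1/4, ..., 1/256 for 8 heads. These are fixed, not learned.
    start = 2.0 ** (-8.0 / num_heads)
    return np.array([start ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    # distance[i, j] = i - j: how far key j lies behind query i.
    pos = np.arange(seq_len)
    distance = pos[:, None] - pos[None, :]
    slopes = alibi_slopes(num_heads)
    # Penalty grows linearly with distance, with a different slope per head.
    bias = -slopes[:, None, None] * distance[None, :, :]
    # Causal mask: a query may not attend to future keys (j > i).
    return np.where(distance[None, :, :] < 0, -np.inf, bias)

def attention_with_alibi(q, k, v):
    # q, k, v have shape (heads, seq_len, head_dim); no positional
    # embeddings are added to them anywhere.
    heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores + alibi_bias(heads, seq_len)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the bias depends only on the distance i - j and on fixed slopes, `alibi_bias` can be built for any `seq_len` at inference time, which is what allows evaluation on sequences longer than those used in training.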

Appears in These Articles

Positional Encoding: Where Does Order Come From
Long Context: Helping Models Read Farther
