Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

Authors: Ofir Press, Noah A. Smith, Mike Lewis (2021)

Field

Architecture, Long Context

TLDR

ALiBi encodes position as a linear bias added to the attention scores, penalizing each query-key pair in proportion to the distance between them. It adds no learned parameters and lets a model extrapolate to inputs several times longer than those seen in training. It is a representative early long-context approach and one of two competing routes, the other being RoPE.
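To make the "linear bias on attention" idea concrete, here is a minimal sketch in plain Python. It assumes causal attention and a power-of-two head count, for which the paper uses the geometric slopes 2^(-8/n), 2^(-16/n), and so on. The function names (alibi_slopes, alibi_bias, biased_attention_weights) are illustrative only and are not taken from the authors' code.

```python
import math

def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., assuming n is a power of two."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Bias for one head: -slope * (i - j) for keys j <= i, -inf for future keys."""
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention_weights(scores, slope):
    """Add the distance bias to raw query-key scores, then softmax each row."""
    bias = alibi_bias(len(scores), slope)
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]

# With identical raw scores, nearer keys receive more attention weight.
weights = biased_attention_weights([[0.0] * 4 for _ in range(4)], alibi_slopes(8)[0])
```

Because the bias depends only on the distance i - j and has no learned component, the same function can be called with any seq_len, including lengths longer than those used in training.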

Appears in these articles

Positional Encoding: Where Does Order Come From
Long Context: Helping Models Read Farther

Co-cited

These papers appear in the same articles as this paper.
