Efficient Streaming Language Models with Attention Sinks

Authors: Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis (2023)

Field

Inference, Long Context

TLDR

Identifies the attention sink phenomenon: in autoregressive language models, a large share of attention consistently goes to the first few tokens of the sequence, even when those tokens carry little semantic meaning. StreamingLLM exploits this by keeping the key/value states of these initial tokens alongside a rolling window of recent tokens. This lets models trained with a finite attention window process effectively infinite-length input streams without recomputation or fine-tuning, while keeping performance stable.
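The cache policy behind StreamingLLM can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' code. It assumes each cache entry stands in for one token's key/value pair, and the class name `SinkKVCache` and the parameters `num_sinks` and `window_size` are hypothetical. The sketch permanently keeps the first few tokens as attention sinks (the paper reports that four initial tokens generally suffice) plus a rolling window of the most recent tokens. It also assigns positions by slot within the cache rather than by position in the original text, as StreamingLLM does.

```python
from collections import deque
from typing import Any, Deque, List


class SinkKVCache:
    """Hypothetical StreamingLLM-style cache: attention sinks + rolling window.

    Each entry stands in for one token's (key, value) pair; a real model
    would store per-layer key and value tensors instead.
    """

    def __init__(self, num_sinks: int = 4, window_size: int = 1020) -> None:
        # num_sinks=4 follows the paper; window_size is an illustrative assumption.
        self.num_sinks = num_sinks
        self.sinks: List[Any] = []  # initial tokens, never evicted
        # A deque with maxlen drops its oldest entry automatically when full.
        self.window: Deque[Any] = deque(maxlen=window_size)

    def append(self, kv: Any) -> None:
        """Add one token's KV entry; the first num_sinks tokens become sinks."""
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(kv)
        else:
            self.window.append(kv)  # may evict the oldest non-sink token

    def entries(self) -> List[Any]:
        """Current cache contents: sinks first, then the recent window."""
        return self.sinks + list(self.window)

    def positions(self) -> List[int]:
        """Positions by slot in the cache, not by index in the original text."""
        return list(range(len(self.sinks) + len(self.window)))


if __name__ == "__main__":
    cache = SinkKVCache(num_sinks=2, window_size=3)
    for t in range(10):
        cache.append(f"kv{t}")
    print(cache.entries())
    print(cache.positions())
```

For example, with `num_sinks=2` and `window_size=3`, streaming ten tokens `kv0` through `kv9` leaves `kv0, kv1, kv7, kv8, kv9` in the cache at positions 0 through 4. The cache size stays fixed no matter how long the stream runs, and nothing is recomputed when old tokens are evicted.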

Appears in these articles

Long Context: Helping Models Read Farther
