Attention Is All You Need

Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (2017)

arXiv: 1706.03762

Domains

Architecture

TLDR (English)

The foundational paper that introduced the Transformer architecture. The authors dispensed with recurrence and convolutions entirely, building the model on attention alone, and proposed multi-head self-attention and sinusoidal positional encoding. On WMT 2014 English-to-German translation it reached 28.4 BLEU, more than 2 BLEU above the best previously reported results, including ensembles. Most major LLMs today are built on this architecture or a close variant.
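As a rough illustration of the two mechanisms named in the summary, the sketch below implements single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, and the sinusoidal positional encoding in plain Python. It is a minimal sketch for intuition only, not the authors' implementation: the function names are hypothetical, matrices are lists of lists, and multi-head projection, masking, and batching are all left out.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    # Q, K, V are lists of row vectors (hypothetical layout: one row per position).
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def sinusoidal_positional_encoding(num_positions, d_model):
    # Even dimensions use sin, odd dimensions use cos, sharing one frequency per pair.
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

Each output row of the attention function is a convex combination of the value rows, so a query that matches one key more closely pulls its output toward that key's value. The positional encoding is added to the token embeddings so that the otherwise order-agnostic attention can distinguish positions.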

Appears in These Articles

Attention: Choosing the Relevant Context
Positional Encoding: Where Does Order Come From
Transformer Architecture: The Skeleton of Modern LLMs
