Neural Machine Translation by Jointly Learning to Align and Translate

Authors: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2014)

Domains

Architecture

TLDR (English)

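The seminal attention-mechanism paper, predating the Transformer. The authors argue that compressing the entire source sentence into the single fixed-length vector of an encoder-decoder (seq2seq) model is a bottleneck that limits translation quality. They propose instead that, when generating each target word, the decoder dynamically computes attention weights over all encoder hidden states and uses their weighted sum as a context vector. This idea is a direct precursor of the Transformer's self-attention.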
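To make the mechanism concrete, here is a minimal pure-Python sketch of one decoder step of additive attention, the scoring form commonly associated with this paper. Each encoder state h_j gets a score e_j = v^T tanh(W s_prev + U h_j), where s_prev is the previous decoder state. A softmax turns the scores into weights alpha_j, and the context vector is c = sum_j alpha_j h_j. The function names, the toy 2-dimensional vectors, and the parameter values are hypothetical. In a real model, W, U and v are learned jointly with the encoder and decoder RNNs.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    s_prev: Vector, encoder_states: List[Vector], W: Matrix, U: Matrix, v: Vector
) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoder step.

    Hypothetical shapes: W is d_a x d_s, U is d_a x d_h, v has length d_a.
    """
    ws = matvec(W, s_prev)  # decoder-state term, shared by every encoder position
    scores = []
    for h in encoder_states:
        uh = matvec(U, h)  # encoder-state term for position j
        hidden = [math.tanh(a + b) for a, b in zip(ws, uh)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))  # e_j
    weights = softmax(scores)  # alpha_j, sums to 1
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of all encoder states.
    context = [sum(a * h[k] for a, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


# Toy usage with made-up values (3 source positions, 2-dimensional states).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [0.5, -0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
weights, context = additive_attention(s_prev, encoder_states, W, U, v)
print([round(a, 3) for a in weights], [round(c, 3) for c in context])
```

Because the scores depend on the decoder state s_prev, the weights are recomputed at every output step. This is what lets the decoder focus on different source words for different target words, instead of relying on one fixed summary vector.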


Appears in These Articles

Attention: Choosing the Relevant Context
