
Neural Machine Translation by Jointly Learning to Align and Translate

Authors: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2014)

arXiv: 1409.0473

TLDR (English)

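The seminal attention-mechanism paper, predating the Transformer. The authors argued that the single fixed-length vector into which a basic seq2seq encoder compresses the whole source sentence is a bottleneck that limits translation quality, especially on long sentences. Their fix lets the decoder attend to all encoder hidden states each time it generates a target word, recomputing the attention weights at every step. This idea was a direct precursor of Transformer self-attention.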
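For reference, one decoder step of the mechanism described above can be written as follows. Here h_j are the encoder hidden states (the paper calls them annotations), T_x is the source length, s_{i-1} is the previous decoder state, and W_a, U_a, v_a are learned parameters of the additive alignment model.

```latex
\begin{aligned}
e_{ij}    &= v_a^\top \tanh\left(W_a s_{i-1} + U_a h_j\right) \\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} \\
c_i       &= \sum_{j=1}^{T_x} \alpha_{ij} h_j \\
s_i       &= f\left(s_{i-1}, y_{i-1}, c_i\right)
\end{aligned}
```

The weights \alpha_{ij} form a distribution over source positions, so the context vector c_i is a different weighted average of the encoder states for every target word i, rather than one fixed vector for the whole sentence.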

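TLDR (Chinese, translated)

The pioneering work on attention mechanisms, predating the Transformer. The authors found that the fixed-length bottleneck vector in Seq2Seq limited translation quality, and proposed letting the decoder "look back" at all of the encoder's hidden states each time it generates a word, dynamically assigning attention weights. This idea evolved directly into Transformer self-attention.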
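To make "dynamically assigning attention weights" concrete, here is a minimal pure-Python sketch of one step of additive (Bahdanau-style) attention. The parameter names W_a, U_a, v_a mirror the paper's notation; the helper names (`matvec`, `softmax`, `additive_attention`) and the toy numbers are hypothetical and chosen only for illustration. A real model would learn these parameters jointly with the encoder and decoder and would use a tensor library rather than lists.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    s_prev: Vector,             # previous decoder state s_{i-1}
    annotations: List[Vector],  # encoder hidden states h_1 .. h_Tx
    W_a: Matrix,                # projects the decoder state
    U_a: Matrix,                # projects each encoder state
    v_a: Vector,                # turns the tanh layer into a scalar score
) -> Tuple[Vector, Vector]:
    """Return (attention weights alpha_i, context vector c_i) for one step."""
    ws = matvec(W_a, s_prev)  # shared across all source positions
    # Alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = []
    for h in annotations:
        uh = matvec(U_a, h)
        hidden = [math.tanh(a + b) for a, b in zip(ws, uh)]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))
    # Normalize over source positions so the weights sum to 1
    alphas = softmax(scores)
    # Context c_i = sum_j alpha_ij * h_j, computed per dimension
    dim = len(annotations[0])
    context = [sum(a * h[k] for a, h in zip(alphas, annotations)) for k in range(dim)]
    return alphas, context


# Toy usage: three 2-dimensional encoder states, identity projections.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [0.5, -0.5]
W_a = [[1.0, 0.0], [0.0, 1.0]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, 1.0]
alphas, context = additive_attention(s_prev, annotations, W_a, U_a, v_a)
print("weights:", alphas)
print("context:", context)
```

The score is "additive" because the decoder state and each encoder state are projected separately and summed inside a small feed-forward layer. The Transformer later replaced this with a scaled dot product between query and key vectors, which is cheaper to batch, but the softmax-weighted average over encoder states is the same idea.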
