XLNet: Generalized Autoregressive Pretraining for Language Understanding

Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le (2019)

arXiv: 1906.08237

Domains

Architecture

TLDR (English)

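Proposes permutation language modeling (Permutation LM) to merge the benefits of autoregressive (AR) and autoencoding (AE) pretraining. Tokens are still predicted autoregressively, but over randomly sampled factorization orders, so each token learns to condition on context from both sides without BERT-style [MASK] corruption. This is combined with Transformer-XL to handle long sequences. The paper shows that the pretraining objective itself is still an open question, and it is the most imaginative alternative to emerge after BERT.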
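To make "autoregressive over a sampled factorization order" concrete, the sketch below builds the attention masks for one order. The objective being approximated is the expected log-likelihood over all permutations z of the positions, E_z[Σ_t log p(x_{z_t} | x_{z_<t})]. The token sequence is never physically shuffled; only the attention mask changes. The query/content split mirrors XLNet's two-stream attention in spirit (the query stream must not see the token it predicts, the content stream may). The function names `sample_order` and `permutation_masks` are hypothetical, and this is a minimal illustration under those assumptions, not the paper's implementation.

```python
import random
from typing import List, Tuple

Mask = List[List[bool]]


def sample_order(seq_len: int, rng: random.Random) -> List[int]:
    """Sample one factorization order z uniformly from all permutations."""
    return rng.sample(range(seq_len), seq_len)


def permutation_masks(order: List[int]) -> Tuple[Mask, Mask]:
    """Attention masks for one factorization order (illustrative sketch).

    mask[i][j] is True when position i may attend to position j.
    The tokens stay in their original positions; only the masks change.
    """
    seq_len = len(order)
    rank = [0] * seq_len  # rank[pos] = step at which `pos` is predicted
    for step, pos in enumerate(order):
        rank[pos] = step
    # Query mask: strictly earlier in the order, so a position never sees
    # the token it is asked to predict.
    query = [[rank[j] < rank[i] for j in range(seq_len)] for i in range(seq_len)]
    # Content mask: earlier in the order or the position itself, so its
    # representation can encode x_i for positions predicted later.
    content = [[rank[j] <= rank[i] for j in range(seq_len)] for i in range(seq_len)]
    return query, content


if __name__ == "__main__":
    query, _ = permutation_masks([2, 0, 3, 1])
    # Position 1 is predicted last, so it conditions on positions 0, 2 and 3,
    # including position 3, which lies to its right.
    print([j for j, ok in enumerate(query[1]) if ok])  # [0, 2, 3]
    # The identity order recovers an ordinary left-to-right causal mask.
    causal, _ = permutation_masks([0, 1, 2, 3])
    print(causal[2])  # [True, True, False, False]
```

Because a fresh order is sampled for each sequence, every position sees context from both its left and its right across training, which is how the objective captures the AE-style bidirectionality while keeping an AR likelihood.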

Appears in These Articles

Positional Encoding: Where Does Order Come From
