ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Authors: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning (2020)

arXiv: 2003.10555

Domains

Pretraining

TLDR (English)

Uses replaced token detection instead of MLM, allowing small models to achieve BERT-large level performance. Representative work on "pre-training objective determines sample efficiency".

TLDR（中文）

用 replaced token detection 替代 MLM，让小模型也能拿到 BERT-large 级表现。是"预训练目标决定样本效率"这条线索的代表作。

Appears in These Articles

Embeddings：把离散符号放进连续空间
Embeddings: Putting Discrete Symbols into Continuous Space

Co-cited Papers

These papers appear in the same articles as this one

Related Papers

Other papers in the same domain