
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (2018)

arXiv: 1810.04805

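TLDR (translated from Chinese)

BERT pretrains a bidirectional Transformer on large-scale text with a masked language model (MLM) objective and next sentence prediction (NSP), then adapts to downstream tasks through fine-tuning. It set new state-of-the-art results on 11 NLP benchmarks, established the "pretrain + fine-tune" paradigm of modern NLP, and became a main competitor to the GPT series and later models.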

TLDR (English)

BERT pretrains a bidirectional Transformer on large text corpora using masked language modeling (MLM) and next sentence prediction (NSP), and is then fine-tuned for downstream tasks. It set new state-of-the-art results on 11 NLP tasks and helped establish the "pretrain + fine-tune" paradigm of modern NLP.
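
To make the MLM objective concrete, here is a minimal sketch of the input corruption step. It follows the recipe described in the BERT paper: roughly 15% of tokens are selected as prediction targets, and of those about 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function name `mask_tokens`, the toy vocabulary, and the per-token coin flip (rather than sampling an exact 15% of positions) are assumptions made for illustration, and the sketch does not exclude special tokens such as `[CLS]` or `[SEP]`.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token list for MLM (illustrative sketch, not the reference code).

    Returns (masked_tokens, labels). labels[i] holds the original token when
    position i is a prediction target, and None otherwise.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # This position is a prediction target: remember the original.
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)               # ~80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # ~10%: replace with a random token
            else:
                masked.append(tok)                # ~10%: keep the token unchanged
        else:
            # Not a target: input is untouched and contributes no loss.
            labels.append(None)
            masked.append(tok)
    return masked, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = ["dog", "ran", "under", "a", "rug"]  # hypothetical vocabulary
    corrupted, targets = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(targets)
```

During pretraining, the model sees the corrupted sequence and is trained to predict the original token only at positions where the label is not `None`. Because the context on both sides of a masked position is visible, the encoder can be bidirectional.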

Appears in these articles