BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
arXiv: 1810.04805
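TLDR (translated from Chinese)
BERT pretrains a bidirectional Transformer on large-scale text using masked language modeling (MLM) and next sentence prediction (NSP), and is then fine-tuned to adapt it to downstream tasks. It set new state-of-the-art results on 11 NLP tasks at once, established the "pretrain + fine-tune" paradigm of modern NLP, and became the main competitor to the GPT series and later models.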
TLDR (English)
BERT pretrains a bidirectional Transformer on large text corpora using two objectives, masked language modeling (MLM) and next sentence prediction (NSP), and is then fine-tuned for each downstream task. It set new state-of-the-art results on 11 NLP tasks, establishing the "pretrain + fine-tune" paradigm that dominates modern NLP.
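To make the MLM objective concrete, here is a minimal sketch of the input corruption step in plain Python. It uses the rates reported in the BERT paper: about 15% of token positions are selected for prediction, and of those 80% become [MASK], 10% become a random token, and 10% are left unchanged. The function name mask_tokens, its parameters, and the toy vocabulary are hypothetical, chosen for illustration. As a simplification, each position is selected independently rather than drawing a fixed number of positions per sequence, so this is not the reference implementation.

```python
import random

MASK = "[MASK]"
SPECIAL = ("[CLS]", "[SEP]")  # special tokens are never selected for prediction


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    Hypothetical helper: an illustrative sketch, not BERT's actual code.
    """
    if rng is None:
        rng = random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL:
            continue
        # Simplification: select each position independently with mask_prob.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return corrupted, labels


if __name__ == "__main__":
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # toy vocabulary (assumption)
    sentence = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
    corrupted, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(labels)
```

The training loss is computed only at positions where labels is not None. Because the model sees context on both sides of each such position, the objective is bidirectional. The random and unchanged replacements mean the model cannot assume that only [MASK] positions matter, which helps at fine-tuning time, when [MASK] never appears in the input.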