Skip to content

liu2019-roberta

arXiv: 1907.11692

TLDR (English)

Uses more data, longer training, removes NSP to prove BERT was far from fully trained. Important not just for stronger model, but for first clearly demonstrating that "training recipe" itself is a core research question.

TLDR(中文)

用更多数据、更长训练、去掉 NSP,证明 BERT 远未训练充分。重要意义不只是更强的模型,而是首次清晰展示"训练配方"本身就是核心研究问题。