howard2018-ulmfit
arXiv: 1801.06146
TLDR
One of the first papers to explicitly propose the "universal language model pre-training → task fine-tuning" pipeline for NLP, with key fine-tuning tricks such as discriminative learning rates (lower rates for lower layers), slanted triangular learning rates (a short linear warm-up followed by a long linear decay), and gradual unfreezing. Together with ELMo, it represents "the last mile before BERT".
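The two learning-rate tricks named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names `slanted_triangular_lr` and `discriminative_lrs` are hypothetical, the defaults (`cut_frac=0.1`, `ratio=32`, `eta_max=0.01`, per-layer divisor `2.6`) are the values the paper reports using, and the `max(1, ...)` guard for very short runs is an added assumption.

```python
import math


def slanted_triangular_lr(t, total_steps, eta_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at iteration t under a slanted triangular schedule.

    The rate rises linearly for the first cut_frac of training, then decays
    linearly; the lowest rate is eta_max / ratio.
    """
    # Iteration at which the schedule switches from increasing to decreasing.
    # The max(1, ...) guard is an assumption added here to avoid dividing by
    # zero when total_steps is very small.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return eta_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(eta_top, num_layers, decay=2.6):
    """Per-layer learning rates, listed from the lowest layer to the top layer.

    The top layer uses eta_top; each layer below it uses the rate of the layer
    above divided by decay.
    """
    return [eta_top / decay ** (num_layers - 1 - layer) for layer in range(num_layers)]


if __name__ == "__main__":
    total = 1000
    for step in (0, 50, 100, 500, 1000):
        print(step, slanted_triangular_lr(step, total))
    print(discriminative_lrs(0.01, num_layers=4))
```

In practice the two would likely be combined: the schedule gives the top-layer rate at each iteration, and the per-layer divisor scales it down for lower layers, for example through per-parameter-group learning rates in an optimizer.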