howard2018-ulmfit

arXiv: 1801.06146

TLDR (English)

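First paper to explicitly propose the "universal language model pre-training → task fine-tuning" pipeline, along with key fine-tuning tricks such as discriminative learning rates (a different LR for each layer) and the slanted triangular LR schedule (a short linear warm-up followed by a long linear decay). Together with ELMo, it represents "the last mile before BERT".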
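
The two tricks named above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' code: the function names are hypothetical, the defaults (`cut_frac=0.1`, `ratio=32`, `lr_max=0.01`, per-layer decay factor 2.6) are the values reported in the paper, and the guard against a zero-length warm-up is an added assumption.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t under a slanted triangular schedule.

    The rate rises linearly for the first cut_frac of training, then
    decays linearly back to lr_max / ratio.
    """
    # Step at which warm-up ends; max(1, ...) is an added guard (assumption)
    # so that very short runs do not divide by zero.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut  # warm-up: p goes from 0 to 1
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay: p goes from 1 to 0
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(lr_top, n_layers, decay=2.6):
    """Per-layer learning rates, ordered from the lowest layer to the top layer.

    Each layer below the top one gets the rate of the layer above it divided by decay.
    """
    return [lr_top / decay ** (n_layers - 1 - i) for i in range(n_layers)]


if __name__ == "__main__":
    total = 1000
    for step in (0, 50, 100, 550, 1000):
        print(step, slanted_triangular_lr(step, total))
    print(discriminative_lrs(0.01, n_layers=4))
```

In a real training loop, the per-layer rates would typically be passed to the optimizer as separate parameter groups, and the schedule would scale each group's rate at every step.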
