Scaling Laws for Neural Language Models

Authors: Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei (2020)

arXiv: 2001.08361

Field

Pretraining

TLDR

OpenAI's scaling laws paper finds that language model performance, measured as cross-entropy loss, follows a power law in each of three quantities: model parameter count, dataset size, and training compute. This makes it possible to predict the outcome of large-scale training from small-scale experiments. The result provided the theoretical basis for the LLM scale-up race and led directly to GPT-3.
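As a rough illustration of how a power law lets small runs predict large ones, the sketch below fits loss = a * size^(-alpha) by least squares in log-log space and extrapolates to a larger model. The function names, the constants `TRUE_A` and `TRUE_ALPHA`, and the data points are all hypothetical and synthetic; they are not the paper's measurements or fitted values, and the paper's own fitting procedure may differ.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) with a straight-line fit in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - alpha * log(size), so alpha is the negated slope.
    return math.exp(intercept), -slope


def predict_loss(a, alpha, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-alpha)


# Synthetic "small-scale" runs generated from a known power law.
# These constants are illustrative assumptions, not values from the paper.
TRUE_A, TRUE_ALPHA = 10.0, 0.08
small_sizes = [1e6, 3e6, 1e7, 3e7, 1e8]
small_losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in small_sizes]

a, alpha = fit_power_law(small_sizes, small_losses)
predicted = predict_loss(a, alpha, 1e10)  # extrapolate 100x beyond the largest run
print(f"alpha={alpha:.4f}, a={a:.4f}, predicted loss at 1e10 params={predicted:.4f}")
```

Because the synthetic data is noise-free, the fit recovers the generating constants almost exactly. With real training runs the points scatter around the line, and the extrapolation is only as trustworthy as the assumption that the same power law continues to hold at the larger scale.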
