Scaling Laws for Neural Language Models

Authors: Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei (2020)

arXiv: 2001.08361

Domains

Pretraining

TLDR (English)

OpenAI's scaling laws paper finds that language model performance, measured as cross-entropy loss, follows a power law in each of three quantities: model parameter count, dataset size, and training compute. These relationships make it possible to predict the results of large-scale training from small experiments. The paper provided the theoretical basis for the LLM scale-up race and led directly to GPT-3.
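As a rough illustration of predicting large-scale loss from small experiments, the sketch below fits a power law of the form L(N) = (N_c / N)^alpha to a handful of small-model losses and extrapolates to a larger model. The helper names (fit_power_law, predict_loss), the data points, and the constants are hypothetical and chosen for illustration only; they are not results from the paper.

```python
import numpy as np


def predict_loss(n_params, alpha, n_c):
    """Loss predicted by the power law L(N) = (N_c / N) ** alpha."""
    return (n_c / np.asarray(n_params, dtype=float)) ** alpha


def fit_power_law(n_params, losses):
    """Fit L(N) = (N_c / N) ** alpha by linear regression in log-log space.

    Taking logs gives log L = -alpha * log N + alpha * log N_c,
    so the slope is -alpha and the intercept is alpha * log N_c.
    """
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    return alpha, n_c


# Hypothetical "small experiments": synthetic losses generated from
# made-up constants (ASSUMED values, not taken from the paper).
ASSUMED_ALPHA = 0.08
ASSUMED_N_C = 1e14
small_models = np.array([1e5, 1e6, 1e7, 1e8])
small_losses = predict_loss(small_models, ASSUMED_ALPHA, ASSUMED_N_C)

# Fit on the small runs, then extrapolate two orders of magnitude up.
alpha, n_c = fit_power_law(small_models, small_losses)
extrapolated = predict_loss(1e10, alpha, n_c)
print(f"alpha={alpha:.4f}, predicted loss at 1e10 params={extrapolated:.4f}")
```

Because a power law is a straight line on log-log axes, an ordinary linear fit is enough here. Real training runs are noisy and can deviate from a pure power law, so an extrapolation like this should be read as an estimate rather than a guarantee.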

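TLDR (translated from Chinese)

OpenAI's scaling laws paper finds a power-law relationship between language model performance (cross-entropy loss) and model parameter count, dataset size, and compute. This makes it possible to predict the results of large-scale training from small-scale experiments. It is the theoretical basis for the LLM arms race and led directly to the creation of GPT-3.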
