Scaling Laws for Neural Language Models
arXiv: 2001.08361
TLDR(中文)
OpenAI 的规模定律论文,发现语言模型的性能(cross-entropy loss)与模型参数量、数据集大小和计算量之间存在幂律关系。这使得在小规模实验中就可以预测大规模训练的结果,是 LLM 军备竞赛的理论依据,也直接导致了 GPT-3 的诞生。
TLDR (English)
OpenAI's scaling laws paper (Kaplan et al., 2020) finds that language model performance, measured as cross-entropy loss, follows a power law in each of three quantities: model parameter count, dataset size, and training compute. Because these trends hold across many orders of magnitude, the loss of a large training run can be predicted by extrapolating from small experiments. The paper provided the theoretical basis for the LLM scale-up race and directly motivated GPT-3.
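To make the "predict large from small" idea concrete, here is a minimal sketch of fitting a power law of the form L(N) = (N_c / N)^alpha to a handful of small-model losses and extrapolating to a larger model. The data is synthetic and noise-free, the constants TRUE_ALPHA and TRUE_N_C are illustrative assumptions rather than measured results, and the helper names fit_power_law and predict_loss are hypothetical. Real measurements would be noisy and would also need the data and compute terms.

```python
import numpy as np

def predict_loss(n, alpha, n_c):
    """Power-law loss as a function of parameter count n: (n_c / n) ** alpha."""
    return (n_c / n) ** alpha

def fit_power_law(n, loss):
    """Fit loss = (n_c / n) ** alpha by least squares in log-log space.

    Taking logs gives log(loss) = alpha * log(n_c) - alpha * log(n),
    which is a straight line in log(n) with slope -alpha.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    return alpha, n_c

# Illustrative constants (assumptions for this sketch, not measured values).
TRUE_ALPHA, TRUE_N_C = 0.076, 8.8e13

# Synthetic "small experiments": models from 1M to 100M parameters.
small_n = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
small_loss = predict_loss(small_n, TRUE_ALPHA, TRUE_N_C)

# Fit on the small models, then extrapolate two orders of magnitude up.
alpha, n_c = fit_power_law(small_n, small_loss)
large_n = 1e10
predicted = predict_loss(large_n, alpha, n_c)

print(f"fitted alpha = {alpha:.4f}, fitted N_c = {n_c:.3e}")
print(f"predicted loss at N = {large_n:.0e}: {predicted:.4f}")
```

Fitting in log-log space is a design choice: a power law becomes a straight line there, so an ordinary linear fit recovers the exponent from the slope. With noisy data the extrapolation error grows with the distance from the fitted range, so the prediction should be read as an estimate.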