Training Compute-Optimal Large Language Models

作者： Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendrycks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre (2022)

arXiv： 2203.15556

领域

预训练

TLDR（中文）

提出了 Chinchilla 法则：在固定算力预算下，模型参数量和训练数据量应该同比例增长（而非此前主流认为的参数增长更重要）。这重新定义了 LLM 训练的最优策略， Chinchilla 70B 在多个基准上超越了 Gopher 280B。

TLDR (English)

Proposes the Chinchilla scaling laws: given a fixed compute budget, model parameters and training tokens should scale equally (challenging the prior belief that parameters matter more). Chinchilla 70B outperformed Gopher 280B, redefining optimal LLM training strategy.

出现在这些文章里

同被引用

这些论文与本文出现在同一篇文章中

Training Compute-Optimal Large Language Models

领域

TLDR（中文）

TLDR (English)

出现在这些文章里

同被引用

相关论文