Training Compute-Optimal Large Language Models
arXiv: 2203.15556
TLDR(中文)
提出了 Chinchilla 法则:在固定算力预算下,模型参数量和训练数据量应该同比例增长 (而非此前主流认为的参数增长更重要)。这重新定义了 LLM 训练的最优策略, Chinchilla 70B 在多个基准上超越了 Gopher 280B。
TLDR (English)
Proposes the Chinchilla scaling laws: given a fixed compute budget, model parameters and training tokens should scale equally (challenging the prior belief that parameters matter more). Chinchilla 70B outperformed Gopher 280B, redefining optimal LLM training strategy.