deepseek2024-v3

arXiv: 2412.19437

TLDR

DeepSeek-V3 is a 671B-parameter Mixture-of-Experts (MoE) model with 37B parameters activated per token, trained on 14.8T tokens. It is the first production LLM to make FP8 training and Multi-Token Prediction work together at large scale, and it brought the reported training cost down to about $5.6M, a result that shook the entire industry.
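The headline numbers can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the parameter counts come from the TLDR, while the GPU-hour total (about 2.788M H800 GPU hours) and the $2 per GPU hour rental price are assumptions taken from the technical report's own cost accounting as recalled here, not from this note. The function names are hypothetical.

```python
# Back-of-the-envelope checks for the headline numbers in the TLDR.

# From the TLDR: total and per-token activated parameters, in billions.
TOTAL_PARAMS_B = 671
ACTIVATED_PARAMS_B = 37

# Assumptions (not stated in this note): GPU-hour total and rental price
# as recalled from the technical report's cost accounting.
GPU_HOURS = 2_788_000          # H800 GPU hours for the full training run
PRICE_PER_GPU_HOUR_USD = 2.0   # assumed rental price per GPU hour


def activated_fraction(activated_b: float, total_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return activated_b / total_b


def training_cost_usd(gpu_hours: float, price_per_hour: float) -> float:
    """Rental-price estimate of compute cost: hours times hourly price."""
    return gpu_hours * price_per_hour


if __name__ == "__main__":
    frac = activated_fraction(ACTIVATED_PARAMS_B, TOTAL_PARAMS_B)
    cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR_USD)
    print(f"activated fraction: {frac:.1%}")
    print(f"estimated training cost: ${cost / 1e6:.3f}M")
```

Under these assumptions, only about 5.5% of the parameters are active for any given token, and the cost estimate lands at the figure the TLDR rounds to $5.6M. That estimate covers rented GPU time for the final run only. It does not include earlier research, ablations, or hardware purchase.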