DeepSeek-V3 Technical Report

Authors: DeepSeek-AI (2024)

arXiv: 2412.19437

Field

Architecture, Mixture of Experts

TLDR

DeepSeek-V3 is a Mixture-of-Experts (MoE) model with 671B total parameters, of which 37B are activated per token, trained on 14.8T tokens. It is the first production LLM to get FP8 training and Multi-Token Prediction working together at large scale, and it brought the reported training cost down to about $5.6M. The result shook the entire industry.
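As a rough back-of-the-envelope reading of these figures, the sketch below computes the share of parameters activated per token and the cost per trillion training tokens. It uses only the numbers quoted in the TLDR; the variable names are illustrative, and the cost figure is taken at face value.

```python
# Figures quoted in the TLDR (not independently verified here).
total_params_b = 671     # total parameters, in billions
active_params_b = 37     # parameters activated per token, in billions
train_tokens_t = 14.8    # training tokens, in trillions
train_cost_musd = 5.6    # reported training cost, in millions of USD

# Share of the model that is active for any single token.
active_fraction = active_params_b / total_params_b

# Reported cost spread evenly over the training tokens.
cost_per_trillion_tokens_musd = train_cost_musd / train_tokens_t

print(f"activated share of parameters: {active_fraction:.1%}")
print(f"cost per trillion tokens: ${cost_per_trillion_tokens_musd:.2f}M")
```

Under these numbers only about one parameter in eighteen is active for a given token, which is the sparsity an MoE design relies on to keep per-token compute low.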

Related papers