DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI (2024)

Field

Architecture, Inference

TLDR

Introduces Multi-head Latent Attention (MLA), which cuts the KV cache to about 1/13 and makes inference pricing for the 236B MoE model far lower than that of same-tier closed-source models. MLA is a core source of the inference cost-effectiveness of V3/R1.

Appears in these articles

Efficient Attention: Breaking the Quadratic Sequence Bottleneck

Co-cited

These papers appear in the same article as this one.

Related papers