Mixtral of Experts

Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed (2024)

arXiv: 2401.04088

TLDR (English)

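Mixtral 8x7B is the first widely used open-weight sparse mixture-of-experts (MoE) language model. Each layer holds 8 feed-forward expert networks, and a router sends each token to 2 of them, so only about 13B of the 47B total parameters are active for any given token. With per-token compute close to that of a 13B dense model, it matches or outperforms Llama 2 70B on the benchmarks the paper reports, demonstrating that MoE is practical for openly released models.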
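
The gap between total and active parameters can be checked with a rough count. The sketch below is not taken from the authors' code. It assumes the hyperparameters listed in the paper's architecture table (dim 4096, 32 layers, hidden dim 14336, 32 query heads, 8 key/value heads, head dim 128, vocabulary 32000, 8 experts, top-2 routing), and it further assumes three weight matrices per SwiGLU expert, untied input and output embeddings, and no bias terms. The helper `count_params` is a hypothetical name used only for illustration.

```python
# Hyperparameters assumed from the Mixtral paper's architecture table.
DIM = 4096          # model (embedding) dimension
N_LAYERS = 32       # transformer layers
HEAD_DIM = 128      # dimension per attention head
HIDDEN_DIM = 14336  # feed-forward hidden dimension of each expert
N_HEADS = 32        # query heads
N_KV_HEADS = 8      # key/value heads (grouped-query attention)
VOCAB_SIZE = 32000
NUM_EXPERTS = 8     # experts stored in every layer
TOP_K = 2           # experts the router selects per token


def count_params(experts_used: int) -> int:
    """Count parameters when `experts_used` experts per layer are included.

    Assumptions (not confirmed by the summary above): no biases,
    untied embeddings, two RMSNorm weight vectors per layer.
    """
    # Attention: query and output projections are DIM x (N_HEADS * HEAD_DIM);
    # key and value projections are DIM x (N_KV_HEADS * HEAD_DIM).
    attention = 2 * DIM * N_HEADS * HEAD_DIM + 2 * DIM * N_KV_HEADS * HEAD_DIM
    expert = 3 * DIM * HIDDEN_DIM   # SwiGLU uses three weight matrices
    router = DIM * NUM_EXPERTS      # linear gate producing one logit per expert
    norms = 2 * DIM
    per_layer = attention + experts_used * expert + router + norms
    embeddings = 2 * VOCAB_SIZE * DIM  # input embedding + output head
    final_norm = DIM
    return N_LAYERS * per_layer + embeddings + final_norm


total_params = count_params(NUM_EXPERTS)  # every expert must be stored
active_params = count_params(TOP_K)       # only the routed experts run per token

print(f"total:  {total_params / 1e9:.1f}B")   # total:  46.7B
print(f"active: {active_params / 1e9:.1f}B")  # active: 12.9B
```

Under these assumptions the count lands on roughly 46.7B total and 12.9B active parameters, consistent with the rounded 47B and 13B figures. Almost all of the difference comes from the expert feed-forward blocks; attention, embeddings, and the router are shared by every token. Note that the saving applies to per-token compute only: all 8 experts still have to be held in memory.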