Skip to content

jiang2023-mistral7b

arXiv: 2310.06825

TLDR (English)

Uses GQA + sliding window attention to make 7B model outperform LLaMA-2 13B; first enters scene with "Apache 2.0 + direct weight release" stance. Leads European open-source LLM force.

TLDR(中文)

用 GQA + sliding window attention,让 7B 模型干翻 LLaMA-2 13B;并第一次以"Apache 2.0 + 直接放权重"姿态进入舞台。引领欧洲开源 LLM 力量。