hendrycks2020-mmlu
arXiv: 2009.03300
TLDR (English)
Covers 57 subjects with about 14K exam questions; since its release, "grinding MMLU" has become the de facto standard for measuring general LLM capability. As of 2025 it is still a first-line metric in model cards; see also the later MMLU-Pro.
TLDR(中文)
57 学科 1.4 万道考题,从此"刷 MMLU"成为衡量 LLM 通用能力的事实标准。即使在 2025 年仍是模型卡里第一行的指标;另见后续 MMLU-Pro。