
hendrycks2020-mmlu

arXiv: 2009.03300

TLDR (English)

57 subjects and roughly 14K multiple-choice exam questions. Since its release, "grinding MMLU" has become the de facto standard for measuring the general capability of LLMs. Even in 2025 it is still the first metric listed in model cards; see also the later MMLU-Pro.

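TLDR (Chinese, translated)

57 subjects, 14,000 exam questions; from then on, "grinding MMLU" became the de facto standard for measuring the general capability of LLMs. Even in 2025 it remains the first-row metric in model cards; see also the later MMLU-Pro.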
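To make the headline MMLU number concrete, here is a minimal sketch of how such a score can be computed. It assumes each question is four-option multiple choice (A to D, so random guessing scores about 25%) and that the model's predicted letter has already been extracted. The record format and the `mmlu_scores` helper are hypothetical, and the toy data is made up. Evaluation harnesses differ in whether they report a micro average (every question weighted equally) or a macro average (every subject weighted equally), so the sketch returns both.

```python
from collections import defaultdict

# Hypothetical record format: (subject, gold answer letter, predicted letter).
Record = tuple[str, str, str]


def mmlu_scores(records: list[Record]) -> tuple[dict[str, float], float, float]:
    """Return (per-subject accuracy, micro average, macro average)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for subject, gold, pred in records:
        total[subject] += 1
        correct[subject] += int(pred == gold)

    per_subject = {s: correct[s] / total[s] for s in total}
    # Micro average: pool all questions, so larger subjects weigh more.
    micro = sum(correct.values()) / sum(total.values())
    # Macro average: mean of per-subject accuracies, each subject weighs the same.
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, micro, macro


# Made-up toy predictions; subject sizes are deliberately unequal.
toy = [
    ("abstract_algebra", "A", "A"),
    ("abstract_algebra", "C", "B"),
    ("abstract_algebra", "D", "D"),
    ("abstract_algebra", "B", "B"),
    ("virology", "B", "B"),
    ("virology", "A", "C"),
]
per_subject, micro, macro = mmlu_scores(toy)
print(per_subject)     # {'abstract_algebra': 0.75, 'virology': 0.5}
print(f"{micro:.3f}")  # 0.667
print(f"{macro:.3f}")  # 0.625
```

With unequal subject sizes the two averages diverge (0.667 vs 0.625 on this toy data), which is one reason MMLU numbers reported by different evaluation setups are not always directly comparable.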