chen2021-humaneval

arXiv: 2107.03374

TLDR

Introduces the Codex model and the HumanEval benchmark (164 programming problems). HumanEval remains the "ECG metric" for coding models today; this paper is also the root of GitHub Copilot.