OpenAI o1 System Card

Author: OpenAI (2024)

Fields

Reasoning capabilities, Safety

TLDR

OpenAI o1's system card describes an approach to training "slow thinking" models via large-scale reinforcement learning: the model produces an extended internal reasoning chain before answering, and it substantially outperforms GPT-4 on math competition and coding problems. This marks a paradigm shift for LLMs from "fast thinking" to "slow thinking", and makes o1 a direct precursor of models such as DeepSeek-R1.

Appears in these articles

Code Generation: How Models Write Programs

Co-cited

These papers appear in the same article as this one.
