KTO: Model Alignment as Prospect Theoretic Optimization

Authors: Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela (2024)

arXiv: 2402.01306

Field

Alignment

TLDR

Proposes KTO (Kahneman-Tversky Optimization), which aligns models using only binary feedback on individual outputs (desirable or undesirable), rather than the paired preference data that DPO requires. Brings prospect theory into alignment optimization and shows that a signal of whether a single output is desirable is enough to learn human preferences.
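
To make the data-format difference concrete, here is a minimal sketch of how paired preference records could be flattened into the unpaired, binary-labeled examples that a KTO-style method consumes. The field names (`prompt`, `chosen`, `rejected`, `completion`, `desirable`) and the helper `pairs_to_binary` are hypothetical choices for illustration, not taken from the paper.

```python
from typing import Dict, List


def pairs_to_binary(pairs: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Split each (prompt, chosen, rejected) record into two unpaired examples.

    Field names are illustrative assumptions, not a fixed schema.
    """
    examples: List[Dict[str, object]] = []
    for pair in pairs:
        # The preferred response becomes a standalone "desirable" example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["chosen"], "desirable": True}
        )
        # The dispreferred response becomes a standalone "undesirable" example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["rejected"], "desirable": False}
        )
    return examples


paired = [{"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"}]
binary = pairs_to_binary(paired)
```

Feedback that is binary from the start, such as a thumbs-up or thumbs-down on a single response, already has this shape and needs no pairing step.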

Related papers

Other papers in the same field