
KTO: Model Alignment as Prospect Theoretic Optimization

Authors: Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela (2024)

arXiv: 2402.01306

Domains

Alignment

TLDR (English)

Proposes KTO (Kahneman-Tversky Optimization), which aligns a model using only a binary signal per output (desirable or undesirable) instead of the paired preference data that DPO requires. It brings prospect theory into alignment optimization and shows that knowing whether a single output is desirable can be enough to learn human preferences.
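
To make the binary-feedback idea concrete, here is a minimal per-example sketch of a KTO-style loss. It is an illustration under stated assumptions, not the authors' implementation: the function name `kto_style_loss`, the default hyperparameters, and the toy log-probabilities are all hypothetical, and the reference point `z_ref` (which the method ties to a KL estimate between the policy and the reference model) is simply passed in as a constant here.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def kto_style_loss(
    policy_logp: float,   # log-prob of the output under the model being trained
    ref_logp: float,      # log-prob of the same output under the frozen reference model
    desirable: bool,      # the only label needed: was this output good or bad?
    z_ref: float = 0.0,   # reference point; ASSUMPTION: fixed constant in this sketch
    beta: float = 0.1,    # ASSUMPTION: illustrative scale on the implied reward
    lambda_d: float = 1.0,  # weight for desirable examples (illustrative)
    lambda_u: float = 1.0,  # weight for undesirable examples (illustrative)
) -> float:
    # Implied reward: log-ratio of policy to reference.
    reward = policy_logp - ref_logp
    if desirable:
        # Value grows as the reward rises above the reference point.
        value = lambda_d * sigmoid(beta * (reward - z_ref))
        return lambda_d - value
    # For undesirable outputs, value grows as the reward falls below it.
    value = lambda_u * sigmoid(beta * (z_ref - reward))
    return lambda_u - value


# Unpaired data: each record has one output and one binary label.
# There is no "chosen vs. rejected" pair as in DPO.
batch = [
    (-12.0, -15.0, True),   # desirable output, policy already prefers it
    (-9.0, -8.0, False),    # undesirable output, policy already disprefers it
]
losses = [kto_style_loss(p, r, d) for p, r, d in batch]
mean_loss = sum(losses) / len(losses)
print(mean_loss)
```

The point of the sketch is the data shape: every training record is a single output with a good/bad label, so the loss can be computed without ever comparing two outputs for the same prompt.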

