KTO: Model Alignment as Prospect Theoretic Optimization

Authors: Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela (2024)

Domains

Alignment

TLDR (English)

Proposes KTO (Kahneman-Tversky Optimization), which aligns language models using only a binary signal per output (desirable or undesirable) instead of the paired preference data that DPO requires. Builds the training objective on the Kahneman-Tversky value function from prospect theory, and shows empirically that knowing whether a single output is desirable can be sufficient to learn human preferences.
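
To make the binary-feedback idea concrete, here is a minimal sketch of a KTO-style per-example loss in plain Python. It assumes the commonly cited form of the objective: the implied reward is the log-ratio between policy and reference log-probabilities, it is compared against a reference point z0 (in the paper an estimate of the policy-reference KL divergence, here simply passed in), and the result is squashed through a sigmoid with separate weights for desirable and undesirable outputs. The function name, argument names, and default values are illustrative assumptions, not the authors' code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_style_loss(
    policy_logp: float,   # log pi_theta(y | x), summed over tokens
    ref_logp: float,      # log pi_ref(y | x), summed over tokens
    desirable: bool,      # the only supervision: is this output good or bad?
    z0: float = 0.0,      # reference point (assumed given; a KL estimate in the paper)
    beta: float = 0.1,    # hypothetical default; scales the reward
    lambda_d: float = 1.0,  # weight on desirable examples
    lambda_u: float = 1.0,  # weight on undesirable examples
) -> float:
    """Loss for one (prompt, output, label) triple; no paired comparison needed."""
    reward = policy_logp - ref_logp  # implied reward: log-ratio to the reference
    if desirable:
        # Value rises as the reward moves above the reference point.
        value = lambda_d * sigmoid(beta * (reward - z0))
        return lambda_d - value
    # Value rises as the reward moves below the reference point.
    value = lambda_u * sigmoid(beta * (z0 - reward))
    return lambda_u - value


# Each example carries one label; the two outputs need not share a prompt.
examples = [(-12.0, -15.0, True), (-20.0, -14.0, False)]
losses = [kto_style_loss(p, r, d) for p, r, d in examples]
mean_loss = sum(losses) / len(losses)
```

Unlike a DPO-style loss, nothing here compares a chosen output with a rejected one: each labeled output contributes its own term, which is why unpaired good/bad feedback is enough to compute the objective.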

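TLDR (translated from Chinese)

Proposes KTO (Kahneman-Tversky Optimization), which needs only binary feedback (good/bad) to align a model and does not require paired preference data as DPO does. Brings prospect theory into alignment optimization, showing that a signal of whether a single output is liked is enough to learn human preferences.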

Related Papers

Other papers in the same domain