Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks

Authors: Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg (2023)

arXiv: 2307.03101

Field

Evaluation

TLDR

Argues that benchmark test data published in plain text is easily swept into automatically crawled pretraining corpora, and that the resulting contamination is hard to detect, especially for closed models whose training data is undisclosed. Proposes three practical mitigation strategies: (1) encrypt publicly released test data with a public key and license it to disallow derivative distribution; (2) demand training-exclusion controls from closed API holders, and refuse to evaluate without them; (3) avoid data that appears alongside its solution on the internet, and release the web-page context of internet-derived data with the dataset.
