Stop Uploading Test Data in Plain Text: New Protocols for Dataset Release
arXiv: 2307.03101
Domains
TLDR (English)
Proposes systematic methods for detecting and preventing benchmark data contamination. By analyzing anomalous performance patterns on contaminated data (such as verbatim memorization of test sets), it reliably detects whether pretraining data contains publicly available test sets. Calls for releasing encrypted or delayed-public test sets.
TLDR(中文)
提出检测和预防基准数据污染的系统方法。通过分析模型在污染数据上的异常表现模式(如逐字记忆测试集),可以可靠地检测预训练数据是否包含公开测试集。呼吁发布加密或延迟公开的测试集。
Related Papers
Other papers in the same domain