Extracting Training Data from Large Language Models
arXiv: 2012.07805
Field
TLDR
Demonstrates that fragments of training data can be extracted from language models such as GPT-2. Using carefully designed decoding strategies, the authors recover hundreds of verbatim-memorized training examples, exposing the privacy risks of large language models.