zou2023-universal-attack

TLDR（中文）

用 GCG 算法找到一段乱码后缀，能把对齐过的 LLaMA-2/Vicuna 全打穿，且攻击在多个闭源模型间迁移。震撼整个安全社区，让"对齐脆弱性"成为主流话题。

TLDR (English)

Uses GCG algorithm to find gibberish suffix that breaks through aligned LLaMA-2/Vicuna, with attacks transferring across multiple closed-source models. Shocked entire security community, making "alignment fragility" mainstream topic.