xiao2022-smoothquant
arXiv: 2211.10438
TLDR(中文)
把激活的 outlier 通过等价数学变换"挪"到权重上,使得 INT8 推理可行。是 GPU FP8/INT8 部署能 work 的关键工程发现。
TLDR (English)
Migrates activation outliers into the weights through a mathematically equivalent per-channel rescaling, so that both activations and weights can be quantized to INT8 (W8A8) without retraining. A key engineering finding behind practical INT8/FP8 deployment on GPUs.
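Sketch (illustrative, not the authors' code)
The "equivalent transformation" can be written as Y = (X diag(s)^-1)(diag(s) W): each input channel j of the activations is divided by s_j and the matching weight row is multiplied by s_j, so the product is unchanged. The paper's smoothing factor is s_j = max|X_j|^alpha / max|W_j|^(1-alpha). The helper names `matmul` and `smooth`, the toy matrices, and alpha = 0.5 are assumptions chosen for illustration, and the sketch assumes every channel has a nonzero maximum.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(X, W, alpha=0.5):
    """Rescale X (tokens x in_channels) and W (in_channels x out_channels).

    Returns (X_s, W_s, s) where X_s @ W_s equals X @ W up to float rounding.
    Assumes no input channel is all-zero in X or W (hypothetical helper).
    """
    n_in = len(W)
    s = []
    for j in range(n_in):
        act_max = max(abs(row[j]) for row in X)  # per-channel activation range
        w_max = max(abs(v) for v in W[j])        # per-channel weight range
        s.append(act_max ** alpha / w_max ** (1 - alpha))
    X_s = [[row[j] / s[j] for j in range(n_in)] for row in X]  # X diag(s)^-1
    W_s = [[s[j] * v for v in W[j]] for j in range(n_in)]      # diag(s) W
    return X_s, W_s, s


# Toy data: channel 1 of X carries the outliers.
X = [[1.0, 64.0], [-2.0, -32.0]]
W = [[0.5, -1.0], [0.25, 1.0]]
X_s, W_s, s = smooth(X, W)

print("largest |activation| before:", max(abs(v) for row in X for v in row))
print("largest |activation| after: ", max(abs(v) for row in X_s for v in row))
print("X @ W    =", matmul(X, W))
print("X_s @ W_s =", matmul(X_s, W_s))
```

With alpha = 0.5 the smoothed activation and weight channels end up with the same maximum magnitude, which is why the outlier channel stops dominating the activation quantization range while the weights absorb only a moderate increase.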