xiao2022-smoothquant

arXiv: 2211.10438

TLDR

Migrates activation outliers into the weights through a mathematically equivalent per-channel scaling, which makes INT8 inference (8-bit weights and 8-bit activations) feasible. A key engineering finding behind practical FP8/INT8 deployment on GPUs.
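The equivalent transformation can be illustrated with a minimal NumPy sketch. It assumes the per-channel smoothing factor from the SmoothQuant paper, s_j = max|X_j|^α / max|W_j|^(1−α), with α = 0.5. The function names `smooth` and `quantize_int8`, the toy data, and the simplified per-tensor symmetric quantizer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    """Rescale X (tokens, in) and W (in, out) so that X @ W is unchanged.

    Hypothetical helper: each input channel j is divided by s_j in the
    activations and multiplied by s_j in the matching weight row.
    """
    act_max = np.abs(X).max(axis=0)   # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)     # per-input-channel weight range
    s = act_max ** alpha / np.maximum(w_max, eps) ** (1 - alpha)
    s = np.maximum(s, eps)            # guard against division by zero
    return X / s, W * s[:, None], s

def quantize_int8(A):
    """Simplified symmetric per-tensor fake quantization (an assumption)."""
    scale = np.abs(A).max() / 127.0
    return np.clip(np.round(A / scale), -127, 127) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
X[:, 3] *= 50.0                       # inject one outlier activation channel
W = rng.normal(size=(16, 8))

X_s, W_s, s = smooth(X, W)
Y = X @ W

# The transformation is mathematically equivalent in full precision.
assert np.allclose(Y, X_s @ W_s)

# Compare INT8 error with and without smoothing.
err_naive = np.abs(quantize_int8(X) @ quantize_int8(W) - Y).mean()
err_smooth = np.abs(quantize_int8(X_s) @ quantize_int8(W_s) - Y).mean()
print(f"mean abs error, naive INT8:    {err_naive:.4f}")
print(f"mean abs error, smoothed INT8: {err_smooth:.4f}")
```

In this toy setup the outlier channel dominates the activation quantization range, so the naive path loses most of the resolution on the other channels. Smoothing splits that range between activations and weights, which is why the smoothed error should come out lower here.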