Skip to content

lin2023-awq

arXiv: 2306.00978

TLDR (English)

Discovers "few critical weights correspond to large activations", applies per-channel scaling by importance. More robust and faster than GPTQ at 4-bit, one of mainstream INT4 deployment solutions today.

TLDR(中文)

发现"少数关键权重对应大激活",按重要性做 per-channel scaling。在 4-bit 上比 GPTQ 更鲁棒、推理更快,是当下 INT4 部署的主流方案之一。