lin2023-awq
arXiv: 2306.00978
TLDR (English)
Discovers "few critical weights correspond to large activations", applies per-channel scaling by importance. More robust and faster than GPTQ at 4-bit, one of mainstream INT4 deployment solutions today.
TLDR(中文)
发现"少数关键权重对应大激活",按重要性做 per-channel scaling。在 4-bit 上比 GPTQ 更鲁棒、推理更快,是当下 INT4 部署的主流方案之一。