跳转到内容

frantar2022-gptq

arXiv: 2210.17323

TLDR(中文)

第一次实现"在单卡上 4-bit 量化 175B 模型而几乎不掉精度"。把 LLM 推理硬件门槛从 8xA100 拉到一张消费级显卡,普及"开源大模型本地跑"。

TLDR (English)

First to achieve "4-bit quantization of 175B model on single GPU with almost no accuracy loss". Lowered LLM inference hardware barrier from 8xA100 to single consumer GPU, popularizing "run open-source LLMs locally".