QLoRA: Efficient Finetuning of Quantized LLMs

Authors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer (2023)

Field

Alignment, Inference

TLDR

QLoRA combines 4-bit NF4 quantization, LoRA, and a paged optimizer to enable SFT of a 65B model on a single 48GB GPU. The open-source community uses this approach for almost all fine-tuning of LLaMA-2/3 and Qwen.
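
As a rough illustration of why 4-bit storage is what makes a 65B model fit on a 48GB card, the sketch below estimates the memory needed for the weights alone at 16 bits and at 4 bits per parameter. It is a back-of-the-envelope calculation, not a figure from the paper: the helper name `weight_memory_gb` is hypothetical, and the estimate ignores quantization constants, LoRA adapter weights, activations, and optimizer state, so real usage is higher.

```python
# Back-of-the-envelope estimate of weight storage only.
# Assumption: every parameter is stored at the given bit width; quantization
# constants, LoRA adapters, activations, and optimizer state are ignored.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return the memory needed to store the weights, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_65B = 65e9   # parameter count of a 65B model
GPU_MEMORY_GB = 48  # the single-GPU budget mentioned in the TLDR

fp16_gb = weight_memory_gb(PARAMS_65B, 16)
nf4_gb = weight_memory_gb(PARAMS_65B, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB (fits in 48 GB: {fp16_gb < GPU_MEMORY_GB})")
print(f" 4-bit weights: {nf4_gb:.1f} GB (fits in 48 GB: {nf4_gb < GPU_MEMORY_GB})")
```

Under these assumptions the 16-bit weights alone exceed the 48GB budget, while the 4-bit weights leave headroom for the adapters and the training state.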

Appears in these articles

KV Cache and Quantization: Making Large Models Faster

Co-cited

These papers appear in the same article as this paper
