QLoRA: Efficient Finetuning of Quantized LLMs

Authors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer (2023)

Domains

Alignment, Inference

TLDR (English)

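QLoRA combines 4-bit NF4 (NormalFloat) quantization of the frozen base model, LoRA adapters, and a paged optimizer, which makes supervised fine-tuning (SFT) of a 65B-parameter model possible on a single 48GB GPU. Nearly all open-source community fine-tuning of LLaMA-2/3 and Qwen uses this approach.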
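
A rough back-of-the-envelope sketch of why the 4-bit base model fits on a 48GB card. It counts weight storage only and ignores quantization constants, activations, and optimizer state, so the figures are lower bounds. The helper names, the 8192 x 8192 layer shape, and the LoRA rank of 16 are illustrative assumptions, not values taken from this summary.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


NUM_PARAMS = 65e9  # 65B-parameter base model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit baseline
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantized weights

# Hypothetical layer shape and rank, chosen only for illustration.
D_IN, D_OUT, RANK = 8192, 8192, 16
adapter = lora_params(D_IN, D_OUT, RANK)
full_layer = D_IN * D_OUT
adapter_fraction = adapter / full_layer

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {nf4_gb:.1f} GB")
print(f"LoRA adapter share of one layer: {adapter_fraction:.4%}")
```

Under these assumptions the 16-bit weights alone (130 GB) exceed a 48GB card, while the 4-bit weights (32.5 GB) leave headroom for the small trainable adapters and their optimizer state.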

Appears in These Articles

KV Cache 与量化：让大模型跑得更快
KV Cache and Quantization: Making Large Models Faster
