LoRA: Low-Rank Adaptation of Large Language Models
arXiv: 2106.09685
TLDR (English)
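LoRA freezes the pretrained weights and trains only a low-rank update to them: for a weight matrix W0 of size d × k, the update is the product BA of two small matrices, B (d × r) and A (r × k), with rank r ≪ min(d, k). This reduces the number of trainable parameters by up to 10,000x (the paper's figure for GPT-3 175B compared with full fine-tuning). It makes fine-tuning large models on consumer GPUs feasible, and LoRA has become the dominant parameter-efficient fine-tuning (PEFT) method.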
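TLDR (Chinese version, translated)
LoRA freezes the pretrained model weights and trains only two low-rank matrices whose product forms the weight update (rank r is far smaller than the original dimensions), cutting the number of trainable parameters for fine-tuning by up to 10,000x. This makes it possible to fine-tune large models on consumer GPUs, and LoRA has become arguably the most widely used parameter-efficient fine-tuning (PEFT) method today.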
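The mechanism both summaries describe can be sketched as a single adapted linear layer. This is a minimal, dependency-free sketch under stated assumptions. It follows the paper's formulation h = W0 x + (α/r)·BAx, with A initialized from a Gaussian and B initialized to zero so that the adapted layer starts out identical to the pretrained one. The names `LoRALinear`, `matvec`, and `lora_param_ratio`, the Gaussian standard deviation of 0.01, and the use of plain Python lists instead of a tensor library are hypothetical choices made for illustration. No training loop is shown; in practice only A and B would receive gradient updates.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical sketch of a LoRA-adapted layer: h = W0 x + (alpha / r) * B (A x)."""

    def __init__(self, W0, r, alpha=1.0, seed=0):
        self.W0 = W0  # frozen pretrained weight, shape d x k (never updated)
        d, k = len(W0), len(W0[0])
        rng = random.Random(seed)
        # A: r x k, small Gaussian init (std 0.01 is an assumption of this sketch)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        # B: d x r, zero init, so B A = 0 and the layer initially matches W0
        self.B = [[0.0] * r for _ in range(d)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W0, x)                   # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path; B A is never formed
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only the entries of A and B would be trained.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_ratio(d, k, r):
    """Full-matrix parameter count divided by the LoRA factor count r * (d + k)."""
    return (d * k) / (r * (d + k))


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
print(layer.forward([1.0, 1.0]))       # equals W0 x at init because B is zero
print(layer.trainable_params())        # 1*2 (A) + 2*1 (B)
print(lora_param_ratio(4096, 4096, 8))
```

For a single 4096 × 4096 projection with r = 8, the low-rank factors hold 8 × (4096 + 4096) = 65,536 parameters versus 16,777,216 for the full matrix, a 256x reduction for that one matrix. The whole-model reduction also depends on which weight matrices are adapted and on r, so this per-matrix figure is not directly comparable to the paper's GPT-3 number.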