跳转到内容

chen2023-longlora

arXiv: 2309.12307

TLDR(中文)

用 shifted sparse attention + LoRA 把 7B 模型扩到 100K 上下文,且只用一台 8xA100。是长上下文微调的工程标杆;另见 YaRN、PoSE。

TLDR (English)

Uses shifted sparse attention + LoRA to extend 7B model to 100K context with just one 8xA100 machine. Engineering benchmark for long-context fine-tuning; see also YaRN, PoSE.