chen2023-longlora
arXiv: 2309.12307
TLDR(中文)
用 shifted sparse attention + LoRA 把 7B 模型扩到 100K 上下文,且只用一台 8xA100。是长上下文微调的工程标杆;另见 YaRN、PoSE。
TLDR (English)
Uses shifted sparse attention + LoRA to extend 7B model to 100K context with just one 8xA100 machine. Engineering benchmark for long-context fine-tuning; see also YaRN, PoSE.