dao2023-flashattention2
arXiv: 2307.08691
TLDR
FlashAttention-2 uses more aggressive parallelism, both across thread blocks and across warps within a block, together with better work partitioning to reach roughly 2x the speed of the original FlashAttention. Today, serving and training stacks such as vLLM, SGLang, and Megatron have largely moved to FA-2.
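As a rough illustration of the work partitioning mentioned above, the following pure-Python sketch splits attention into query blocks (outer loop) and key/value blocks (inner loop), using an online softmax with normalization deferred to the end. It is a minimal sketch, not the CUDA kernel: it does not model warps or shared memory, and the function names `naive_attention` and `tiled_attention` and the block sizes are hypothetical choices made for this example.

```python
import math


def naive_attention(q, k, v):
    """Reference implementation: softmax(q k^T / sqrt(d)) v, one query row at a time."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_q=2, block_kv=2):
    """Blockwise attention with an online softmax (illustrative only).

    Assumption: block_q and block_kv stand in for tile sizes; the values
    here are arbitrary and chosen only to exercise partial blocks.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    # Outer loop over query blocks. Iterations are independent of each
    # other, so a GPU kernel could hand each one to a separate thread block.
    for qs in range(0, len(q), block_q):
        for qi in q[qs:qs + block_q]:
            m = -math.inf      # running maximum of the scores seen so far
            l = 0.0            # running softmax denominator
            acc = [0.0] * dv   # unnormalized output accumulator
            # Inner loop streams over key/value blocks.
            for ks in range(0, len(k), block_kv):
                k_blk = k[ks:ks + block_kv]
                v_blk = v[ks:ks + block_kv]
                scores = [scale * sum(a * b for a, b in zip(qi, kj))
                          for kj in k_blk]
                m_new = max(m, max(scores))
                # Rescale earlier partial results to the new maximum.
                # On the first block m is -inf, so the factor is 0.0.
                correction = math.exp(m - m_new)
                p = [math.exp(s - m_new) for s in scores]
                l = l * correction + sum(p)
                acc = [a * correction + sum(pj * vj[c] for pj, vj in zip(p, v_blk))
                       for c, a in enumerate(acc)]
                m = m_new
            # Divide by the denominator once, after all blocks are processed.
            out.append([a / l for a in acc])
    return out
```

Calling `tiled_attention(q, k, v)` on small nested lists should agree with `naive_attention(q, k, v)` up to floating-point rounding, which is a convenient sanity check when experimenting with different block sizes.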