dao2023-flashattention2

arXiv: 2307.08691

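TLDR

Reworks FlashAttention's parallelism and work partitioning to reach roughly 2× the speed of the original FlashAttention on A100. It trims non-matmul FLOPs and parallelizes over the sequence-length dimension across thread blocks. Inside each thread block it splits Q (instead of K/V) across warps, so warps no longer exchange partial results through shared memory. Today the serving engines vLLM and SGLang and the Megatron training backend have largely moved to FA-2.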
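The loop order behind this work partition can be sketched in pure Python. This is a minimal sketch that assumes a single attention head with no causal mask and no dropout. The function names (`naive_attention`, `process_q_block`, `flash_attention_2_sketch`) and block sizes are hypothetical and chosen for illustration. The code runs sequentially and is not the CUDA kernel. The outer loop walks over query blocks and the inner loop streams K/V blocks through an online softmax. Each query row keeps its own running max `m`, denominator `l`, and unnormalized output `acc`, so query blocks share no state. That independence is what lets the kernel assign them to different thread blocks, and Q rows to different warps, without a cross-worker reduction. The division by `l` is deferred until after the K/V loop, in the spirit of FA-2's cut in non-matmul work.

```python
import math
from typing import List

# Hypothetical sketch: function names and block sizes are illustrative,
# not the FlashAttention library API. Single head, no mask, no dropout.
Matrix = List[List[float]]


def naive_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference: softmax(Q K^T / sqrt(d)) V, one full score row at a time."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def process_q_block(q_block: Matrix, k: Matrix, v: Matrix,
                    block_kv: int) -> Matrix:
    """One independent unit of work: a block of query rows streamed over K/V.

    All running state (m, l, acc) is local to these rows, so different
    Q blocks never need to exchange partial results.
    """
    scale = 1.0 / math.sqrt(len(q_block[0]))
    dv = len(v[0])
    out = []
    for qi in q_block:
        m = -math.inf      # running max of the scores seen so far
        l = 0.0            # running softmax denominator (relative to m)
        acc = [0.0] * dv   # running output, not yet divided by l
        for start in range(0, len(k), block_kv):
            k_blk = k[start:start + block_kv]
            v_blk = v[start:start + block_kv]
            s = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            m_new = max(m, max(s))
            corr = math.exp(m - m_new)  # rescales old l and acc to the new max
            p = [math.exp(x - m_new) for x in s]
            l = l * corr + sum(p)
            acc = [a * corr + sum(pj * vj[c] for pj, vj in zip(p, v_blk))
                   for c, a in enumerate(acc)]
            m = m_new
        # Normalize once, after the whole K/V loop, instead of every step.
        out.append([a / l for a in acc])
    return out


def flash_attention_2_sketch(q: Matrix, k: Matrix, v: Matrix,
                             block_q: int = 2, block_kv: int = 2) -> Matrix:
    """Outer loop over Q blocks, inner loop over K/V blocks.

    Here the Q blocks run one after another; on a GPU each would go to its
    own thread block (and, inside it, Q rows are split across warps).
    """
    out: Matrix = []
    for start in range(0, len(q), block_q):
        out.extend(process_q_block(q[start:start + block_q], k, v, block_kv))
    return out


# Usage: the tiled result should match the naive reference up to rounding.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.3], [2.0, 0.1], [-1.0, 1.0]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.0],
     [0.5, 0.5, 0.5], [-2.0, 1.0, 1.0]]

ref = naive_attention(q, k, v)
tiled = flash_attention_2_sketch(q, k, v, block_q=2, block_kv=2)
max_diff = max(abs(a - b) for rt, rr in zip(tiled, ref) for a, b in zip(rt, rr))
print(f"max |tiled - naive| = {max_diff:.2e}")
```

Splitting along K/V instead, as the original FlashAttention did across warps, would leave several workers holding partial `(m, l, acc)` states for the same query rows. Those partial states would have to be combined through shared memory before the final division, and that is the synchronization FA-2's split-Q layout avoids.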