FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Authors: Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao (2024)

Field

Inference

TLDR

Leverages the H100's asynchronous TMA (Tensor Memory Accelerator) and FP8 support to push attention throughput to roughly 1.2 PFLOPs/s while maintaining numerical accuracy. A key dependency for long-context and FP8 training on the Hopper architecture.

Appears in these articles

Efficient Attention: Breaking the Quadratic Sequence Bottleneck
