H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

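Authors: Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen (2023)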

arXiv: 2306.14048

Domains

Inference

TLDR (English)

Identifies Heavy Hitters in the KV cache: a small set of tokens that receives most of the attention weight. H2O keeps the KV pairs of these heavy-hitter tokens and evicts the rest, maintaining near-lossless performance with only 20-30% of the original KV cache.
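
The eviction idea summarized above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `h2o_keep_indices`, the `budget` and `recent` parameters, and the toy attention rows are all hypothetical, and the attention rows are precomputed over the full context rather than recomputed over the shrunken cache as a real decoder would do. Each kept token accumulates the attention it receives; once the cache exceeds its budget, the lowest-scoring token outside a protected window of recent tokens is evicted.

```python
def h2o_keep_indices(attn_rows, budget, recent):
    """Greedy heavy-hitter eviction sketch (hypothetical helper).

    attn_rows[t] holds the attention weights of query t over tokens 0..t.
    budget is the maximum number of cached tokens; recent is how many of
    the newest tokens are always protected from eviction.
    """
    if not 0 <= recent < budget:
        raise ValueError("need 0 <= recent < budget")
    kept = []    # token indices currently in the cache, oldest first
    score = {}   # accumulated attention received by each kept token
    for t, row in enumerate(attn_rows):
        kept.append(t)
        score[t] = 0.0
        # Accumulate the attention this query pays to every cached token.
        for i in kept:
            score[i] += row[i]
        if len(kept) > budget:
            # Only tokens outside the recent window may be evicted.
            candidates = kept[:len(kept) - recent]
            victim = min(candidates, key=lambda i: score[i])
            kept.remove(victim)
            del score[victim]
    return kept


# Toy example (made-up weights): token 0 attracts most of the attention.
rows = [
    [1.0],
    [0.9, 0.1],
    [0.8, 0.1, 0.1],
    [0.7, 0.05, 0.05, 0.2],
    [0.6, 0.02, 0.05, 0.13, 0.2],
    [0.6, 0.0, 0.0, 0.1, 0.1, 0.2],
]
result = h2o_keep_indices(rows, budget=3, recent=1)
print(result)  # -> [0, 3, 5]
```

In this toy run the cache holds 3 of 6 tokens: the heavy hitter (token 0) survives every eviction, the most recent token is protected, and the remaining slot goes to whichever older token has gathered the most attention so far.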

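TLDR (translated from Chinese)

Finds that the KV cache exhibits a "Heavy Hitters" phenomenon: a few key tokens contribute the vast majority of the attention weight. By retaining the KV pairs of these heavy-hitter tokens, H2O maintains near-lossless performance while keeping only 20-30% of the KV cache.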