borgeaud2022-retro
arXiv: 2112.04426
TLDR (English)
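DeepMind's RETRO builds chunked retrieval into pre-training itself, letting a 7.5B-parameter model reach performance comparable to the 175B GPT-3 on the Pile with roughly 25x fewer parameters. It shows that retrieval is not just an inference-time RAG trick but also a viable pre-training paradigm.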
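A minimal sketch of what "chunked retrieval" means, for readers who want the idea in code: the input is split into fixed-size chunks and each chunk gets its own nearest neighbors from a text database. This is an illustration under stated assumptions, not the paper's implementation. RETRO uses 64-token chunks and frozen BERT embeddings over a database of trillions of tokens; here a bag-of-words cosine similarity stands in for the embeddings, and all function names (`split_into_chunks`, `retrieve_neighbors`, `chunked_retrieval`) are hypothetical.

```python
import math
from collections import Counter


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_neighbors(chunk, database, k):
    """Return the k database entries most similar to the chunk.

    Assumption: bag-of-words cosine replaces the frozen BERT embeddings
    and approximate nearest-neighbor index used in the paper.
    """
    query = Counter(chunk)
    ranked = sorted(database, key=lambda entry: cosine(query, Counter(entry)), reverse=True)
    return ranked[:k]


def chunked_retrieval(tokens, database, chunk_size=4, k=1):
    """Pair every chunk of the input with its own retrieved neighbors."""
    return [
        (chunk, retrieve_neighbors(chunk, database, k))
        for chunk in split_into_chunks(tokens, chunk_size)
    ]
```

In the paper the retrieved neighbors are then fed to the language model through chunked cross-attention during both training and inference; that part is omitted here.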