alayrac2022-flamingo

arXiv: 2204.14198

TLDR

Flamingo uses a Perceiver Resampler to compress image features into a fixed number of visual tokens, which a frozen LLM attends to through newly added gated cross-attention layers, enabling few-shot visual question answering. It is an early precursor of the now-mainstream "plug-in" multimodal approach (LLaVA, IDEFICS, etc.).
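
To make the Perceiver Resampler idea concrete, here is a minimal sketch of its core step: a small set of learned latent vectors cross-attends over a variable number of image features, so the output always has the same number of tokens. This is an illustration under simplifying assumptions, not the paper's implementation: the names `resample` and `latents`, the dimensions, and the random inputs are all hypothetical, and the sketch omits learned projections, multiple heads, feed-forward layers, and layer stacking.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(features, latents):
    """One simplified cross-attention step: latents attend over features.

    features: list of N vectors (length d), e.g. image patch features.
    latents:  list of K vectors (length d), standing in for learned queries.
    Returns K vectors of length d, whatever N is.
    """
    d = len(latents[0])
    out = []
    for q in latents:
        # Scaled dot-product score of this latent against every feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in features]
        weights = softmax(scores)
        # Attention-weighted average of the features, plus a residual connection.
        attended = [sum(w * f[j] for w, f in zip(weights, features))
                    for j in range(d)]
        out.append([qj + aj for qj, aj in zip(q, attended)])
    return out


# Hypothetical sizes; random values stand in for trained latents and real features.
rng = random.Random(0)
d, num_latents = 8, 4
latents = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_latents)]

for num_patches in (16, 49):
    features = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_patches)]
    tokens = resample(features, latents)
    print(num_patches, "patches ->", len(tokens), "visual tokens")
```

The point of the design is that the language model sees a fixed-length visual input regardless of how many patches the vision encoder emits.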