Needle in a Haystack — Pressure Testing LLMs

Author: Greg Kamradt (2023)

Field

Long-context evaluation

TLDR

Proposes the Needle-in-a-Haystack test: a key fact (the "needle") is inserted at a random position in a long document (the "haystack"), and the model is then asked a question it can answer only by locating that fact. The test became a de facto standard for evaluating factual retrieval in long-context models, and it exposed position-dependent recall failures in many models, often described as the "lost in the middle" problem.
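The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original test harness: the function names (build_context, run_trial, sweep), the prompt wording, and the substring-based scoring are all hypothetical, and toy_model is a stand-in that only "sees" the start and end of its prompt so the example runs without any real LLM. The sketch sweeps a few fixed depths instead of sampling random positions so that results are repeatable.

```python
from typing import Callable, Dict, List


def build_context(filler: List[str], needle: str, depth: float) -> str:
    """Insert the needle between filler sentences at a fractional depth.

    depth=0.0 places the needle first, depth=1.0 places it last.
    """
    idx = round(depth * len(filler))
    return " ".join(filler[:idx] + [needle] + filler[idx:])


def run_trial(ask_model: Callable[[str], str], filler: List[str], needle: str,
              question: str, expected: str, depth: float) -> bool:
    """Build one haystack, query the model, and score by substring match."""
    context = build_context(filler, needle, depth)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer using only the text above."
    answer = ask_model(prompt)
    return expected.lower() in answer.lower()


def sweep(ask_model: Callable[[str], str], filler: List[str], needle: str,
          question: str, expected: str, depths: List[float]) -> Dict[float, bool]:
    """Run one trial per depth and report whether the needle was retrieved."""
    return {d: run_trial(ask_model, filler, needle, question, expected, d)
            for d in depths}


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: reads only the first and last 300 characters."""
    visible = prompt[:300] + prompt[-300:]
    return "Dolores Park" if "Dolores Park" in visible else "I don't know."


# Illustrative data (assumed, not taken from the source page).
filler = ["The sky was grey that morning."] * 100
needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
question = "What is the best thing to do in San Francisco?"

results = sweep(toy_model, filler, needle, question, "Dolores Park", [0.0, 0.5, 1.0])
print(results)
```

To evaluate a real model, ask_model would be replaced by a function that sends the prompt to that model and returns its text reply. Substring scoring is the simplest possible check; a stricter setup might grade answers with a rubric or a second model.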

Related papers

Other papers in the same field