Skip to content

Needle in a Haystack — Pressure Testing LLMs

Authors: Greg Kamradt (2023)

Domains

Long ContextEvaluation

TLDR (English)

Proposes the Needle-in-a-Haystack test: inserting a key fact at random positions in a long document and testing whether the model can locate it when answering questions. Became the de facto standard for evaluating factual retrieval in long-context models, revealing the "lost in the middle" problem in most models.

TLDR(中文)

提出"大海捞针"(Needle-in-a-Haystack)测试方法:在长文本中随机插入一个关键事实,测试模型能否在回答问题准确定位该事实。成为评估长上下文模型事实检索能力的事实标准方法,揭示了大多数模型在长文本中的"lost in the middle"问题。

Related Papers

Other papers in the same domain