Needle in a Haystack — Pressure Testing LLMs
Domains
TLDR (English)
Proposes the Needle-in-a-Haystack test: inserting a key fact at random positions in a long document and testing whether the model can locate it when answering questions. Became the de facto standard for evaluating factual retrieval in long-context models, revealing the "lost in the middle" problem in most models.
TLDR(中文)
提出"大海捞针"(Needle-in-a-Haystack)测试方法:在长文本中随机插入一个关键事实,测试模型能否在回答问题准确定位该事实。成为评估长上下文模型事实检索能力的事实标准方法,揭示了大多数模型在长文本中的"lost in the middle"问题。
Related Papers
Other papers in the same domain