Needle in a Haystack — Pressure Testing LLMs

Authors: Greg Kamradt (2023)

Domains

Long ContextEvaluation

TLDR (English)

Proposes the Needle-in-a-Haystack test: inserting a key fact at random positions in a long document and testing whether the model can locate it when answering questions. Became the de facto standard for evaluating factual retrieval in long-context models, revealing the "lost in the middle" problem in most models.

TLDR（中文）

提出"大海捞针"（Needle-in-a-Haystack）测试方法：在长文本中随机插入一个关键事实，测试模型能否在回答问题准确定位该事实。成为评估长上下文模型事实检索能力的事实标准方法，揭示了大多数模型在长文本中的"lost in the middle"问题。

Appears in These Articles

长上下文：让模型读得更远
Long Context: Helping Models Read Farther

Co-cited Papers

These papers appear in the same articles as this one

Related Papers

Other papers in the same domain