Needle in a Haystack — Pressure Testing LLMs

Author: Greg Kamradt (2023)

Field

Long-context evaluation

TLDR

Proposes the Needle-in-a-Haystack test: insert a single key fact (the "needle") at a random position in a long document (the "haystack"), then ask a question whose answer depends on that fact and check whether the model retrieves it. The test has become the de facto standard for evaluating factual retrieval in long-context models, and it showed that most models exhibit the "lost in the middle" problem, where facts placed deep inside a long context are retrieved less reliably than those near the start or end.
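The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`insert_needle`, `run_sweep`, `toy_model`), the filler sentence, and the needle text are all hypothetical, and `toy_model` is a stand-in that only "sees" the first and last 120 characters of its input so that the sweep produces a visible mid-context failure without calling any real LLM.

```python
from typing import Callable, Dict, List, Tuple


def insert_needle(sentences: List[str], needle: str, depth: float) -> str:
    """Place `needle` at a fractional depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    idx = int(len(sentences) * depth)  # sentence index for the needle
    return " ".join(sentences[:idx] + [needle] + sentences[idx:])


def run_sweep(
    filler: str,
    needle: str,
    question: str,
    expected: str,
    ask_model: Callable[[str, str], str],
    lengths: List[int],
    depths: List[float],
) -> Dict[Tuple[int, float], bool]:
    """Score retrieval for every (haystack length, needle depth) pair."""
    results: Dict[Tuple[int, float], bool] = {}
    for n in lengths:
        sentences = [filler] * n  # hypothetical haystack: n filler sentences
        for d in depths:
            context = insert_needle(sentences, needle, d)
            answer = ask_model(context, question)
            # Simple substring scoring; real evaluations may use a grader.
            results[(n, d)] = expected.lower() in answer.lower()
    return results


def toy_model(context: str, question: str) -> str:
    """Stand-in for an LLM (assumption): reads only the edges of the context."""
    window = 120
    visible = context[:window] + context[-window:]
    return "7421" if "7421" in visible else "I don't know"


if __name__ == "__main__":
    scores = run_sweep(
        filler="The sky was grey that morning.",
        needle="The secret code for the vault is 7421.",
        question="What is the secret code for the vault?",
        expected="7421",
        ask_model=toy_model,
        lengths=[20],
        depths=[0.0, 0.5, 1.0],
    )
    for (length, depth), found in sorted(scores.items()):
        print(f"length={length} depth={depth:.1f} retrieved={found}")
```

To evaluate a real model, `toy_model` would be replaced by a function that sends the context and question to that model and returns its answer; the grid of (length, depth) results is what is usually plotted as a heatmap.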

Appears in these articles

Long Context: Helping Models Read Farther

Co-cited

These papers appear in the same article as this one
