Open Questions in LLMs (2026)

LLMs are evolving rapidly, yet many fundamental questions remain unanswered. Here are the core open questions that LLM Primer tracks:

Understanding and Reasoning

Do models truly “understand” language and the world, or are they performing sophisticated pattern matching? Chain-of-Thought (CoT) prompting improves performance on reasoning tasks, but does the generated chain reflect genuine step-by-step reasoning, or has the model merely learned to produce text in the reasoning format that evaluators expect?

Scale and Efficiency

Will scaling laws keep holding as models grow? Is there a threshold beyond which additional compute, data, or parameters yield sharply diminishing returns? Can smaller models trained with better data and algorithms match the capabilities of much larger ones?

One simplified view is that capability depends on data, compute, and alignment together:

$$\text{capability} \approx f(\text{data}, \text{compute}, \text{alignment})$$
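The diminishing-returns question can be made concrete under one common modeling assumption: that loss falls as a power law in compute, L(C) = a · C^(−α). The sketch below is illustrative only. The constants `A` and `ALPHA` and the helpers `loss` and `gains_per_doubling` are hypothetical and not fitted to any real model. It shows that, under this assumed form, each doubling of compute buys a smaller absolute loss reduction than the previous one, even though the relative reduction per doubling stays fixed at a factor of 2^(−α).

```python
# Hypothetical power-law scaling curve: loss(C) = a * C ** (-alpha).
# The constants are illustrative assumptions, not fitted values.
A = 10.0      # hypothetical scale constant
ALPHA = 0.05  # hypothetical scaling exponent


def loss(compute: float, a: float = A, alpha: float = ALPHA) -> float:
    """Loss predicted by the assumed power law at a given compute budget."""
    return a * compute ** (-alpha)


def gains_per_doubling(start: float, doublings: int) -> list[float]:
    """Absolute loss reduction obtained from each successive doubling of compute."""
    gains = []
    c = start
    for _ in range(doublings):
        gains.append(loss(c) - loss(2 * c))
        c *= 2
    return gains


if __name__ == "__main__":
    for i, g in enumerate(gains_per_doubling(start=1e20, doublings=5), start=1):
        print(f"doubling {i}: loss drops by {g:.5f}")
```

Whether real training runs stay on such a curve, or bend away from it, is exactly the open question; the sketch only shows what "diminishing returns" means if they do.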

Alignment and Safety

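Do reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) truly change models’ internal objectives, or do they merely suppress undesired surface behavior? How can we ensure that alignment generalizes to attacks that were never seen during training?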
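Part of why the question is hard to settle is that the DPO training signal is defined entirely on output likelihoods. The sketch below computes the standard per-pair DPO loss, −log σ(β · [(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]), assuming the four sequence log-probabilities have already been computed elsewhere. The function name `dpo_loss` and the default `beta = 0.1` are illustrative choices, not taken from any particular library.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is a summed log-probability log pi(y | x) of a full response,
    under either the policy being trained or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

Nothing in this objective refers to the model's internal representations; it only rewards shifting relative likelihoods toward preferred responses. Whether that shift reflects a changed internal objective is therefore an empirical question about the trained model, not something the loss itself guarantees.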

Multimodality and World Models

Will the fusion of vision, audio, and text give models “physical intuition”? Is code generation the best test of true reasoning ability?

The Evaluation Dilemma

When model capabilities approach or exceed human performance, who judges? Have existing benchmarks been “gamed”? How do we design evaluation systems that resist manipulation?
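One narrow, partial probe for one form of gaming, test-set contamination, is to measure how many of a benchmark item's word n-grams also appear in the training corpus. The sketch below assumes lowercase whitespace tokenization, n = 8, and an in-memory corpus; the helpers `ngrams` and `overlap_fraction` are hypothetical. Real contamination audits work at much larger scale and make different normalization choices, and a low overlap score does not rule out other ways of overfitting to a benchmark.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Set of word n-grams after lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(test_item: str, corpus: list[str], n: int = 8) -> float:
    """Fraction of the test item's n-grams that occur anywhere in the corpus."""
    item_grams = ngrams(test_item, n)
    if not item_grams:
        # Items shorter than n tokens have no n-grams to compare.
        return 0.0
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)
```

A high fraction flags an item for manual review; it does not by itself prove the model memorized the answer.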

These questions have no easy answers, but they drive LLM Primer’s continuous updates. We welcome community contributions: proposing new questions, adding evidence, and correcting outdated views.

Turn an open question into a research question

Pick one open question and check whether it is concrete enough:

- It has observable inputs and outputs.
- It has at least one baseline or control group.
- A negative result would still teach us something.

When all three criteria hold, the question is closer to an executable research plan than to an entry on a broad topic list.
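The three criteria can be recorded as a small checklist. The sketch below is a hypothetical helper, not part of any LLM Primer tool: the `ResearchQuestion` class, its field names, and the example question are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ResearchQuestion:
    """Hypothetical record of a candidate question and the three concreteness checks."""
    text: str
    observable_io: bool            # Are the inputs and outputs observable?
    has_baseline: bool             # Is there at least one baseline or control group?
    informative_if_negative: bool  # Would a negative result still teach us something?

    def missing_criteria(self) -> list[str]:
        """Names of the criteria that are not yet satisfied."""
        checks = {
            "observable inputs and outputs": self.observable_io,
            "baseline or control group": self.has_baseline,
            "informative negative result": self.informative_if_negative,
        }
        return [name for name, ok in checks.items() if not ok]

    def is_executable(self) -> bool:
        """True only when all three criteria hold."""
        return not self.missing_criteria()


if __name__ == "__main__":
    q = ResearchQuestion(
        text="Does CoT prompting improve accuracy on multi-step arithmetic versus direct answers?",
        observable_io=True, has_baseline=True, informative_if_negative=True,
    )
    print(q.is_executable(), q.missing_criteria())
```

Listing the missing criteria, rather than returning a single yes/no, points directly at what a vague question still needs before it becomes a plan.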