Prompt Engineering: The Art of Talking to Models

Intuition: prompts are the model’s “context script”

Prompt engineering is not about tricking the model; it is about providing the most helpful context for the task. Like giving an actor a good script, the model needs clear instructions, relevant background, and appropriate examples to perform at its best.

Basic techniques include: explicit task descriptions, providing a few examples (few-shot), breaking down complex problems, and asking the model to think step by step (Chain-of-Thought). These methods require no weight updates yet can significantly improve performance.
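
As a minimal sketch of how these basic techniques combine, the Python snippet below assembles an explicit task description, a few worked examples, and a step-by-step cue into one prompt string. The `Example` type, the `build_prompt` helper, and the `Q:`/`A:` layout are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One worked demonstration shown to the model (hypothetical type)."""
    question: str
    reasoning: str
    answer: str


def build_prompt(task: str, examples: list[Example], question: str,
                 step_by_step: bool = True) -> str:
    """Assemble a task description, few-shot examples, and the new question."""
    parts = [f"Task: {task}", ""]
    for ex in examples:
        parts.append(f"Q: {ex.question}")
        # Showing the reasoning before the answer makes this a few-shot CoT example.
        parts.append(f"A: {ex.reasoning} The answer is {ex.answer}.")
        parts.append("")
    parts.append(f"Q: {question}")
    # Step-by-step cue; drop it for tasks that need no intermediate reasoning.
    parts.append("A: Let's think step by step." if step_by_step else "A:")
    return "\n".join(parts)


demo = [Example("Tom has 3 apples and buys 2 more. How many does he have?",
                "He starts with 3 and adds 2, so 3 + 2 = 5.", "5")]
prompt = build_prompt("Solve the arithmetic word problem.", demo,
                      "A shelf holds 4 rows of 6 books. How many books are there?")
print(prompt)
```

The resulting string can be sent to any text-completion or chat interface; nothing here depends on a particular model or vendor.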

Engineering view: from simple prompts to systematic design

In practice, prompt design has evolved from “write a paragraph” to a systematic engineering discipline:

Role assignment: Giving the model a clear role (“You are a senior data analyst”) stabilizes output style.
Structured templates: Use XML, Markdown, or JSON to separate instructions, context, input, and output format, reducing parsing errors.
Few-shot selection: The number, order, and quality of examples all matter. Similar examples are usually more effective than random ones; example ordering can introduce position bias.
Chain-of-Thought: Adding “Let’s think step by step” or providing worked reasoning examples can substantially improve math and logic performance, particularly in larger models, but the longer outputs increase token cost and latency.
Self-consistency: Sampling several reasoning paths and taking a majority vote over their final answers is more reliable than a single greedy decode, especially for tasks with one clearly checkable answer.
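
The self-consistency item in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: `fake_sample` stands in for a real model call made at a temperature above zero, and the "The answer is N" output format is assumed so that answers can be parsed with a regular expression.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the number after 'The answer is' (this output format is an assumption)."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


def self_consistency(sample: Callable[[str], str], prompt: str,
                     n: int = 10) -> Optional[str]:
    """Sample n reasoning chains and return the most frequent final answer."""
    answers = []
    for _ in range(n):
        answer = extract_answer(sample(prompt))
        if answer is not None:  # skip chains with no parsable answer
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a sampled model call: cycles through canned chains.
_canned = itertools.cycle([
    "4 rows times 6 books is 24. The answer is 24.",
    "4 plus 6 is 10, doubled is 20. The answer is 20.",
    "6 + 6 + 6 + 6 = 24. The answer is 24.",
    "I am not sure how to solve this.",
    "Each row has 6 books, so 4 * 6 = 24. The answer is 24.",
])


def fake_sample(prompt: str) -> str:
    return next(_canned)


result = self_consistency(fake_sample, "Q: 4 rows of 6 books. How many?", n=5)
print(result)  # prints 24: three of the four parsable chains agree
```

Voting over extracted answers rather than whole completions is the design choice that matters: the reasoning chains differ in wording, so only their final answers can be compared.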

In production systems, prompt version management is equally important: changing a prompt may affect hundreds of downstream use cases. Establish prompt version control, regression tests, and A/B evaluation frameworks.
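
One way to make this concrete is to treat each prompt as a versioned artifact with its own regression cases. The sketch below is an assumption-laden illustration: `PromptVersion`, `RegressionCase`, `run_regression`, and the `fake_model` stub are hypothetical names, and a real setup would call an actual model and use richer checks than substring matching.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str

    @property
    def fingerprint(self) -> str:
        """Short content hash, so silent template edits are detectable."""
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]

    def render(self, **fields: str) -> str:
        return self.template.format(**fields)


@dataclass(frozen=True)
class RegressionCase:
    fields: dict[str, str]
    must_contain: str


def run_regression(prompt: PromptVersion, cases: list[RegressionCase],
                   model: Callable[[str], str]) -> list[str]:
    """Return one failure message per case whose output misses the expected text."""
    failures = []
    for case in cases:
        output = model(prompt.render(**case.fields))
        if case.must_contain not in output:
            failures.append(
                f"{prompt.name}@{prompt.version}: expected {case.must_contain!r}")
    return failures


def fake_model(text: str) -> str:
    """Hypothetical stub; replace with a real model call."""
    return "POSITIVE" if "great" in text else "NEGATIVE"


sentiment_v2 = PromptVersion(
    name="sentiment", version="2.0.0",
    template="Label the review as POSITIVE or NEGATIVE.\nReview: {review}\nLabel:")
cases = [RegressionCase({"review": "great battery life"}, "POSITIVE"),
         RegressionCase({"review": "broke after a day"}, "NEGATIVE")]
failures = run_regression(sentiment_v2, cases, fake_model)
print(sentiment_v2.fingerprint, failures)
```

Running the same cases against every new version before release gives a simple regression gate; an A/B evaluation would compare the failure lists or scores of two versions on the same cases.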

Research view: the nature of prompting and automatic optimization

Research questions include: what does the success of prompt engineering reveal about LLM capabilities? Does in-context learning genuinely extract new patterns from the examples, or does the prompt merely “unlock” abilities already acquired during pretraining? The answer bears on our fundamental understanding of LLM generalization.

Automatic Prompt Engineering (APE) attempts to discover effective prompts through search, reinforcement learning, or gradient-based methods. Directions include discrete prompt search, soft prompts and prefix tuning (which learn continuous prompt vectors while the model weights stay frozen), and letting models generate and evaluate candidate prompts themselves. One possible future is “prompts as code”: using programming languages and type systems to constrain and compose prompt templates.
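
The simplest of these directions, discrete prompt search, can be sketched as scoring each candidate instruction on a small labelled development set and keeping the best one. Everything named here is an assumption for illustration: the candidates are hard-coded rather than model-generated, exact-match accuracy is the scoring rule, and `fake_model` is a stub in place of a real model.

```python
from typing import Callable


def score_prompt(instruction: str, dev_set: list[tuple[str, str]],
                 model: Callable[[str], str]) -> float:
    """Exact-match accuracy of one candidate instruction on the dev set."""
    correct = sum(
        model(f"{instruction}\nInput: {x}\nOutput:").strip() == y
        for x, y in dev_set)
    return correct / len(dev_set)


def search_prompts(candidates: list[str], dev_set: list[tuple[str, str]],
                   model: Callable[[str], str]) -> tuple[str, float]:
    """Return the highest-scoring candidate and its score (first one wins ties)."""
    scored = [(score_prompt(c, dev_set, model), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def fake_model(prompt: str) -> str:
    """Hypothetical stub: only follows instructions that mention 'uppercase'."""
    text = prompt.split("Input: ")[1].split("\nOutput:")[0]
    return text.upper() if "uppercase" in prompt.lower() else text


candidates = ["Rewrite the input.",
              "Convert the input to uppercase.",
              "Shout the input."]
dev_set = [("abc", "ABC"), ("hello", "HELLO")]
best, best_score = search_prompts(candidates, dev_set, fake_model)
print(best, best_score)  # prints: Convert the input to uppercase. 1.0
```

In a fuller system the candidate list would itself come from a model, and the winning prompt would be checked on held-out data to guard against overfitting the development set.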

References

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason Wei et al. (2022)
Introduces chain-of-thought prompting: adding intermediate reasoning steps to the examples in a prompt substantially improves LLM performance on math, logic, and commonsense reasoning tasks. The gains appear mainly in sufficiently large models.
Large Language Models are Zero-Shot Reasoners. Takeshi Kojima et al. (2022)
The single phrase "Let's think step by step" raises accuracy on the MultiArith benchmark from ~17% to ~78% without any examples. This suggests that CoT capability is already present in the model and only needs a prompt to trigger it.
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Xuezhi Wang et al. (2022)
Self-consistency is a key improvement to CoT: instead of greedily decoding a single reasoning chain, sample multiple diverse reasoning paths and take the most frequent answer (majority vote). The paper reports gains of roughly 10-20 percentage points on several arithmetic reasoning benchmarks.
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. Denny Zhou et al. (2022)
"Break hard problems into easy ones, solve sequentially" is another reasoning paradigm parallel to CoT, especially effective for compositional generalization. Together with CoT and Tree of Thoughts (ToT), it forms a trio of approaches to guiding LLMs through step-by-step thinking.