Skip to content
- Token: The smallest numbered text unit consumed by a model; see Tokenization.
- Tokenizer: The component that maps text to token IDs.
- Vocabulary: The fixed mapping between tokens and IDs.
- Embedding: A continuous vector for a discrete symbol; see Embeddings.
- Context window: The maximum number of tokens a model can read at once.
- Attention: A mechanism for dynamically selecting relevant context; see Attention.
- Self-attention: Attention among tokens in the same sequence.
- Q/K/V: Query, Key, and Value projections in attention.
- Transformer block: A layer made of attention, MLP, residuals, and normalization.
- MLP: The feed-forward network applied to each token representation.
- Residual connection: Adding a layer input back to its output for stable deep training.
- LayerNorm: Normalization that stabilizes activation statistics.
- Positional encoding: Information that tells a model token order and distance.
- RoPE: Rotary positional embeddings for relative position information.
- Logit: An unnormalized score before softmax.
- Softmax: A function that converts logits into probabilities.
- Temperature: A decoding parameter controlling distribution sharpness.
- Top-k: Sampling only from the k highest-probability candidates.
- Top-p: Sampling from the smallest set whose cumulative probability reaches p.
- Decoding: The strategy for turning probabilities into generated text.
- Prompt: The instruction, context, and examples sent to a model.
- Few-shot: Guiding a task by including a small number of examples.
- Chain-of-thought: Prompting the model to write intermediate reasoning steps.
- Pretraining: Learning general language patterns from large-scale data.
- Fine-tuning: Continuing training on task-specific data.
- RLHF: Optimizing outputs with rewards learned from human preferences.
- RAG: Retrieving external evidence and inserting it into context.
- KV cache: Cached Keys and Values used to speed autoregressive inference.
- Quantization: Lower-precision weights or activations to reduce cost.
- Hallucination: A plausible but unreliable model-generated claim.