New Training, Inference, and Applications Modules
LLM Primer started with foundational concepts such as Tokenization, Attention, and the Transformer. As the content matured, we expanded into three new modules:
Training
- Pretraining and Scaling Law: From data engineering to compute-optimal training, understanding how models “learn to predict the next word.”
- Fine-Tuning and Alignment: How SFT, RLHF, DPO, and related methods turn a “generalist” into an “obedient assistant.”
Inference
- KV Cache and Quantization: The two pillars of inference optimization, namely caching already-computed results and lowering numerical precision to save memory and compute.
- Efficient Attention: FlashAttention, sparse attention, and frontier research toward linear-complexity attention.
Applications
- RAG and Retrieval Augmentation: Giving models updatable external memory to reduce hallucinations and improve traceability.
- Agents and Tool Use: From chat to action—how models interact with the real world through tools.
Each article follows LLM Primer’s three-tier design: intuition first, then engineering trade-offs, and research questions where relevant. Contributions via PR are always welcome, whether new content, corrections, or adopting papers from the inbox.