New Training, Inference, and Applications Modules

LLM Primer started with foundational concepts like Tokenization, Attention, and Transformer. As the content matured, we expanded into three new modules: Training, Inference, and Applications. The new articles include:

  • Pretraining and Scaling Law: From data engineering to compute-optimal training, understanding how models “learn to predict the next word.”
  • Fine-Tuning and Alignment: How SFT, RLHF, DPO, and related methods turn a “generalist” into an “obedient assistant.”
  • KV Cache and Quantization: The two pillars of inference optimization: caching computed results to avoid recomputation, and reducing numerical precision to save memory and compute.
  • Efficient Attention: FlashAttention, sparse attention, and frontier explorations toward linear complexity.

Each article follows LLM Primer’s three-tier design: intuition first, engineering trade-offs next, and research questions when useful. Contributions via PR are always welcome, whether new content, corrections, or adopting inbox papers.