2026
- Common Probability Distributions: Variance And Standard Deviation
- Optimizers: From SGD To AdamW
- vLLM Scheduler: How Request Queues Become SchedulerOutput
- vLLM ModelRunner: How SchedulerOutput Becomes a GPU Forward
- Numeric Types in Neural Networks: FP32, BF16, FP8, INT8, and INT4
- Loss Functions: What a Model Is Really Optimizing
- LLM Inference Sampling: What Temperature, Top-p, and Top-k Actually Control
- Three Routes For Embodied Models: VLA, World Models, And WAM
- Activation Functions: The Small Nonlinearity That Shapes a Network
- Streaming Design: Why The Application Layer Still Matters
- vLLM Request Lifecycle: From OpenAI API to One Forward Pass
- Prefill vs Decode: Why One Model Has Two Very Different Bottlenecks
- LLM Attention Kernels and GPU Primitives
- LLM Quantization and Low-Precision Serving
- LLM Inference Lab Reports: Experiments and Benchmarks for Serving Systems
- vLLM / SGLang Source Reading: From Request to Forward Pass
- LLM Inference Internals: Core Mechanisms for Serving Engines
- A Survey of LLM Quantization: From Linear Quantization to Codebooks
- From Absolute Positional Encoding to RoPE: Why Position Can Be a Rotation
- Estimating Compute and Memory Requirements for LLM Training and Inference