2026
- Disaggregated Prefill: Splitting Compute Across Machines
- Prefix Caching: Reusing KV Cache Across Requests
- Chunked Prefill: Slicing the Prefill to Protect Decode Latency
- Continuous Batching: Scheduling at Iteration Granularity
- Paged Attention: Virtual Memory for the GPU
- Online Softmax: Tiling for Arbitrarily Large Rows
- Why KV Cache Works in LLM Inference
- Fused Softmax in Triton
- SSH Port Forwarding: Local and Remote Tunnels Explained
- Mitmproxy + Tampermonkey = better {llm, …} viewer
- Batch vs Stochastic Gradient Descent
- Forward & Backward Propagation