Archive – k4i's blog

2026

posts Apr 22 Disaggregated Prefill: Splitting Compute Across Machines
posts Apr 22 Prefix Caching: Reusing KV Cache Across Requests
posts Apr 22 Chunked Prefill: Slicing the Prefill to Protect Decode Latency
posts Apr 22 Continuous Batching: Scheduling at Iteration Granularity
posts Apr 22 Paged Attention: Virtual Memory for the GPU
posts Apr 21 Online Softmax: Tiling for Arbitrarily Large Rows
posts Apr 20 Why KV Cache Works in LLM Inference
posts Apr 20 Fused Softmax in Triton
posts Apr 19 SSH Port Forwarding: Local and Remote Tunnels Explained
posts Mar 22 Mitmproxy + Tampermonkey = better {llm, …} viewer
posts Feb 16 Batch vs Stochastic Gradient Descent
posts Feb 16 Forward & Backward Propagation

2024

2023

posts Dec 19 Cycle Finding Algorithms
posts Jan 21 Connect to your Android wirelessly
posts Jan 18 Tiling WM (i3)

2022

posts Oct 21 Public, Private and Hybrid Cloud
posts Apr 18 Use Random in C++
