# Paged Attention: Virtual Memory for the GPU

📅 Apr 22, 2026 · 📝 Apr 26, 2026 · ☕ 10 min read · ✍️ k4i

How vLLM borrows the OS paging idea to eliminate KV cache memory fragmentation, pushing GPU memory utilization from ~30% to ~96%.