Paged Attention: Virtual Memory for the GPU
☕ 10 min read · ✍️ k4i
How vLLM borrows the OS paging idea to nearly eliminate KV cache fragmentation, pushing effective KV cache memory utilization from roughly 30% to about 96%.