KV Cache

Prefix Caching: Reusing KV Cache Across Requests

📅 Apr 22, 2026 · 📝 May 30, 2026 · ☕ 8 min read · ✍️ k4i

When thousands of requests share the same system prompt, recomputing its KV cache for every request is wasted work. Prefix caching stores those key and value vectors once and reuses them, cutting time to first token (TTFT) by up to 97% in common deployments.

Why KV Cache Works in LLM Inference

📅 Apr 20, 2026 · 📝 May 30, 2026 · ☕ 8 min read · ✍️ k4i

Why the key-value cache avoids redundant computation in autoregressive decoding, and the memory/compute tradeoffs it introduces.