<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Caching on k4i's blog</title><link>https://k4i.top/tags/caching/</link><description>Recent content in Caching on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is licensed under &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt;.</copyright><lastBuildDate>Wed, 22 Apr 2026 11:30:00 +0800</lastBuildDate><atom:link href="https://k4i.top/tags/caching/index.xml" rel="self" type="application/rss+xml"/><item><title>Prefix Caching: Reusing KV Cache Across Requests</title><link>https://k4i.top/posts/prefix-caching/</link><pubDate>Wed, 22 Apr 2026 11:30:00 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:modified>Sun, 26 Apr 2026 16:08:06 +0800</atom:modified><guid>https://k4i.top/posts/prefix-caching/</guid><description>&lt;h2 id="problem"&gt;the repeated-prefix problem&lt;/h2&gt;
&lt;p&gt;the &lt;a href="https://k4i.top/posts/kv-cache/"&gt;KV cache&lt;/a&gt; eliminates redundant computation &lt;em&gt;within&lt;/em&gt; a single request. but in production, a different kind of redundancy is pervasive: &lt;strong&gt;many requests share the same prefix&lt;/strong&gt;.&lt;/p&gt;
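&lt;p&gt;to make the idea concrete, here is a minimal sketch: an in-memory table keyed by a hash of the token prefix. the &lt;code&gt;PrefixCache&lt;/code&gt; class and the opaque &lt;code&gt;kv&lt;/code&gt; blob are hypothetical illustrations, not any engine's real API; production systems such as vLLM track reuse at fixed-size block granularity rather than whole prefixes, but the lookup has the same shape.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from hashlib import sha256

class PrefixCache:
    """maps a hash of a token-id prefix to its precomputed KV cache."""

    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        # hash the exact token ids so distinct prefixes never collide
        return sha256(str(tokens).encode()).hexdigest()

    def get(self, tokens):
        # walk from the full sequence down to find the longest cached prefix
        # (linear probe; real engines hash per block to avoid this scan)
        for end in range(len(tokens), 0, -1):
            kv = self._store.get(self._key(tokens[:end]))
            if kv is not None:
                return end, kv   # prefill can skip tokens[:end]
        return 0, None           # cold miss: prefill everything

    def put(self, tokens, kv):
        self._store[self._key(tokens)] = kv
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;on a hit, prefill runs only over the suffix &lt;code&gt;tokens[end:]&lt;/code&gt;, so a shared system prompt is computed once and amortized across every request that starts with it.&lt;/p&gt;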
&lt;p&gt;three scenarios account for the majority of production traffic:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;system prompts.&lt;/strong&gt; every request to a code assistant, agent, or customer-facing chatbot begins with the same multi-kilotoken system prompt. on each new request, the server re-runs prefill over those identical tokens — then throws away the resulting KV cache when the request ends.&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top/images/posts/prefix-caching/cover.svg" medium="image"><media:title type="html">featured image</media:title></media:content><category>llm</category><category>inference</category><category>systems</category><category>caching</category><category>kv-cache</category><category>AI</category><category>LLM Inference Internals</category></item></channel></rss>