<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>优化 on k4i's blog</title><link>https://k4i.top/zh/tags/%E4%BC%98%E5%8C%96/</link><description>Recent content in 优化 on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>zh</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is subject to the license of &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt; .</copyright><lastBuildDate>Mon, 20 Apr 2026 12:00:00 +0800</lastBuildDate><atom:link href="https://k4i.top/zh/tags/%E4%BC%98%E5%8C%96/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM 推理中为什么 K、V 可以被缓存</title><link>https://k4i.top/zh/posts/kv-cache/</link><pubDate>Mon, 20 Apr 2026 12:00:00 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:updated>2026-04-22T01:27:12+08:00</atom:updated><guid>https://k4i.top/zh/posts/kv-cache/</guid><description>&lt;h2 id="introduction"&gt;引言&lt;/h2&gt;
&lt;p&gt;大语言模型以&lt;strong&gt;自回归&lt;/strong&gt;方式生成文本——每次生成一个 token，每个新 token 依赖于之前所有 token。这种串行特性带来了一个根本的优化机会：每一步中大部分计算是&lt;strong&gt;冗余&lt;/strong&gt;的。&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top/images/posts/kv-cache/cover.png" medium="image"><media:title type="html">featured image</media:title></media:content><category>llm</category><category>推理</category><category>kv-cache</category><category>transformer</category><category>优化</category><category>AI</category></item></channel></rss>