<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Model-Runner on k4i's blog</title><link>https://k4i.top/tags/model-runner/</link><description>Recent content in Model-Runner on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is subject to the license of &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt; .</copyright><lastBuildDate>Tue, 23 Jun 2026 10:30:00 +0800</lastBuildDate><atom:link href="https://k4i.top/tags/model-runner/index.xml" rel="self" type="application/rss+xml"/><item><title>vLLM ModelRunner: How SchedulerOutput Becomes a GPU Forward</title><link>https://k4i.top/posts/model-runner-scheduler-output-to-gpu-forward/</link><pubDate>Tue, 23 Jun 2026 10:30:00 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:modified>Tue, 23 Jun 2026 10:30:00 +0800</atom:modified><guid>https://k4i.top/posts/model-runner-scheduler-output-to-gpu-forward/</guid><description>&lt;p&gt;Scheduler decides &lt;strong&gt;what to run this step&lt;/strong&gt;. ModelRunner decides &lt;strong&gt;how to run it on GPU&lt;/strong&gt;. If Scheduler compresses dynamic request queues into &lt;code&gt;SchedulerOutput&lt;/code&gt;, ModelRunner translates that output into contiguous tensors, KV cache slots, attention metadata, forward context, logits, and sampled tokens.&lt;/p&gt;
&lt;p&gt;So yes, ModelRunner is the execution core of inference. It does not own HTTP serving or global queue policy, but once a &lt;code&gt;SchedulerOutput&lt;/code&gt; exists, this is where the model actually runs.&lt;/p&gt;
&lt;p&gt;Read this after the &lt;a href="https://k4i.top/posts/scheduler-request-queue-to-scheduler-output/"&gt;Scheduler&lt;/a&gt; post. If the whole path is still fuzzy, start from the &lt;a href="https://k4i.top/posts/request-lifecycle-openai-to-forward-pass/"&gt;request lifecycle&lt;/a&gt; overview.&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top//images/posts/vllm-sglang-source-reading/source-reading-code-path-icon.svg" medium="image"><media:title type="html">featured image</media:title></media:content><category>llm</category><category>inference</category><category>vllm</category><category>source-reading</category><category>model-runner</category><category>ai-infra</category><category>AI</category><category>vLLM and SGLang Source Reading</category></item></channel></rss>