vLLM and SGLang Source Reading
vLLM Scheduler: How Request Queues Become SchedulerOutput
· ☕ 6 min read · âœī¸ k4i
A source-reading walkthrough of the vLLM V1 Scheduler: how it weighs the running and waiting queues, the token budget, KV cache blocks, prefix-cache hits, and preemption to produce the SchedulerOutput consumed by ModelRunner.
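The decision loop summarized above can be sketched as a toy, single-step scheduler. This is a minimal sketch under stated assumptions, not vLLM's code: `ToyScheduler`, `ToyRequest`, `ToySchedulerOutput`, and the `cached_prefix_tokens` field are hypothetical names, and the policy (serve running requests first, chunk whatever exceeds the token budget, preempt the last running request when KV blocks run out, skip admitting waiting requests in a step that preempted) is a simplified reading of the behavior the article walks through. For simplicity, prefix-cache hits here only skip compute; cached blocks are still charged against the free pool, whereas real prefix caching shares them by reference.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    # Hypothetical request record; NOT vLLM's Request class.
    req_id: str
    num_tokens: int                # prompt + generated tokens known so far
    cached_prefix_tokens: int = 0  # hypothetical prefix-cache hit length
    num_computed_tokens: int = 0   # tokens whose KV is already computed
    num_blocks: int = 0            # KV blocks currently held


@dataclass
class ToySchedulerOutput:
    # Hypothetical stand-in for SchedulerOutput: tokens per request this step.
    num_scheduled_tokens: dict = field(default_factory=dict)
    preempted: list = field(default_factory=list)


class ToyScheduler:
    def __init__(self, token_budget, num_free_blocks, block_size):
        self.token_budget = token_budget  # plays the role of a per-step token cap
        self.free_blocks = num_free_blocks
        self.block_size = block_size
        self.running = []
        self.waiting = deque()

    def _allocate(self, req, n):
        # Grow the request's block count so it can hold computed + n tokens.
        total = math.ceil((req.num_computed_tokens + n) / self.block_size)
        need = max(0, total - req.num_blocks)
        if need > self.free_blocks:
            return False
        self.free_blocks -= need
        req.num_blocks += need
        return True

    def _preempt(self, req, out):
        # Recompute-style preemption: drop all KV and requeue at the front.
        self.free_blocks += req.num_blocks
        req.num_blocks = req.num_computed_tokens = 0
        self.waiting.appendleft(req)
        out.preempted.append(req.req_id)

    def _commit(self, req, n, out):
        out.num_scheduled_tokens[req.req_id] = n
        req.num_computed_tokens += n

    def schedule(self):
        out, budget, i = ToySchedulerOutput(), self.token_budget, 0
        # 1) Running requests first, in order; chunk whatever exceeds the budget.
        while i < len(self.running) and budget > 0:
            req = self.running[i]
            n = min(req.num_tokens - req.num_computed_tokens, budget)
            ok = self._allocate(req, n)
            while not ok:
                victim = self.running.pop()  # lowest priority: last in FCFS order
                self._preempt(victim, out)
                if victim is req:
                    break
                ok = self._allocate(req, n)
            if not ok:
                break
            self._commit(req, n, out)
            budget -= n
            i += 1
        # 2) Admit waiting requests only if nothing was preempted this step.
        while not out.preempted and self.waiting and budget > 0:
            req = self.waiting[0]
            if req.num_computed_tokens == 0:
                # Skip whole cached blocks, but keep at least 1 token to compute.
                hit = min(req.cached_prefix_tokens, req.num_tokens - 1)
                req.num_computed_tokens = hit // self.block_size * self.block_size
            n = min(req.num_tokens - req.num_computed_tokens, budget)
            if not self._allocate(req, n):
                break  # out of KV blocks; retry on a later step
            self.running.append(self.waiting.popleft())
            self._commit(req, n, out)
            budget -= n
        return out
```

With `token_budget=8`, four free blocks of size 4, and two waiting requests (6 tokens, and 10 tokens with a 5-token cached prefix), the first step schedules 6 tokens for the first request and a 2-token chunk for the second after skipping one cached block. Once the first request decodes a token, the next step finds no free block for the second request's remaining chunk and preempts it back to the waiting queue.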
vLLM Request Lifecycle: From OpenAI API to One Forward Pass
· ☕ 5 min read · âœī¸ k4i
A source-reading walkthrough of the vLLM V1 request path: the OpenAI-compatible HTTP entrypoint, serving render, AsyncLLM, the EngineCore client, Tensor IPC, the scheduler, and a single GPUModelRunner forward pass.
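The split between an async frontend and a busy engine loop can be sketched in a few lines. This is a toy under loud assumptions: `ToyEngineRequest`, `toy_engine_core`, and `toy_generate` are hypothetical names, both sides run in one process with an `asyncio.Queue` standing in for the IPC channel the article covers, and the "model" simply emits the previous token plus one. It illustrates only the shape of the path: the frontend submits a request and streams its outputs from a per-request queue, while the core loop batches every active request into one step.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyEngineRequest:
    # Hypothetical request message; NOT vLLM's EngineCoreRequest.
    req_id: str
    prompt_token_ids: list
    max_tokens: int


async def toy_engine_core(inbox, outboxes, stop):
    """Stand-in for the engine loop: drain inputs, run one batched step, emit outputs."""
    active = {}  # req_id -> (request, generated token ids)
    while not (stop.is_set() and not active and inbox.empty()):
        while not inbox.empty():
            req = inbox.get_nowait()
            active[req.req_id] = (req, [])
        if active:
            # One fake "forward pass" over the whole batch: next token = last + 1.
            for req_id, (req, out) in list(active.items()):
                tok = (out[-1] if out else req.prompt_token_ids[-1]) + 1
                out.append(tok)
                await outboxes[req_id].put(tok)
                if len(out) == req.max_tokens:
                    await outboxes[req_id].put(None)  # end-of-stream marker
                    del active[req_id]
        await asyncio.sleep(0)  # yield so frontend coroutines can run


async def toy_generate(inbox, outboxes, req):
    """Stand-in for the async frontend: submit, then stream this request's tokens."""
    outboxes[req.req_id] = asyncio.Queue()  # register before submitting
    await inbox.put(req)
    tokens = []
    while (tok := await outboxes[req.req_id].get()) is not None:
        tokens.append(tok)
    del outboxes[req.req_id]
    return tokens


async def main():
    inbox, outboxes, stop = asyncio.Queue(), {}, asyncio.Event()
    engine = asyncio.create_task(toy_engine_core(inbox, outboxes, stop))
    results = await asyncio.gather(
        toy_generate(inbox, outboxes, ToyEngineRequest("a", [1, 2, 3], 3)),
        toy_generate(inbox, outboxes, ToyEngineRequest("b", [10], 2)),
    )
    stop.set()
    await engine
    return results
```

Running `asyncio.run(main())` returns one token list per request, in submission order. The per-request output queue is registered before the request is submitted so the engine loop never emits into a missing queue.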