Sglang – k4i's blog

vLLM Request Lifecycle: From the OpenAI API to a Single Forward Pass

📅 June 7, 2026 · ☕ 5 min read · ✍️ k4i

Tracing a single request through the source code of vLLM V1's OpenAI-compatible server: the HTTP entry point, serving render, AsyncLLM, the EngineCore client, Tensor IPC, the scheduler, and one forward pass of GPUModelRunner.

LLM Inference Lab Reports: A Roadmap of Inference Experiments and Benchmarks

📅 June 5, 2026 · ☕ 2 min read · ✍️ k4i

Index of the LLM inference experiments series: vLLM/SGLang benchmarks, TTFT/TPOT, prefix cache, chunked prefill, PagedAttention, quantization, and a profiler dashboard.

vLLM / SGLang Source Code Reading: From a Request to a Single Forward Pass

📅 June 4, 2026 · ☕ 1 min read · ✍️ k4i

Index of the vLLM / SGLang source-reading series: request lifecycle, scheduler, KV cache allocation, block manager, radix cache, and benchmarks.

LLM Inference Internals: A Roadmap of Core Inference Engine Mechanisms

📅 June 4, 2026 · ☕ 1 min read · ✍️ k4i

Index of the series on core LLM inference engine mechanisms: prefill/decode, KV cache, PagedAttention, continuous batching, prefix caching, and PD (prefill/decode) disaggregation.