LLM Inference Lab Reports: Experiments and Benchmarks for Serving Systems

sky_io@outlook.com (K4i) — Fri, 05 Jun 2026 10:00:00 +0800

This series collects experiment reports. Unlike mechanism explainers or source-reading notes, each post should include a reproducible environment, the exact commands, metrics, tables or figures, and concrete tuning conclusions.

For inference-engine interviews, recognizing the names PagedAttention, prefix cache, and chunked prefill is only the first layer. The stronger signal is being able to answer four questions: which workload benefits, how much the metric improved, where the bottleneck moved, and what to inspect first if production metrics regress.
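
As a rough illustration of the "how much did the metric improve" question, the sketch below summarizes two sets of per-request latencies and reports the relative change per metric. The helper names (`percentile`, `summarize`, `relative_change`) are hypothetical, the nearest-rank percentile is just one common definition, and the sample numbers are placeholders rather than measurements from any real run.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]; one of several common definitions."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Collapse raw per-request latencies into the metrics a report table would show."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }


def relative_change(baseline, candidate):
    """Fractional change per metric; negative means the candidate is lower."""
    return {name: (candidate[name] - baseline[name]) / baseline[name] for name in baseline}


# Placeholder data for illustration only, not real benchmark results.
baseline_latency_ms = [120.0, 130.0, 125.0, 400.0]
candidate_latency_ms = [100.0, 110.0, 105.0, 200.0]

baseline = summarize(baseline_latency_ms)
candidate = summarize(candidate_latency_ms)
change = relative_change(baseline, candidate)

for name in baseline:
    print(f"{name}: {baseline[name]:.2f} ms -> {candidate[name]:.2f} ms ({change[name]:+.1%})")
```

Reporting the tail (p99) next to the median matters here because a change can leave the median almost untouched while moving the tail substantially, which is often where the bottleneck shift shows up.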
