LLM Inference Sampling: What Temperature, Top-p, and Top-k Actually Control

sky_io@outlook.com (K4i) — Thu, 18 Jun 2026 21:20:00 +0800

Why does lowering temperature make an answer more stable? Why does lowering top_p make it more conservative? Why does a small top_k feel like asking the model to choose only from the obvious candidates? These are not three separate magic styles. They are three concrete operations on the next-token probability distribution.

This post builds the intuition with a 5-token example, then maps the same idea to the vLLM V1 sampler.
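As a rough preview of those three operations, here is a minimal Python sketch. The five logits are hypothetical values picked for illustration, and the helper names (`softmax`, `top_k_filter`, `top_p_filter`) are made up for this sketch. It is not how vLLM implements sampling, only the underlying idea on plain lists.

```python
import math

def softmax(logits, temperature=1.0):
    """Divide logits by temperature (must be > 0), then normalize."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _renormalize(probs, keep):
    """Zero every token not in `keep`, then rescale so the rest sum to 1."""
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

# Hypothetical logits for a 5-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, 0.0, -1.0]

base = softmax(logits)                    # temperature = 1.0
sharp = softmax(logits, temperature=0.5)  # lower temperature, sharper peak
k2 = top_k_filter(base, 2)                # only the two best candidates survive
p80 = top_p_filter(base, 0.8)             # smallest set covering 80% of the mass

for name, dist in [("base", base), ("T=0.5", sharp), ("top_k=2", k2), ("top_p=0.8", p80)]:
    print(f"{name:>10}: " + "  ".join(f"{q:.3f}" for q in dist))
```

With these particular logits, lowering the temperature moves mass toward the top token, `top_k=2` zeroes the three least likely tokens, and `top_p=0.8` keeps three tokens because the top two alone fall just short of 80% of the mass.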
