Continuous Batching: Scheduling at Iteration Granularity
7 min read · k4i
How iteration-level scheduling stops the GPU from idling on finished requests by inserting a new request the moment a batch slot opens, and the math behind mixing prefill and decode in a single forward pass.
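A toy simulation makes both ideas concrete before the details. This is a minimal sketch under stated assumptions, not any particular serving system's scheduler. All requests arrive at time zero and are admitted first-come, first-served. There is no token budget or KV-cache limit. A request's prefill pass also emits its first output token. After that, each running request emits one token per forward pass. The names `Request`, `static_batching`, and `continuous_batching` are hypothetical. Static batching holds every slot until the longest request in its batch finishes. The iteration-level version refills free slots before every pass, so a single pass carries $\text{tokens} = N_{\text{decode}} + \sum_{\text{admitted}} L_{\text{prompt}}$.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    prompt_len: int  # prompt tokens processed in this request's prefill pass
    output_len: int  # tokens to generate (assumed >= 1); the prefill pass emits the first


def static_batching(requests, max_slots):
    """Request-level scheduling (hypothetical helper): a batch runs until its longest request ends."""
    iterations = 0
    for i in range(0, len(requests), max_slots):
        batch = requests[i:i + max_slots]
        # Short requests sit idle in their slots while the longest one finishes.
        iterations += max(r.output_len for r in batch)
    busy = sum(r.output_len for r in requests)  # slot-iterations doing useful work
    return iterations, busy / (iterations * max_slots)


def continuous_batching(requests, max_slots):
    """Iteration-level scheduling (hypothetical helper): refill free slots before every pass."""
    waiting = deque(requests)
    running = {}      # rid -> tokens still to generate
    step_tokens = []  # tokens fed to each forward pass (prefill and decode mixed)
    while waiting or running:
        tokens = len(running)  # one decode token per request that was already running
        while waiting and len(running) < max_slots:
            r = waiting.popleft()
            running[r.rid] = r.output_len
            tokens += r.prompt_len  # the newcomer is prefilled in this same pass
        step_tokens.append(tokens)
        for rid in list(running):
            running[rid] -= 1  # every request in the pass emits one token
            if running[rid] <= 0:
                del running[rid]  # the slot is free for the next iteration
    iterations = len(step_tokens)
    busy = sum(r.output_len for r in requests)
    return iterations, busy / (iterations * max_slots), step_tokens


reqs = [Request(0, 4, 5), Request(1, 4, 1), Request(2, 4, 1), Request(3, 4, 5)]
print(static_batching(reqs, max_slots=2))  # (10, 0.6)
it, util, steps = continuous_batching(reqs, max_slots=2)
print(it, round(util, 3), steps)  # 7 0.857 [8, 5, 5, 2, 2, 1, 1]
```

With two slots, static batching needs 10 iterations because each short request pins its slot until its long partner finishes. The iteration-level loop admits requests 2 and 3 as soon as slots open and finishes in 7. The per-pass token counts show the mixing: the second pass carries request 2's 4-token prefill plus request 0's single decode token, for 5 tokens.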