Prefill

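Explain why LLM inference splits into a compute-bound prefill phase and a memory-bandwidth-bound decode phase, and how that split determines TTFT (time to first token), TPOT (time per output token), batching, KV cache pressure, and inference engine design.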
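
The compute-bound versus memory-bandwidth-bound split can be illustrated with a rough roofline-style estimate. The sketch below is a back-of-the-envelope model, not a measurement: it assumes about 2 FLOPs per parameter per token, assumes every weight is read from memory once per forward pass, and ignores attention FLOPs and KV cache traffic. The function name `forward_pass_estimate` and all hardware and model numbers (7e9 parameters, 1e15 FLOP/s, 2e12 bytes/s, 2-byte weights) are hypothetical values chosen for illustration.

```python
def forward_pass_estimate(tokens_per_pass, n_params, peak_flops,
                          mem_bandwidth, bytes_per_param=2.0):
    """Return (seconds, regime) for one forward pass under a simple roofline model.

    Assumptions (illustrative only): ~2 FLOPs per parameter per token,
    weights read once per pass, attention and KV cache traffic ignored.
    """
    # Time needed if the pass were limited only by arithmetic throughput.
    compute_s = 2.0 * n_params * tokens_per_pass / peak_flops
    # Time needed if the pass were limited only by streaming the weights.
    memory_s = n_params * bytes_per_param / mem_bandwidth
    regime = "compute-bound" if compute_s >= memory_s else "memory-bandwidth-bound"
    # The slower of the two limits sets the estimated step time.
    return max(compute_s, memory_s), regime


# Hypothetical model and accelerator; not taken from any real datasheet.
N_PARAMS = 7e9          # model parameters
PEAK_FLOPS = 1e15       # FLOP/s
MEM_BANDWIDTH = 2e12    # bytes/s

# Prefill: all prompt tokens go through the model in one pass.
ttft_s, prefill_regime = forward_pass_estimate(2048, N_PARAMS, PEAK_FLOPS, MEM_BANDWIDTH)
# Decode: one new token per sequence per pass (batch size 1 here).
tpot_s, decode_regime = forward_pass_estimate(1, N_PARAMS, PEAK_FLOPS, MEM_BANDWIDTH)

print(f"prefill (2048 tokens): {ttft_s * 1e3:.2f} ms, {prefill_regime}")
print(f"decode  (1 token):     {tpot_s * 1e3:.2f} ms, {decode_regime}")
```

Under these assumptions, the prefill pass sets a rough lower bound on TTFT and the decode pass sets a rough lower bound on TPOT. Batching B decode sequences raises `tokens_per_pass` to B while the weight-read time stays fixed, which is why batching improves decode throughput until the pass becomes compute-bound or the KV cache exhausts memory.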