Optimization

Why the key-value (KV) cache avoids redundant computation in autoregressive decoding, and the memory/compute tradeoffs it introduces.
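To make the point concrete, here is a minimal sketch of single-head attention decoding, written in pure Python under simplifying assumptions: one layer, one head, no batching, and toy list-based matrices. All names (`project`, `attend`, `decode_no_cache`, `decode_with_cache`, `kv_cache_bytes`) are hypothetical and chosen for illustration; they do not come from any particular library. The sketch relies on the fact that, with causal masking, the keys and values of earlier positions do not change when a new token is appended, so they can be computed once and reused.

```python
import math

def project(x, w):
    # x: list of d floats; w: d x d matrix stored as a list of rows. Returns x @ w.
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all available positions.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def decode_no_cache(xs, wq, wk, wv):
    # At every step, re-project keys and values for the whole prefix.
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        keys = [project(x, wk) for x in xs[: t + 1]]
        values = [project(x, wv) for x in xs[: t + 1]]
        kv_projections += t + 1
        outputs.append(attend(project(xs[t], wq), keys, values))
    return outputs, kv_projections

def decode_with_cache(xs, wq, wk, wv):
    # Project only the newest token and append it to the cache.
    outputs, kv_projections = [], 0
    keys, values = [], []  # the KV cache: grows by one entry per step
    for x in xs:
        keys.append(project(x, wk))
        values.append(project(x, wv))
        kv_projections += 1
        outputs.append(attend(project(x, wq), keys, values))
    return outputs, kv_projections

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    # Factor of 2 covers keys and values; bytes_per_elem=2 assumes fp16/bf16.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

if __name__ == "__main__":
    # Toy inputs (assumed values, for illustration only).
    xs = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
    w = [[0.05 * (i - j) for j in range(4)] for i in range(4)]
    _, n_plain = decode_no_cache(xs, w, w, w)
    _, n_cached = decode_with_cache(xs, w, w, w)
    print(n_plain, n_cached)  # 10 4
    # Hypothetical model shape: 32 layers, 32 KV heads, head_dim 128, 4096 tokens.
    print(kv_cache_bytes(32, 32, 128, 4096) / 2**30)  # 2.0 (GiB)
```

Both functions produce the same attention outputs, but over T steps the uncached version performs T(T+1)/2 key/value projections while the cached version performs T. The cost of that saving is memory: the cache grows linearly with sequence length, batch size, layer count, and the number of key/value heads, which is why the hypothetical configuration above already needs about 2 GiB per sequence at 16-bit precision. Note that caching removes the redundant projections only; each step still attends over every cached position, so per-step attention cost continues to grow with the prefix length.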