Online Softmax: Tiling for Arbitrarily Large Rows · Apr 21, 2026 · Apr 22, 2026 · 6 min read · k4i · How online softmax extends the fused kernel to handle rows that exceed SRAM capacity, using a numerically stable two-pass tiling algorithm.
Why KV Cache Works in LLM Inference · Apr 20, 2026 · Apr 22, 2026 · 8 min read · k4i · Why the key-value cache avoids redundant computation in autoregressive decoding, and the memory/compute tradeoffs it introduces.
Fused Softmax in Triton · Apr 20, 2026 · Apr 22, 2026 · 7 min read · k4i · How to write a fused softmax kernel in Triton that eliminates redundant memory accesses and outperforms PyTorch's native implementation.
SSH Port Forwarding: Local and Remote Tunnels Explained · Apr 19, 2026 · 4 min read · k4i · A practical guide to SSH local and remote port forwarding, with examples, comparison, and persistent configuration via ~/.ssh/config.
Mitmproxy + Tampermonkey = better {llm, …} viewer · Mar 22, 2026 · 3 min read · k4i
Batch vs Stochastic Gradient Descent · Feb 16, 2026 · Apr 19, 2026 · 4 min read · k4i · Understand batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.
Forward & Backward Propagation · Feb 16, 2026 · Apr 19, 2026 · 5 min read · k4i · Understand how backward propagation works in gradient descent.
Key Management With GnuPG · Jun 1, 2024 · Jan 25, 2025 · 11 min read · k4i · Learn how to manage your keys with GPG and use it with SSH, Git, and pass.
Shortest Paths Algorithms · Feb 10, 2024 · Jan 25, 2025 · 3 min read · k4i · Compare shortest path algorithms: Dijkstra, Floyd-Warshall, and Bellman-Ford.
Cycle Finding Algorithms · Dec 19, 2023 · Jan 25, 2025 · 1 min read · k4i