Why KV Cache Works in LLM Inference
· ☕ 9 min read · ✍️ k4i
why the key-value cache avoids redundant computation in autoregressive decoding, and the memory/compute tradeoffs it introduces.
Fused Softmax in Triton
· ☕ 7 min read · ✍️ k4i
how to write a fused softmax kernel in Triton that eliminates redundant memory accesses and outperforms PyTorch's native implementation.
Batch vs Stochastic Gradient Descent
· ☕ 4 min read · ✍️ k4i
understand batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.
Key Management With GnuPG
· ☕ 11 min read · ✍️ k4i
learn how to manage your keys with GPG, and use it with ssh, git, and pass.
DSU on Tree (Sack)
· ☕ 9 min read · ✍️ k4i
DSU on tree answers subtree queries by keeping the largest child's contribution and rebuilding only the small parts. The trick is not union-find; it is small-to-large merging hidden inside a DFS.
Shortest Paths Algorithms
· ☕ 3 min read · ✍️ k4i
compare shortest-path algorithms: Dijkstra, Floyd, and Bellman-Ford.
Tiling WM (i3)
· ☕ 4 min read · ✍️ k4i
this is my i3 window manager configuration
Use Random in C++
· ☕ 1 min read · ✍️ k4i
introduce randomness into your C++ programs, the right way.
Manage Passwords with Pass
· ☕ 2 min read · ✍️ k4i
manage passwords on all your devices (add, generate, edit, delete, sync).
File Sharing with Samba
· ☕ 3 min read · ✍️ k4i
use Samba to share files across Linux and Windows.
Linux Hypervisor Setup
· ☕ 4 min read · ✍️ k4i
set up a hypervisor with QEMU and KVM, the best Linux-based open-source virtualization solution.