Numeric Types in Neural Networks: FP32, BF16, FP8, INT8, and INT4

sky_io@outlook.com (K4i) — Tue, 23 Jun 2026 10:30:00 +0800

The Short Answer

Models do not use floating-point types exclusively; integer types appear too. The more useful distinction is not simply “float versus int” but where each type is used.

Location	Common types	Purpose
Training compute	FP32, TF32, FP16, BF16	Keep gradients and activations stable while using Tensor Cores
Inference compute	BF16, FP16, FP8, INT8	Reduce bandwidth and compute cost
Weight storage	BF16, FP16, FP8, INT8, INT4, NF4	Shrink model files and GPU memory
KV cache / activations	BF16, FP16, FP8, INT8	Save memory for long context and high concurrency
Token IDs / masks / indices	INT32, INT64, bool	Represent discrete structure, not quantized parameters
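The types in the table differ first in bit width, which sets raw memory cost. The floating-point types also differ in how their bits are split between exponent (range) and mantissa (precision). Below is a minimal sketch of both ideas, assuming a dense weight tensor and ignoring quantization scales and padding. `weight_gib` and `to_bf16` are hypothetical helper names. `to_bf16` applies round-to-nearest-even to the raw FP32 bits and only handles finite inputs.

```python
import math
import struct

# Storage width in bits for the formats in the table above.
# INT4/NF4 pack two values per byte; per-group scales and other metadata are ignored.
BITS_PER_VALUE = {
    "FP32": 32, "BF16": 16, "FP16": 16, "FP8": 8, "INT8": 8, "INT4": 4, "NF4": 4,
}


def weight_gib(n_params: int, fmt: str) -> float:
    """Raw weight storage in GiB for n_params values stored in format fmt."""
    return n_params * BITS_PER_VALUE[fmt] / 8 / 2**30


def _fp32_bits(x: float) -> int:
    # Reinterpret the FP32 encoding of x as an unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_fp32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def to_bf16(x: float) -> float:
    """Round x to the nearest BF16 value (ties to even), returned as a Python float.

    BF16 keeps the top 16 bits of FP32: 1 sign bit, the same 8 exponent bits,
    and 7 mantissa bits. Assumes x is finite; NaN handling is omitted.
    """
    b = _fp32_bits(x)
    lsb = (b >> 16) & 1                       # lowest bit that survives truncation
    rounded = (b + 0x7FFF + lsb) & 0xFFFF0000  # round-to-nearest-even, then drop low 16 bits
    return _bits_to_fp32(rounded)
```

For example, a hypothetical 7-billion-parameter model needs roughly 13 GiB of raw BF16 weights and about 3.3 GiB at INT4. Because BF16 keeps FP32's 8-bit exponent, a value like 1e38 stays finite after rounding, whereas FP16's largest finite value is 65504. The cost is precision: with only 8 significant bits, `to_bf16(1.0 + 2**-8)` rounds back to `1.0`.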

In one sentence: training is usually dominated by floating-point compute; inference and storage often use low-precision floating-point and integer formats; and when integers represent model values, they usually need a scale, a zero point, or a codebook to be mapped back to approximate real numbers.
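The last clause can be made concrete with a minimal plain-Python sketch. It assumes per-tensor affine INT8 quantization, where one scale and one zero point cover the whole list; real systems often use per-channel or per-group scales instead. The sketch also shows a 4-bit codebook variant, in which each stored integer is an index into a small table of real values. All function names are hypothetical, and `CODEBOOK_4BIT` is a made-up, illustrative table, not the actual NF4 values.

```python
def quantize_affine(values, bits=8):
    """Affine quantization to signed ints: q = clamp(round(x / scale) + zero_point)."""
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Extend the range to include 0.0 so that zero maps exactly to an integer.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Map integers back to approximate reals: x ~= (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]


# Hypothetical 16-entry codebook on [-1, 1], denser near zero (t * |t| for evenly
# spaced t). Illustrative only; it is NOT the real NF4 table.
CODEBOOK_4BIT = [t * abs(t) for t in (-1.0 + 2.0 * i / 15 for i in range(16))]


def quantize_codebook(values, codebook=CODEBOOK_4BIT):
    """Store each value as a 4-bit index into the codebook plus one absmax scale."""
    absmax = max(abs(x) for x in values) or 1.0
    idx = [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - x / absmax))
        for x in values
    ]
    return idx, absmax


def dequantize_codebook(idx, absmax, codebook=CODEBOOK_4BIT):
    """Look up each index and rescale to recover approximate real values."""
    return [codebook[i] * absmax for i in idx]
```

In the affine case, the reconstruction error of any value inside the observed range is at most about `scale / 2`, and `0.0` round-trips exactly. In the codebook case, the integers carry no numeric meaning on their own; they only become numbers after the table lookup and rescaling.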
