Loss Functions: What a Model Is Really Optimizing

sky_io@outlook.com (K4i) — Tue, 23 Jun 2026 10:00:00 +0800

Forward propagation produces a prediction. Backpropagation computes gradients. Gradient descent updates parameters. But one question sits between those steps:

What exactly counts as being wrong, and how wrong is it?

That is the job of a loss function. It turns a model output and a target into one scalar:

$$\text{loss} = L(\hat{y}, y)$$
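As a minimal sketch of that mapping, the snippet below uses mean squared error as one possible choice of $L$. The function name `squared_error_loss` and the sample numbers are made up for illustration; the point is only that a batch of predictions and targets goes in and a single scalar comes out.

```python
def squared_error_loss(y_hat, y):
    """Mean squared error: average of (prediction - target)^2 over the batch."""
    if len(y_hat) != len(y):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


# Hypothetical model outputs and targets, chosen only for illustration.
predictions = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

# However many examples go in, the result is one number.
loss = squared_error_loss(predictions, targets)
print(loss)
```

Squaring makes large errors cost disproportionately more than small ones, which is one example of a loss deciding which mistakes are expensive.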

During training, we usually do not directly optimize abstract goals such as “looks good”, “is accurate”, or “answers like a human”. Instead, we optimize a computable, differentiable proxy objective whose gradients indicate how to change the parameters. Choosing a loss function means telling the model which mistakes are expensive and which update direction is useful.
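To see why a proxy is needed, here is a rough sketch that compares accuracy with binary cross-entropy on a toy one-parameter classifier. Everything in it is an assumption made for illustration: the model `sigmoid(w * x)`, the four data points, the starting value `w = 0.5`, and the finite-difference helper `numeric_grad`, which stands in for backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(w, xs, ys):
    """Fraction of examples classified correctly with a 0.5 threshold."""
    preds = [1 if sigmoid(w * x) >= 0.5 else 0 for x in xs]
    return sum(p == t for p, t in zip(preds, ys)) / len(ys)


def cross_entropy(w, xs, ys):
    """Mean binary cross-entropy of the predicted probabilities."""
    total = 0.0
    for x, t in zip(xs, ys):
        p = sigmoid(w * x)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(ys)


def numeric_grad(f, w, h=1e-5):
    """Central finite difference, used here in place of backpropagation."""
    return (f(w + h) - f(w - h)) / (2 * h)


# Toy data: negative inputs belong to class 0, positive inputs to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w = 0.5  # hypothetical current parameter value

acc_grad = numeric_grad(lambda v: accuracy(v, xs, ys), w)
ce_grad = numeric_grad(lambda v: cross_entropy(v, xs, ys), w)

print("accuracy gradient:     ", acc_grad)
print("cross-entropy gradient:", ce_grad)
```

Accuracy is piecewise constant in `w`, so nudging the parameter does not change it and its gradient here is zero. Cross-entropy still has a nonzero (negative) gradient at the same point, which tells gradient descent to increase `w` and make the predictions more confident.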

