<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Cross-Entropy on k4i's blog</title><link>https://k4i.top/tags/cross-entropy/</link><description>Recent content in Cross-Entropy on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is subject to the license of &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt; .</copyright><lastBuildDate>Tue, 23 Jun 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://k4i.top/tags/cross-entropy/index.xml" rel="self" type="application/rss+xml"/><item><title>Loss Functions: What a Model Is Really Optimizing</title><link>https://k4i.top/posts/loss-functions-cross-entropy/</link><pubDate>Tue, 23 Jun 2026 10:00:00 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:modified>Tue, 23 Jun 2026 10:00:00 +0800</atom:modified><guid>https://k4i.top/posts/loss-functions-cross-entropy/</guid><description>&lt;p&gt;Forward propagation produces a prediction. Backpropagation computes gradients. Gradient descent updates parameters. But one question sits between those steps:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;what exactly counts as being wrong, and how wrong is it?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That is the job of a &lt;strong&gt;loss function&lt;/strong&gt;. It turns a model output and a target into one scalar:&lt;/p&gt;
&lt;p&gt;$$\text{loss} = L(\hat{y}, y)$$&lt;/p&gt;
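&lt;p&gt;As a small worked illustration, assume binary cross-entropy with the natural logarithm as the choice of L. It is one common option, not the only one. For a target of 1 and a predicted probability of 0.8, the second term vanishes and the loss reduces to a single number:&lt;/p&gt;
&lt;p&gt;$$L(\hat{y}, y) = -\left[y \log \hat{y} + (1 - y) \log (1 - \hat{y})\right] = -\log 0.8 \approx 0.223$$&lt;/p&gt;
&lt;p&gt;A more confident correct prediction of 0.99 would cost about 0.010. A confidently wrong prediction of 0.1 would cost about 2.303, which is the kind of &amp;ldquo;how wrong&amp;rdquo; signal the optimizer needs.&lt;/p&gt;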
&lt;p&gt;During training, we usually do not optimize abstract goals such as &amp;ldquo;looks good&amp;rdquo;, &amp;ldquo;is accurate&amp;rdquo;, or &amp;ldquo;answers like a human&amp;rdquo; directly. Instead, we optimize a computable, differentiable proxy objective that produces gradients. Choosing a loss function therefore means telling the model which mistakes are expensive and, through the gradients it yields, which direction to move the parameters.&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top//images/icons/gradient-descent.png" medium="image"><media:title type="html">featured image</media:title></media:content><category>deep-learning</category><category>loss-function</category><category>cross-entropy</category><category>gradient-descent</category><category>AI</category></item></channel></rss>