<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Activation-Function on k4i's blog</title><link>https://k4i.top/tags/activation-function/</link><description>Recent content in Activation-Function on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is subject to the license of &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt; .</copyright><lastBuildDate>Thu, 18 Jun 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://k4i.top/tags/activation-function/index.xml" rel="self" type="application/rss+xml"/><item><title>Activation Functions: The Small Nonlinearity That Shapes a Network</title><link>https://k4i.top/posts/activation-functions-neural-networks/</link><pubDate>Thu, 18 Jun 2026 10:00:00 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:modified>Thu, 18 Jun 2026 10:00:00 +0800</atom:modified><guid>https://k4i.top/posts/activation-functions-neural-networks/</guid><description>&lt;p&gt;Activation functions look like small details. In a neural-network layer, the heavy computation is usually the matrix multiplication:&lt;/p&gt;
&lt;p&gt;$$z = Wx + b$$&lt;/p&gt;
&lt;p&gt;Then we apply a simple function elementwise:&lt;/p&gt;
&lt;p&gt;$$a = \phi(z)$$&lt;/p&gt;
&lt;p&gt;It is tempting to treat \(\phi\) as a plug-in choice: sigmoid, tanh, ReLU, GELU, SiLU, Mish, or one of hundreds of proposed variants. But the activation is not decoration. Without a nonlinear \(\phi\), any stack of layers collapses into a single affine map, so the activation is what lets depth represent nonlinear functions at all. The choice also affects whether gradients keep flowing during backpropagation, whether hidden values stay centered around zero, and whether the model pays a large runtime cost for a tiny accuracy gain.&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top//images/posts/activation-functions-neural-networks/activation-function-icon.svg" medium="image"><media:title type="html">featured image</media:title></media:content><category>deep-learning</category><category>activation-function</category><category>neural-network</category><category>gradient-descent</category><category>AI</category></item></channel></rss>