<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Rope on k4i's blog</title><link>https://k4i.top/tags/rope/</link><description>Recent content in Rope on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is subject to the license of &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt; .</copyright><lastBuildDate>Thu, 28 May 2026 21:53:12 +0800</lastBuildDate><atom:link href="https://k4i.top/tags/rope/index.xml" rel="self" type="application/rss+xml"/><item><title>From Absolute Positional Encoding to RoPE: Why Position Can Be a Rotation</title><link>https://k4i.top/posts/positional-encoding-to-rope/</link><pubDate>Thu, 28 May 2026 21:53:12 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:modified>Thu, 04 Jun 2026 01:12:37 +0800</atom:modified><guid>https://k4i.top/posts/positional-encoding-to-rope/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Self-attention has a surprising weakness: by itself, it does not know word order.&lt;/p&gt;
&lt;p&gt;Consider these two sentences:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I like you&lt;/li&gt;
&lt;li&gt;You like me&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;They share most of their tokens, but the order of those tokens changes who likes whom. RNNs read inputs step by step, and CNNs preserve local neighborhoods through convolution windows, so both architectures encode order by construction. Standard self-attention, by contrast, compares every token vector with every other token vector, and nothing in that comparison depends on where a token sits. Its core formula is:&lt;/p&gt;
&lt;p&gt;$$\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V$$&lt;/p&gt;
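&lt;p&gt;A minimal pure-Python sketch of this formula follows. It makes some assumptions: the helper names (&lt;code&gt;softmax&lt;/code&gt;, &lt;code&gt;matmul&lt;/code&gt;, &lt;code&gt;attention&lt;/code&gt;) and the toy $Q$, $K$, $V$ values are made up for illustration, and a real model would use a tensor library with batched, multi-head attention. After computing the output once, the sketch reverses the token order and computes it again.&lt;/p&gt;

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) times (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])  # dimension of each query/key vector
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


# Hypothetical toy Q, K, V for three tokens (one row per token).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out = attention(Q, K, V)

# Reverse the token order; Q, K and V are per-token, so their rows reverse too.
out_reversed = attention(Q[::-1], K[::-1], V[::-1])

# Compare each token's output before and after the reordering.
same = all(math.isclose(x, y)
           for row_a, row_b in zip(out_reversed, out[::-1])
           for x, y in zip(row_a, row_b))
print(same)  # prints True
```

&lt;p&gt;Each token ends up with exactly the same output vector in both runs; only the row it occupies changes.&lt;/p&gt;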
&lt;p&gt;Because $Q$, $K$ and $V$ are computed token by token, permuting the input tokens permutes their rows. Writing that permutation as a matrix $P$, we get $\text{Attention}(PQ,PK,PV) = P\,\text{Attention}(Q,K,V)$: the output is permuted in exactly the same way and is otherwise unchanged. In other words, self-attention is permutation-equivariant. It is good at modeling content relationships, but order must be injected separately.&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top//images/posts/positional-encoding-to-rope/rope-rotation-icon.svg" medium="image"><media:title type="html">featured image</media:title></media:content><category>llm</category><category>transformer</category><category>attention</category><category>position-encoding</category><category>rope</category><category>AI</category></item></channel></rss>