<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>World-Model on k4i's blog</title><link>https://k4i.top/zh/tags/world-model/</link><description>Recent content in World-Model on k4i's blog</description><generator>Hugo -- gohugo.io</generator><language>zh</language><managingEditor>sky_io@outlook.com (K4i)</managingEditor><webMaster>sky_io@outlook.com (K4i)</webMaster><copyright>All content is subject to the license of &lt;a rel="license noopener" href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank"&gt;CC BY-NC-SA 4.0&lt;/a&gt; .</copyright><lastBuildDate>Thu, 18 Jun 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://k4i.top/zh/tags/world-model/index.xml" rel="self" type="application/rss+xml"/><item><title>Three Routes for Embodied AI Models: VLA, World Models, and WAM</title><link>https://k4i.top/zh/posts/embodied-models-vla-jepa-wam/</link><pubDate>Thu, 18 Jun 2026 10:00:00 +0800</pubDate><author>sky_io@outlook.com (K4i)</author><atom:modified>Thu, 18 Jun 2026 10:00:00 +0800</atom:modified><guid>https://k4i.top/zh/posts/embodied-models-vla-jepa-wam/</guid><description>&lt;p&gt;If a large language model only has to answer with a sentence, an embodied AI model has to answer one more question: &lt;strong&gt;what action should that sentence turn into next?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Suppose you tell a tabletop robot: “Push the red cup next to the plate.” The model has to do more than recognize the cup and understand “next to”. It also has to decide where the robot arm moves next, when the gripper closes, and how to recover when an action fails. The hard part is not multimodality as such. It is that language, vision, physical state, and continuous actions form a closed loop: each action changes the world, and the changed world in turn determines the next action.&lt;/p&gt;</description><dc:creator>K4i</dc:creator><media:content url="https://k4i.top//images/posts/embodied-models-vla-jepa-wam/embodied-models-cover.svg" medium="image"><media:title type="html">featured image</media:title></media:content><category>embodied-ai</category><category>robotics</category><category>vla</category><category>world-model</category><category>jepa</category><category>notes</category></item></channel></rss>