Motubrain: the World Action Model for embodied AI

An independent bilingual guide to ShengShu Technology's Motubrain launch, benchmark claims, architecture, and current access status.

Independent resource. Motubrain.org is not affiliated with ShengShu Technology.

Benchmark snapshot

Key figures reported by official or benchmark sources as of April 29, 2026.

63.77: WorldArena EWMScore reported by ShengShu for Motubrain

95.8 / 96.1: RoboTwin 2.0 clean and randomized scores reported on the official page

Apr 29, 2026: Public launch date in the ShengShu PRNewswire announcement

Motubrain appears at #1 on both benchmark tables

These leaderboard screenshots make the reported RoboTwin 2.0 and WorldArena rankings easier to inspect at a glance.

RoboTwin 2.0 Leaderboard

The official screenshot shows Motubrain at 95.8 in the Clean setting and 96.1 in the Randomized setting.

WorldArena Leaderboard

The official screenshot shows Motubrain ranked first with a 63.77 EWMScore.

What is Motubrain?

Motubrain is presented by ShengShu Technology as a World Action Model: a unified embodied AI model that connects what a robot sees with the actions it should take.

World Action Model

The official framing moves beyond video-only world modeling by joining perception, prediction, and robot action in one system.

Embodied AI Focus

The launch describes robots acting across homes, industrial spaces, and commercial environments. It does not describe a consumer chatbot or memory app.

Source-Led Status

As of April 29, 2026, official pages explain the model and its benchmark claims, but do not show a public self-serve API or download.

Name Variants

Searches may use Motubrain or MotuBrain. This guide treats both as the same ShengShu World Action Model unless a source says otherwise.

How the World Action Model works

The core idea is to learn video, language, and action together so a robot can reason about what changes next and what to do next.

ShengShu describes Motubrain as learning video and action as continuous modalities, so world prediction and action generation are trained together rather than bolted together.

From video worlds to robot action

Motubrain sits on a visible ShengShu research path: video-based embodied priors, unified latent actions, and a World Action Model for physical execution.

1. Vidar connects video priors to manipulation

Vidar framed video diffusion as a scalable prior for robot manipulation, using multi-view trajectories and minimal demonstrations to adapt to new embodiments.

2. Motus unifies five modeling modes

Motus introduced a latent-action world model that can switch between VLA control, world modeling, inverse dynamics, video generation, and joint video-action prediction.

3. Motubrain turns prediction into action

Motubrain extends that direction into a World Action Model: one architecture for understanding scenes, anticipating change, and generating robot actions.
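
To make the five Motus modes named above more concrete, here is a minimal Python sketch that records, for each mode, which modalities are given and which are predicted. The mode keys, the modality labels, and the input/output assignments are this guide's reading of the mode names, not an API or specification published by ShengShu, so treat the table as an illustrative assumption.

```python
from typing import NamedTuple


class ModeIO(NamedTuple):
    """Modalities a mode conditions on (inputs) and predicts (outputs)."""
    inputs: frozenset
    outputs: frozenset


# Hypothetical mapping: names and assignments are assumptions for illustration,
# inferred from the mode names rather than taken from an official interface.
MODES = {
    "vla_control": ModeIO(frozenset({"observation", "language"}),
                          frozenset({"action"})),
    "world_modeling": ModeIO(frozenset({"observation", "action"}),
                             frozenset({"video"})),
    "inverse_dynamics": ModeIO(frozenset({"video"}),
                               frozenset({"action"})),
    "video_generation": ModeIO(frozenset({"observation", "language"}),
                               frozenset({"video"})),
    "joint_video_action": ModeIO(frozenset({"observation", "language"}),
                                 frozenset({"video", "action"})),
}


def modes_producing(modality: str) -> list[str]:
    """Return the names of all modes that predict the given modality, sorted."""
    return sorted(name for name, io in MODES.items() if modality in io.outputs)


if __name__ == "__main__":
    print(modes_producing("action"))
    print(modes_producing("video"))
```

Read this way, a World Action Model in the sense described here corresponds most closely to the joint mode, where future video and actions are predicted together instead of by separate systems.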

How to read the launch claims

Use the source trail before treating any model score as independently settled.

1. Start with the official page

Use ShengShu's Motubrain page for the stated capabilities, partner context, and benchmark figures.

2. Check benchmark mechanics

WorldArena explains how EWMScore is computed, while RoboTwin 2.0 documents the dual-arm manipulation benchmark context.

3. Separate access from awareness

The current public materials explain Motubrain, but this site found no official self-serve API, downloadable model, or public demo.

Where to verify Motubrain information

Use these primary and technical sources to separate official claims, benchmark context, related research, and current access status.

ShengShu's product page is the first stop for the stated model positioning, key capabilities, partner logos, and leaderboard claims.

The PRNewswire release adds the public launch date, architecture narrative, benchmark summary, deployment claims, partners, and Vidu connection.

WorldArena explains EWMScore and how embodied world models are judged across video quality, functional utility, and action-planning roles.

RoboTwin 2.0 documents the dual-arm manipulation setting, 50-task evaluation, five robot embodiments, and domain randomization context.

Motus is useful background for understanding latent actions, Mixture-of-Transformers, and the five-mode unified modeling idea behind the lineage.

Public materials currently describe Motubrain, but this site has found no official self-serve API, downloadable model, or public demo.

Why robotics teams are watching Motubrain

The launch frames Motubrain as a shift from task-specific robot systems toward scalable embodied intelligence.

One brain, many skills

Official materials say task variety improves multi-task performance instead of requiring isolated skill training for every behavior.

One brain, any robot

Motubrain is positioned as cross-embodiment, designed to adapt across robot types rather than being tied to one hardware platform.

Long-horizon execution

The model is described as learning full task sequences directly, including complex multi-step work beyond short atomic actions.

WorldArena context

WorldArena evaluates embodied world models across perceptual and functional utility, including action-planning roles.

RoboTwin 2.0 context

RoboTwin 2.0 is a large-scale dual-arm manipulation benchmark with 50 tasks and domain randomization.

Careful source posture

Benchmark numbers on this site are reported as source claims, not as independent certification by Motubrain.org.

Frequently asked questions about Motubrain