BEIJING, April 29, 2025 /PRNewswire/ — PsiBot’s Robots Are Playing Mahjong While Others Are Still Learning to Walk
While most embodied robots are still focused on basic motion control such as walking and running, PsiBot’s robots are already taking on long-horizon, complex tasks in open environments: they play Mahjong with humans.
And Mahjong is no simple task.
It’s a strategic, multi-layered game requiring precise manipulation—drawing, discarding, organizing tiles—and even more importantly, long-horizon planning: evaluating one’s own tiles, interpreting the game state, and predicting opponents’ moves. This blend of dexterous action and complex reasoning makes Mahjong one of the most demanding challenges for robotics today.
Redefining Dexterity: From Pick & Place to Long-Horizon Manipulation
Robotic manipulation can be broken down into three levels of complexity:
L1: Basic pick-and-place tasks in static environments, with neither autonomous reasoning nor dexterous manipulation.
L2: Human-like manipulation in dynamic environments, featuring grips such as lateral pinch, tripod, or power sphere, but without the chain of reasoning needed to complete complex multimodal tasks.
L3: Autonomous reasoning based on Chain of Action Thought (CoAT), used to make decisions and perform complicated long-horizon manipulation in open environments.
Playing Mahjong is a typical L3 task.
Only with L3-level CoAT capabilities can robots understand their environment, make decisions, learn from experience, and adapt to new tasks over long horizons, all skills essential for real-world deployment. PsiBot addresses this challenge with its novel R1 model, built on a hierarchical end-to-end architecture and powered by reinforcement learning (RL).
In real-world Mahjong games, R1 has sustained coherent CoAT reasoning for up to 30 minutes, executing open-ended tasks and reasoning autonomously. It supports human-robot interaction, robot-robot interaction, and interaction with the environment, demonstrating the reasoning ability of the vision-language-action (VLA) model and the capacity of RL to exceed the ceiling of human thinking and manipulation. This marks a leap from performing a single action to closing the loop of perception, reasoning, and execution in the complex physical world, and it offers a technical paradigm for bringing embodied intelligence into commercial scenarios.
A Breakthrough in Architecture: PsiBot’s Hierarchical End-to-End Model
Founded in 2024, PsiBot took the lead in introducing a hierarchical end-to-end architecture composed of a planner and a controller. The two are connected implicitly by an Action Tokenizer and jointly perform long-horizon dexterous manipulation, a paradigm that has since become an industry consensus. PsiBot also applies RL throughout the system, improving task success rates, extending CoAT durations, and creating a data flywheel that continually improves data utilization efficiency.
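For readers who want a concrete picture of the planner-tokenizer-controller split, the following is a minimal illustrative sketch in Python. It is not PsiBot's implementation: the release does not describe R1's internals, so every name here (`Planner`, `ActionTokenizer`, `Controller`, the skill vocabulary, and the goal names) is a hypothetical stand-in. The sketch also passes explicit integer tokens between the two levels, whereas the release describes the connection as implicit.

```python
from typing import List

# Hypothetical skill vocabulary; the real token set is not public.
VOCAB = ["reach", "grasp", "lift", "place", "release"]


class ActionTokenizer:
    """Maps high-level skills to integer tokens and back (assumed interface)."""

    def encode(self, skills: List[str]) -> List[int]:
        return [VOCAB.index(skill) for skill in skills]

    def decode(self, tokens: List[int]) -> List[str]:
        return [VOCAB[token] for token in tokens]


class Planner:
    """High level: turns a task goal into an ordered list of skills."""

    def plan(self, goal: str) -> List[str]:
        # Hard-coded plans stand in for a learned reasoning model.
        if goal == "draw_tile":
            return ["reach", "grasp", "lift"]
        if goal == "discard_tile":
            return ["reach", "grasp", "lift", "place", "release"]
        raise ValueError(f"unknown goal: {goal}")


class Controller:
    """Low level: turns decoded skills into motor commands (strings here)."""

    def execute(self, skills: List[str]) -> List[str]:
        return [f"motor:{skill}" for skill in skills]


def run(goal: str) -> List[str]:
    """Planner -> tokens -> controller, the two-level loop in miniature."""
    tokenizer = ActionTokenizer()
    tokens = tokenizer.encode(Planner().plan(goal))
    return Controller().execute(tokenizer.decode(tokens))


if __name__ == "__main__":
    print(run("discard_tile"))
```

The point of the split is that the planner only has to reason over a small token vocabulary, while the controller only has to turn each token into motion. In a learned system both levels would be trained models rather than the lookup tables used here.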
From Tech Demo to Industrial Edge
The R1 model is not just a lab prototype.
With L3 capabilities, it can operate in open environments, reasoning, adapting, and manipulating over long time horizons, and it can be deployed in industrial, logistics, retail, and consumer-facing (To-C) scenarios. PsiBot has already partnered with leading firms in the manufacturing, retail, and e-commerce industries to explore commercial applications.
In the end, PsiBot isn’t just building robots that can walk or talk. It’s building robots that can reason, plan, and perform in the real world.