BimanualShift: A Modular Transfer Framework for Generalizing Unimanual Policies to Bimanual Manipulation

Yechen Fan, Xianyou Ji, Chenyang Song, Huixin He, Haibin Wu, Jinhua Ye, Gengfeng Zheng
Corresponding authors

Overview


BimanualShift transfers skills from arbitrary pretrained unimanual policies to complex bimanual manipulation tasks, adapting the unimanual policy priors while keeping them fully frozen.

BimanualShift Framework


The framework is built upon frozen pretrained unimanual policies and consists of three core learnable modules. Module 1 (Visual Tracker) uses semantic masks to decouple the shared workspace into arm-specific visual inputs, eliminating attention interference. Module 2 (Action Generator) acts as an adapter that transforms high-level instructions into dynamic skill weights and compensation vectors, guiding the low-level policies to achieve coordinated behaviors. Module 3 (Skill Memory) retrieves relevant past experience and incorporates closed-loop reflection, enabling continual learning and rapid adaptation to new tasks.
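The three modules can be sketched as follows. This is a minimal, illustrative stand-in, not the paper's implementation: the frozen policy is a fixed linear map, the semantic mask is a boolean array, and all class and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained unimanual policy: maps an arm-specific observation to a
# low-level action. Its weights are never updated during transfer.
# (Hypothetical stand-in: a fixed random linear map.)
class FrozenUnimanualPolicy:
    def __init__(self, obs_dim=8, act_dim=7):
        self.W = rng.standard_normal((act_dim, obs_dim))  # frozen weights

    def act(self, obs):
        return self.W @ obs

# Module 1 (Visual Tracker): decouple the shared workspace into arm-specific
# visual inputs via a semantic mask (here simply element-wise masking).
def visual_tracker(scene, mask):
    return scene * mask

# Module 2 (Action Generator): adapter that turns a high-level instruction
# embedding into a dynamic skill weight and a compensation vector.
def action_generator(instruction_embedding, act_dim=7):
    w = 1.0 / (1.0 + np.exp(-instruction_embedding.mean()))  # skill weight in (0, 1)
    comp = np.zeros(act_dim)  # compensation vector (zero in this sketch)
    return w, comp

# Module 3 (Skill Memory): nearest-neighbour retrieval over stored experience.
class SkillMemory:
    def __init__(self):
        self.keys, self.values = [], []

    def store(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query):
        dists = [np.linalg.norm(query - k) for k in self.keys]
        return self.values[int(np.argmin(dists))]

# One adaptation step for a single arm:
policy = FrozenUnimanualPolicy()
scene = rng.standard_normal(8)
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # left-arm region
obs = visual_tracker(scene, mask)
w, comp = action_generator(rng.standard_normal(4))
action = w * policy.act(obs) + comp  # frozen prior, modulated by the adapter

memory = SkillMemory()
memory.store(np.zeros(4), "pick_up_plate")
retrieved = memory.retrieve(rng.standard_normal(4))
```

The key design point the sketch mirrors is that gradients would only ever flow through the tracker, generator, and memory modules; the unimanual policy itself stays untouched.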

Simulation in RLBench2

We evaluate BimanualShift on six representative bimanual manipulation tasks from RLBench2. RLBench2 extends the widely used unimanual benchmark RLBench to bimanual manipulation scenarios. In simulation, we use two Franka Panda robotic arms equipped with parallel grippers. To fully cover the workspace, we deploy six noise-free RGB-D cameras (256 × 256 resolution) at the front, left shoulder, right shoulder, left wrist, right wrist, and top-down viewpoints.
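The camera setup above can be written as a configuration sketch. The dictionary keys follow RLBench-style camera naming conventions and are illustrative, not the benchmark's exact identifiers.

```python
# Hypothetical camera configuration mirroring the simulation setup described
# above: six noise-free RGB-D cameras at 256 x 256 resolution.
CAMERAS = {
    "front":          {"type": "rgbd", "resolution": (256, 256), "noise": None},
    "left_shoulder":  {"type": "rgbd", "resolution": (256, 256), "noise": None},
    "right_shoulder": {"type": "rgbd", "resolution": (256, 256), "noise": None},
    "left_wrist":     {"type": "rgbd", "resolution": (256, 256), "noise": None},
    "right_wrist":    {"type": "rgbd", "resolution": (256, 256), "noise": None},
    "overhead":       {"type": "rgbd", "resolution": (256, 256), "noise": None},
}
```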

Push Two Buttons

Straighten Rope

Lift a Ball

Pick Up a Plate

Sweep Dustpan

Take Tray Out of Oven

Real-World Tasks

The evaluation involves eight real-world tasks covering diverse challenges: Flower Arrangement and Pouring Water require coordinated stability and smooth motion control; Toasting Completion and Vegetable Sorting test synchronous grasping and timing; Quilt Folding and Cable Routing demand robust control under deformable object dynamics; while Toaster Activation and Block Threading involve heterogeneous actions and high-precision spatial alignment in constrained spaces.

Vegetable Sorting

Pouring Water

Cable Routing

Block Threading

Toaster Activation

Toasting Completion

Flower Arrangement

Quilt Folding

Generalization

To evaluate the generalization capability of BimanualShift under unseen conditions, we conduct a generalization study on the Block Threading task using the best-performing configuration, BimanualShift-PerAct. Five types of perturbations are considered: unseen object color, unseen object shape, lighting change, left–right task exchange, and unseen background.

Unseen Object Color

Unseen Object Shape

Lighting Change

Left-Right Task Exchange

Unseen Background

Reliability in Industrial Environments

To validate the reliability of BimanualShift in industrial deployment, we introduce two representative sources of real-world disturbances on the physical robot platform: 1) Extreme Glare, induced by a high-intensity point light to create severe reflections and shadows, and 2) Camera Perturbations, simulated by injecting a 1 cm translational error and a 2° rotational error into the camera extrinsic parameters.
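The camera perturbation can be sketched as an error injection into a 4x4 extrinsic matrix. The exact injection procedure (error axis, composition order) is not specified above, so the choices below are assumptions for illustration.

```python
import numpy as np

def perturb_extrinsics(T, trans_err_m=0.01, rot_err_deg=2.0,
                       axis=np.array([0.0, 0.0, 1.0])):
    """Inject a fixed translational and rotational error into a 4x4 camera
    extrinsic matrix (illustrative: 1 cm shift and a 2 degree rotation about
    a chosen axis, here the z-axis)."""
    theta = np.deg2rad(rot_err_deg)
    a = axis / np.linalg.norm(axis)
    # Rodrigues' formula: rotation of `theta` about unit axis `a`.
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    R_err = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T_p = T.copy()
    T_p[:3, :3] = R_err @ T[:3, :3]   # compose rotational error
    T_p[:3, 3] += trans_err_m * a     # 1 cm translational error along `a`
    return T_p

T = np.eye(4)  # nominal extrinsics (identity, for illustration)
T_noisy = perturb_extrinsics(T)
```

Feeding `T_noisy` in place of the calibrated extrinsics reproduces the kind of miscalibration the robustness test targets.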

Extreme Glare

Camera Perturbation

Lifelong Learning Capability

To evaluate the lifelong learning capability of the skill memory module in BimanualShift, we construct a long-horizon task that requires sequential composition of multiple actions, after the model has learned the Toasting Completion and Toaster Activation skills. The task consists of four actions: Action 1 (inserting the toast into the toaster), Action 2 (pressing the activation button), Action 3 (grasping the toasted bread), and Action 4 (placing the bread onto a plate).
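The four-step composition can be sketched as a plan executed over skill-memory lookups. The mapping from each action to a previously learned skill is illustrative, not the paper's retrieval interface.

```python
# Hypothetical skill-memory table: which learned task each action is
# retrieved from (illustrative assignment).
LEARNED_SKILLS = {
    "insert_toast":   "Toasting Completion",
    "press_button":   "Toaster Activation",
    "grasp_bread":    "Toasting Completion",
    "place_on_plate": "Toasting Completion",
}

# The long-horizon plan: Actions 1-4 from the task description.
PLAN = ["insert_toast", "press_button", "grasp_bread", "place_on_plate"]

def execute(plan, skills):
    """Execute the plan sequentially, logging which stored skill each
    action reuses (stands in for skill-memory retrieval)."""
    log = []
    for step, action in enumerate(plan, start=1):
        source = skills[action]
        log.append(f"Action {step}: {action} (retrieved from '{source}')")
    return log

trace = execute(PLAN, LEARNED_SKILLS)
```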

Long-Horizon Toast Preparation Task