Inference Setup for Two Arms

Bimanual inference runs a single policy network that outputs actions for both arms simultaneously. The observation-action loop runs at 50Hz — the same frequency as your training data — with both follower arms executing their respective action chunks in sync.
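The fixed-rate loop above can be sketched in a few lines. This is a minimal illustration, not the lerobot control loop itself; `policy_step` is a hypothetical placeholder for one observe, infer, act cycle on both follower arms.

```python
import time

CONTROL_HZ = 50            # matches the training-data frequency
PERIOD = 1.0 / CONTROL_HZ  # 20 ms budget per observation-action step

def run_control_loop(policy_step, n_steps):
    """Call `policy_step` at a fixed 50 Hz rate, sleeping off leftover time."""
    for _ in range(n_steps):
        start = time.perf_counter()
        policy_step()
        elapsed = time.perf_counter() - start
        # Sleep the remainder of the 20 ms budget; if inference overran,
        # skip sleeping rather than drift further behind schedule.
        time.sleep(max(0.0, PERIOD - elapsed))
```

The key design point is that the sleep absorbs inference-time jitter, so both arms receive actions at the same cadence they were demonstrated at.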

source ~/dk1-env/bin/activate

# Keep your hand near the E-stop for the first 3 evaluation episodes
python -m lerobot.scripts.eval \
  --policy-checkpoint ~/dk1-policies/cube-handoff-v1/checkpoint_XXXXX \
  --robot-path ~/dk1-config.yaml \
  --robot-type dk1_bimanual \
  --device cuda \
  --num-eval-episodes 10 \
  --record-video \
  --output-dir ~/dk1-evals/v1

# Replace XXXXX with your best checkpoint step (from Unit 5 loss curve analysis)
# --record-video saves both arm views as separate mp4 files for failure analysis

For the first evaluation run, allow the policy to execute without interruption unless a physical collision is imminent. Bimanual policies often produce unexpected motions in the first 1–2 episodes as they adapt to the real environment. Episodes 3–10 are the meaningful evaluation data. Note whether the policy consistently reaches the same phases of the task (approach, grasp, transfer, place, home) even when it ultimately fails — partial success is diagnostic information.
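Tracking partial progress is easier with a consistent phase log. The sketch below is a hypothetical helper, not part of lerobot: given the phases you hand-label from an episode video, it returns the contiguous prefix of the task the episode actually completed.

```python
# Task phases in execution order, as named in the evaluation protocol.
PHASES = ["approach", "grasp", "transfer", "place", "home"]

def phases_reached(observed):
    """Return the contiguous prefix of phases an episode completed.

    `observed` is a hand-labeled list of phases seen in the episode video.
    An episode that grasped but dropped the cube mid-transfer reached
    ["approach", "grasp"] -- diagnostic even though it failed overall.
    """
    reached = []
    for phase in PHASES:
        if phase in observed:
            reached.append(phase)
        else:
            break  # a skipped phase means later phases don't count
    return reached
```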

Bimanual Evaluation Protocol

Use a structured protocol. Informal evaluation — "it looks like it's working" — is unreliable for bimanual policies because partial successes are much more common and can mask a fundamentally broken handoff.

  • Number of episodes: 10 minimum; 20 for high-confidence results before adding more data.
  • Cube starting position: Fixed, tape-marked position, the same as your Unit 4 training setup.
  • Lighting: Must match training conditions. Even opening a window can shift lighting enough to affect the workspace camera.
  • What counts as full success: Cube starts on the right side, ends on the left side, both arms return to the home pose, and no human contact occurs during the episode.
  • What counts as partial success: Correct grasp achieved but the transfer fails, or the transfer succeeds but placement is off-target. Log these separately.
  • Failure classification: Log (A) grasp failure, (B) handoff failure (the arm-to-arm transfer drops the cube), (C) placement failure, (D) timeout. The handoff category (B) is unique to bimanual tasks and the most informative for improvement.
  • Report metric: Full success rate (episodes with all four phases correct). Also report the partial success rate, e.g. "4/10 full, 7/10 reached handoff phase".
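The report metric can be computed mechanically from the per-episode labels. This is a small illustrative sketch (the `summarize` helper is hypothetical), assuming each episode is labeled "full" or with one failure code A/B/C/D as defined in the protocol:

```python
from collections import Counter

def summarize(labels):
    """Format the evaluation report string from per-episode labels.

    Episodes that at least reached the handoff phase are the full
    successes plus the handoff (B) and placement (C) failures, since
    both of those got past the grasp; timeouts (D) are excluded as
    ambiguous.
    """
    counts = Counter(labels)
    full = counts["full"]
    reached_handoff = full + counts["B"] + counts["C"]
    n = len(labels)
    return f"{full}/{n} full, {reached_handoff}/{n} reached handoff phase"
```

With four full successes, two handoff drops, one placement miss, two grasp failures, and one timeout, this reproduces the example report from the table.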

Common Bimanual Failure Modes

These failure modes are distinct from single-arm failures and require bimanual-specific fixes:

  • Arms arrive at handoff point asynchronously: One arm reaches the handoff position and waits; the other arrives late. The policy has not learned the relative timing between arms. Fix: add 20 demonstrations where both arms explicitly pause at the handoff point for 1–2 seconds before completing the transfer. This makes the synchronization requirement explicit in the data.
  • Handoff drop — cube falls between the two arms: The most common bimanual-specific failure. The receiving arm closes its gripper too early or too late relative to the giving arm's release. Fix: collect 15 slow-motion handoff demonstrations specifically at 25% speed. The exaggerated timing gives the policy a clearer signal about the gripper state transition sequence.
  • Policy converges on a single-arm strategy: The policy learns to complete the task with one arm only, ignoring the other arm's capabilities. This happens when one arm's demonstrations are more consistent than the other's. Fix: review each arm's action error from the training curves (Unit 5) and collect additional demos specifically targeting the weaker arm's phases.
  • Inter-arm collision: Both arms attempt to occupy the same workspace location. This is a safety event — enable collision avoidance in the DK1 hardware server (collision_avoidance: true in dk1-config.yaml) during evaluation. Training on demonstrations that consistently respect safe arm separation will prevent most collisions; the hardware-level guard handles edge cases.
  • Phase desynchronization at deployment: The policy executes the correct actions but not in the right temporal order — e.g., right arm places before left arm has transferred. This is an action chunking artifact where the chunk boundaries don't align with task phase transitions. Fix: reduce chunk_size from 100 to 50 and retrain.
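The hardware-level collision guard can be mirrored by a simple software sanity check during evaluation. The sketch below is an assumption-laden illustration (the 8 cm margin and the helper name are made up for this example), not the DK1 server's actual logic:

```python
import math

MIN_SEPARATION_M = 0.08  # assumed safety margin between gripper centers

def arms_too_close(left_xyz, right_xyz, min_sep=MIN_SEPARATION_M):
    """Return True if the two end-effectors violate the separation margin.

    Positions are (x, y, z) in meters in a shared workspace frame.
    A check like this can flag near-misses in evaluation logs even when
    the hardware-level collision_avoidance guard never has to trigger.
    """
    return math.dist(left_xyz, right_xyz) < min_sep
```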

The Data Flywheel for Bimanual Improvement

The same improvement loop that works for single-arm policies works for bimanual — with one bimanual-specific addition: always target the first failure mode in the task sequence. The handoff (phase B) cannot be improved if grasp (phase A) is still inconsistent. Fix failures in task sequence order.

1. Evaluate: Run 10 episodes. Classify each failure by phase (A/B/C/D).

2. Target: Identify the first failure phase. Collect 20–30 demos specifically covering that phase.

3. Retrain: Add the targeted demos to the dataset. Retrain from scratch or fine-tune the best checkpoint.

4. Evaluate: Run 10 episodes again. Did the full success rate improve? Move to the next failure phase.
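The "fix failures in task sequence order" rule is simple enough to encode. A minimal sketch, assuming the A/B/C/D failure codes from the evaluation protocol (the `next_target_phase` helper is hypothetical):

```python
# Failure codes in task order: grasp, handoff, placement, timeout.
PHASE_ORDER = ["A", "B", "C", "D"]

def next_target_phase(failure_codes):
    """Pick the earliest failure phase in task order to target next.

    Collecting handoff (B) demos is wasted effort while grasp (A) still
    fails, so always target the first phase that shows any failures.
    """
    observed = set(failure_codes)
    for phase in PHASE_ORDER:
        if phase in observed:
            return phase
    return None  # no failures: run more episodes or harden the task
```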

What's Next

You now have a working bimanual learning pipeline. The cube handoff is the foundation, and the same architecture scales to significantly more complex tasks.

Unit 6 Complete When...

  • Your DK1 completes the cube handoff task autonomously with a full success rate of at least 6/10 in a structured evaluation run.
  • You have classified all failure episodes by phase (A/B/C/D) and identified which phase is responsible for most failures.
  • You have watched the failure videos and can articulate specifically what went wrong.
  • You understand the bimanual data flywheel well enough to plan your next improvement iteration.

You built a working bimanual robot learning system.

You configured a leader/follower architecture, collected synchronized two-arm demonstrations, trained a coordinated policy from scratch, and deployed it on real hardware. Bimanual manipulation at this level is where research labs operate. The foundation you have built here scales to assembly, cooking, and contact-rich tasks that were out of reach before you started this path.