Inference Setup for Two Arms

Bimanual inference runs a single policy network that outputs actions for both arms simultaneously. The observation-action loop runs at 50Hz — the same frequency as your training data — with both follower arms executing their respective action chunks in sync.
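The fixed-rate loop above can be sketched in a few lines. This is a minimal illustration, not the lerobot control loop itself; `policy_step` is a hypothetical placeholder for one observe, infer, act cycle on both follower arms.

```python
import time

CONTROL_HZ = 50            # matches the training-data frequency
PERIOD = 1.0 / CONTROL_HZ  # 20 ms budget per observation-action step

def run_control_loop(policy_step, n_steps):
    """Call `policy_step` at a fixed 50 Hz rate, sleeping off leftover time."""
    for _ in range(n_steps):
        start = time.perf_counter()
        policy_step()
        elapsed = time.perf_counter() - start
        # Sleep the remainder of the 20 ms budget; if inference overran,
        # skip sleeping rather than drift further behind schedule.
        time.sleep(max(0.0, PERIOD - elapsed))
```

The key design point is that the sleep absorbs inference-time jitter, so both arms receive actions at the same cadence they were demonstrated at.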

source ~/dk1-env/bin/activate

# Keep your hand near the E-stop for the first 3 evaluation episodes
python -m lerobot.scripts.eval \
  --policy-checkpoint ~/dk1-policies/cube-handoff-v1/checkpoint_XXXXX \
  --robot-path ~/dk1-config.yaml \
  --robot-type dk1_bimanual \
  --device cuda \
  --num-eval-episodes 10 \
  --record-video \
  --output-dir ~/dk1-evals/v1

# Replace XXXXX with your best checkpoint step (from Unit 5 loss curve analysis)
# --record-video saves both arm views as separate mp4 files for failure analysis

For the first evaluation run, allow the policy to execute without interruption unless a physical collision is imminent. Bimanual policies often produce unexpected motions in the first 1–2 episodes as they adapt to the real environment. Episodes 3–10 are the meaningful evaluation data. Note whether the policy consistently reaches the same phases of the task (approach, grasp, transfer, place, home) even when it ultimately fails — partial success is diagnostic information.
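Tracking partial progress is easier with a consistent phase log. The sketch below is a hypothetical helper, not part of lerobot: given the phases you hand-label from an episode video, it returns the contiguous prefix of the task the episode actually completed.

```python
# Task phases in execution order, as named in the evaluation protocol.
PHASES = ["approach", "grasp", "transfer", "place", "home"]

def phases_reached(observed):
    """Return the contiguous prefix of phases an episode completed.

    `observed` is a hand-labeled list of phases seen in the episode video.
    An episode that grasped but dropped the cube mid-transfer reached
    ["approach", "grasp"] -- diagnostic even though it failed overall.
    """
    reached = []
    for phase in PHASES:
        if phase in observed:
            reached.append(phase)
        else:
            break  # a skipped phase means later phases don't count
    return reached
```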

Bimanual Evaluation Protocol

Use a structured protocol. Informal evaluation — "it looks like it's working" — is unreliable for bimanual policies because partial successes are much more common and can mask a fundamentally broken handoff.

  • Number of episodes: 10 minimum; 20 for high-confidence results before adding more data.
  • Cube starting position: Fixed, tape-marked position, the same as your Unit 4 training setup.
  • Lighting: Must match training conditions. Even opening a window can shift lighting enough to affect the workspace camera.
  • What counts as full success: Cube starts on the right side, ends on the left side, both arms return to the home pose, and no human contact occurs during the episode.
  • What counts as partial success: Correct grasp achieved but the transfer fails, or the transfer succeeds but placement is off-target. Log these separately.
  • Failure classification: Log (A) grasp failure, (B) handoff failure (the arm-to-arm transfer drops the cube), (C) placement failure, (D) timeout. The handoff category (B) is unique to bimanual tasks and the most informative for improvement.
  • Report metric: Full success rate (episodes with all four phases correct). Also report the partial success rate, e.g. "4/10 full, 7/10 reached handoff phase".
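The report metric can be computed mechanically from the per-episode labels. This is a small illustrative sketch (the `summarize` helper is hypothetical), assuming each episode is labeled "full" or with one failure code A/B/C/D as defined in the protocol:

```python
from collections import Counter

def summarize(labels):
    """Format the evaluation report string from per-episode labels.

    Episodes that at least reached the handoff phase are the full
    successes plus the handoff (B) and placement (C) failures, since
    both of those got past the grasp; timeouts (D) are excluded as
    ambiguous.
    """
    counts = Counter(labels)
    full = counts["full"]
    reached_handoff = full + counts["B"] + counts["C"]
    n = len(labels)
    return f"{full}/{n} full, {reached_handoff}/{n} reached handoff phase"
```

With four full successes, two handoff drops, one placement miss, two grasp failures, and one timeout, this reproduces the example report from the table.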

Common Bimanual Failure Modes

These failure modes are distinct from single-arm failures and require bimanual-specific fixes:

  • Arms arrive at handoff point asynchronously: One arm reaches the handoff position and waits; the other arrives late. The policy has not learned the relative timing between arms. Fix: add 20 demonstrations where both arms explicitly pause at the handoff point for 1–2 seconds before completing the transfer. This makes the synchronization requirement explicit in the data.
  • Handoff drop — cube falls between the two arms: The most common bimanual-specific failure. The receiving arm closes its gripper too early or too late relative to the giving arm's release. Fix: collect 15 slow-motion handoff demonstrations specifically at 25% speed. The exaggerated timing gives the policy a clearer signal about the gripper state transition sequence.
  • Policy converges on a single-arm strategy: The policy learns to complete the task with one arm only, ignoring the other arm's capabilities. This happens when one arm's demonstrations are more consistent than the other's. Fix: review each arm's action error from the training curves (Unit 5) and collect additional demos specifically targeting the weaker arm's phases.
  • Inter-arm collision: Both arms attempt to occupy the same workspace location. This is a safety event — enable collision avoidance in the DK1 hardware server (collision_avoidance: true in dk1-config.yaml) during evaluation. Training on demonstrations that consistently respect safe arm separation will prevent most collisions; the hardware-level guard handles edge cases.
  • Phase desynchronization at deployment: The policy executes the correct actions but not in the right temporal order — e.g., right arm places before left arm has transferred. This is an action chunking artifact where the chunk boundaries don't align with task phase transitions. Fix: reduce chunk_size from 100 to 50 and retrain.
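The hardware-level collision guard can be mirrored by a simple software sanity check during evaluation. The sketch below is an assumption-laden illustration (the 8 cm margin and the helper name are made up for this example), not the DK1 server's actual logic:

```python
import math

MIN_SEPARATION_M = 0.08  # assumed safety margin between gripper centers

def arms_too_close(left_xyz, right_xyz, min_sep=MIN_SEPARATION_M):
    """Return True if the two end-effectors violate the separation margin.

    Positions are (x, y, z) in meters in a shared workspace frame.
    A check like this can flag near-misses in evaluation logs even when
    the hardware-level collision_avoidance guard never has to trigger.
    """
    return math.dist(left_xyz, right_xyz) < min_sep
```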

The Data Flywheel for Bimanual Improvement

The same improvement loop that works for single-arm policies works for bimanual — with one bimanual-specific addition: always target the first failure mode in the task sequence. The handoff (phase B) cannot be improved if grasp (phase A) is still inconsistent. Fix failures in task sequence order.

1. Evaluate: Run 10 episodes. Classify each failure by phase (A/B/C/D).

2. Target: Identify the first failure phase. Collect 20–30 demos specifically covering that phase.

3. Retrain: Add the targeted demos to the dataset. Retrain from scratch or fine-tune the best checkpoint.

4. Evaluate: Run 10 episodes again. Did the full success rate improve? Move to the next failure phase.
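The "fix failures in task sequence order" rule is simple enough to encode. A minimal sketch, assuming the A/B/C/D failure codes from the evaluation protocol (the `next_target_phase` helper is hypothetical):

```python
# Failure codes in task order: grasp, handoff, placement, timeout.
PHASE_ORDER = ["A", "B", "C", "D"]

def next_target_phase(failure_codes):
    """Pick the earliest failure phase in task order to target next.

    Collecting handoff (B) demos is wasted effort while grasp (A) still
    fails, so always target the first phase that shows any failures.
    """
    observed = set(failure_codes)
    for phase in PHASE_ORDER:
        if phase in observed:
            return phase
    return None  # no failures: run more episodes or harden the task
```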

What's Next

You now have a working bimanual learning pipeline. The cube handoff is the foundation, and the same architecture scales to significantly more complex tasks.

Unit 6 Complete When...

  • Your DK1 completes the cube handoff task autonomously with a full success rate of at least 6/10 in a structured evaluation run.
  • You have classified all failure episodes by phase (A/B/C/D) and identified which phase is responsible for most failures.
  • You have watched the failure videos and can articulate specifically what went wrong.
  • You understand the bimanual data flywheel well enough to plan your next improvement iteration.

You built a working bimanual robot learning system.

You configured a leader/follower architecture, collected synchronized two-arm demonstrations, trained a coordinated policy from scratch, and deployed it on real hardware. Bimanual manipulation at this level is where research labs operate. The foundation you have built here scales to assembly, cooking, and contact-rich tasks that were out of reach before you started this path.