The Thirty-Minute Pianist
Simulation-trained robot policies know what to do but not quite how this body does it. The gap between simulated and real finger positions — fractions of a millimeter — is the difference between striking a piano key and missing it. For dexterous bimanual piano playing, where ten fingers must hit precise keys at precise times, the sim-to-real gap is not a nuisance but a barrier.
HandelBot bridges it in thirty minutes. A two-stage pipeline first corrects spatial alignment: it adjusts lateral finger joints based on physical rollouts, so that where the simulated fingers think they are matches where they actually land. Then residual reinforcement learning adds fine-grained corrective actions on top of the simulation policy, letting the system learn micro-adjustments that no simulator can predict.
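To make the two stages concrete, here is a minimal Python sketch of the general shape such a pipeline could take. It is not HandelBot's implementation: the function names, the mean-error calibration rule, and the `scale` and `limit` parameters are all assumptions chosen for illustration. The residual values would come from a learned policy, which is not modeled here.

```python
from statistics import mean


def calibrate_lateral_offsets(commanded, observed):
    """Stage 1 (assumed form): estimate one lateral offset per finger.

    `commanded` and `observed` are lists of rollout steps; each step is a
    list with one lateral position per finger. The offset is the mean of
    (observed - commanded) across all steps.
    """
    n_fingers = len(commanded[0])
    return [
        mean(obs[i] - cmd[i] for cmd, obs in zip(commanded, observed))
        for i in range(n_fingers)
    ]


def apply_calibration(target, offsets):
    """Shift each commanded target so the finger lands where intended."""
    return [t - o for t, o in zip(target, offsets)]


def residual_action(base_action, residual, scale=0.1, limit=1.0):
    """Stage 2 (assumed form): add a small, bounded correction to the
    simulation policy's action. `residual` stands in for the output of a
    learned residual policy; `scale` keeps the correction small and
    `limit` clips the final action to a hypothetical actuator range.
    """
    return [
        max(-limit, min(limit, b + scale * r))
        for b, r in zip(base_action, residual)
    ]


# Two fingers, two rollout steps: finger 0 lands to the right of its
# command, finger 1 lands to the left.
commanded = [[0.0, 1.0], [0.5, 1.5]]
observed = [[0.2, 0.9], [0.7, 1.4]]
offsets = calibrate_lateral_offsets(commanded, observed)
corrected_target = apply_calibration([1.0, 2.0], offsets)
final_action = residual_action([0.5, 0.95], [1.0, 1.0])
```

The point of the split is that the simulation policy stays untouched: stage one only moves the targets it is asked to reach, and stage two only adds a bounded correction on top of its output.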
The result: 1.8x better performance than direct simulation deployment across five compositions, using only thirty minutes of physical interaction data.
The through-claim: the sim-to-real gap in dexterous manipulation is not a single problem but two problems with different solutions. Spatial misalignment — fingers being in slightly wrong positions — is a calibration problem solvable by structured correction. Dynamic misalignment — fingers responding slightly wrong to commands — is a control problem solvable by residual learning. Treating them as a single “domain gap” conflates a static offset with a dynamic residual, and solutions that address only one fail at the other. The thirty-minute efficiency comes from solving each problem with the right tool, not from a faster version of the wrong tool.
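A toy example can illustrate why a static correction cannot absorb a dynamic error. The finger model below is invented for this purpose (a constant offset plus a lag proportional to how much the command changed) and the numbers are made up; it is a sketch of the argument, not a model of any real hand.

```python
def tracking_errors(commands, offset, lag):
    """Error (landed - commanded) for a hypothetical finger that has a
    constant lateral offset and lags behind changes in its command."""
    errors = []
    prev = commands[0]
    for cmd in commands:
        landed = cmd + offset - lag * (cmd - prev)
        errors.append(landed - cmd)
        prev = cmd
    return errors


def after_static_calibration(errors):
    """Subtract the best single constant (the mean error) from every step."""
    static = sum(errors) / len(errors)
    return [e - static for e in errors]


commands = [0.0, 1.0, 1.0, 3.0, 3.0]

# Offset only: one constant correction removes essentially all the error.
static_only = after_static_calibration(tracking_errors(commands, 0.4, 0.0))
worst_static_only = max(abs(e) for e in static_only)

# Offset plus lag: the same correction leaves a command-dependent error.
with_lag = after_static_calibration(tracking_errors(commands, 0.4, 0.3))
worst_with_lag = max(abs(e) for e in with_lag)
```

In this toy setup the leftover error in the second case varies with the command history, which is the kind of error a learned residual is meant to handle and a fixed offset cannot.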