DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

  • 1 The University of Hong Kong
  • 2 UC Berkeley
  • 3 Shanghai AI Laboratory
  • 4 Tianjin University

Corresponding Authors

Abstract

Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simpler manipulation tasks, they often produce unrealistic ghost states (e.g., the object moves on its own without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexDiffuser, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexDiffuser models joint state-action dynamics through a dual-phase diffusion process that consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive, generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, enhancing generalizability for physical interactions and facilitating diverse goal adaptation through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, and hammer striking demonstrate DexDiffuser's effectiveness on goals outside the training distribution, achieving more than double the average success rate of existing methods (59.2% vs. 29.5%). Our framework achieves 70.0% success on 30-degree door opening, 40.0% and 36.7% on pen and block half-side re-orientation respectively, and 46.7% on hammer nail half drive, highlighting its robustness and flexibility in contact-rich manipulation.


Overview of DexDiffuser. (a) Previous diffusers apply goal guidance directly to object states, which leads to ghost states in which objects move on their own while hand states remain unchanged. (b) DexDiffuser introduces contact guidance that jointly influences hand/object states and hand actions while maintaining tight state-action coupling. This not only prevents ghost states but also enables precise goal adaptation through coordinated hand-object motion. (c) Quantitative comparisons with previous methods on goal-adapted interaction tasks.
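
The difference between panels (a) and (b) can be illustrated with a deliberately small toy. The sketch below is not the DexDiffuser implementation: it omits the learned diffusion denoiser, the dynamics model, and the LLM-generated guidance functions, and it treats the hand and the object as single scalar positions. The function name `refine`, the quadratic costs, and all step counts are assumptions made for illustration. It only shows the general idea of two-phase gradient guidance: first a contact term pulls hand and object states together, then a goal term is added while the contact term stays active, so the object cannot drift to the goal without the hand.

```python
def refine(traj, goal, contact_steps=50, goal_steps=200, scale=0.1):
    """Toy two-phase guidance on a list of (hand, obj) scalar positions.

    Hypothetical costs (assumptions, not from the paper):
      contact cost = (hand - obj)^2
      goal cost    = (obj - goal)^2
    """
    traj = [list(p) for p in traj]  # copy so the input is not modified

    # Phase 1: contact alignment only (gradient descent on the contact cost).
    for _ in range(contact_steps):
        for p in traj:
            diff = p[0] - p[1]
            p[0] -= scale * 2.0 * diff   # d(contact)/d(hand) =  2 * diff
            p[1] += scale * 2.0 * diff   # d(contact)/d(obj)  = -2 * diff

    # Phase 2: goal guidance with the contact term kept active.
    for _ in range(goal_steps):
        for p in traj:
            diff = p[0] - p[1]
            grad_hand = 2.0 * diff
            grad_obj = -2.0 * diff + 2.0 * (p[1] - goal)
            p[0] -= scale * grad_hand
            p[1] -= scale * grad_obj

    return traj


if __name__ == "__main__":
    start = [(0.0, 1.0), (-0.5, 2.0), (1.5, -1.0)]
    for hand, obj in refine(start, goal=0.5):
        print(f"hand={hand:.4f} obj={obj:.4f}")
```

Dropping the contact term from phase 2 in this toy would move only the object column toward the goal and leave the hand where phase 1 put it, which is the scalar analogue of the ghost states described in (a).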

Results

Hand Door Task

Door Held in Position, Hand Released


Door Open 30°

Door Open 50°

Door Open 70°

Door Open 90°

Door Open 110°

Door Closing

Hand Pen Task


Right Half Re-orientation (Pen Aligned, Hand Stabilizes)

Left Half Re-orientation (Pen Aligned, Hand Stabilizes)

Dynamic Goal Rotation (Goal Yaw Rotates, Pen Rotates About the Z-axis)

Hand Hammer Task


Nail Fully Driven

Nail Partially Driven, Hammer Retracts

Manipulate Block Task


Goal Yaw Positive

Goal Yaw Negative