Abstract
GR-Dexter introduces a hardware-model-data framework for bimanual dexterous-hand robot manipulation using vision-language-action models, combining teleoperation data and multimodal datasets to achieve robust generalization.
Vision-language-action (VLA) models have enabled language-conditioned, long-horizon robot manipulation, but most existing systems are limited to grippers. Scaling VLA policies to bimanual robots with high-degree-of-freedom (DoF) dexterous hands remains challenging due to the expanded action space, frequent hand-object occlusions, and the cost of collecting real-robot data. We present GR-Dexter, a holistic hardware-model-data framework for VLA-based generalist manipulation on a bimanual dexterous-hand robot. Our approach combines the design of a compact 21-DoF robotic hand, an intuitive bimanual teleoperation system for real-robot data collection, and a training recipe that leverages teleoperated robot trajectories together with large-scale vision-language and carefully curated cross-embodiment datasets. Across real-world evaluations spanning long-horizon everyday manipulation and generalizable pick-and-place, GR-Dexter achieves strong in-domain performance and improved robustness to unseen objects and unseen instructions. We hope GR-Dexter serves as a practical step toward generalist dexterous-hand robotic manipulation.
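To make the "expanded action space" concrete, here is a minimal, hypothetical sketch comparing the per-step action dimensionality of a typical single-arm gripper VLA with a bimanual dexterous-hand setup using two 21-DoF hands. The arm DoF counts and the interface are illustrative assumptions for this sketch, not details from the paper.

```python
# Hypothetical illustration of the action-space gap the abstract describes.
# A parallel-jaw gripper policy predicts a small action vector per step,
# while a bimanual robot with two 21-DoF dexterous hands must predict a
# much larger joint vector. Arm DoF (7 per arm) is an assumption here.
import numpy as np

GRIPPER_ACTION_DIM = 7 + 1            # 7 arm joints + 1 gripper open/close
DEXTEROUS_ACTION_DIM = 2 * (7 + 21)   # two arms (assumed 7 DoF each) + two 21-DoF hands = 56

def sample_action(dim: int) -> np.ndarray:
    """Return a random joint-space action, standing in for a policy output."""
    return np.random.uniform(-1.0, 1.0, size=dim)

print(sample_action(GRIPPER_ACTION_DIM).shape)    # (8,)
print(sample_action(DEXTEROUS_ACTION_DIM).shape)  # (56,)
```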
Community
VLAs go from grippers to 21 DoF dexterous ByteDexter V2 :)
wooooow that demo video..... gosh that's impressive!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model (2025)
- RoboCOIN: An Open-Sourced Bimanual Robotic Data COllection for INtegrated Manipulation (2025)
- MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training (2025)
- TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System (2025)
- EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation (2025)
- Emergence of Human to Robot Transfer in Vision-Language-Action Models (2025)
- MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation (2025)