In recent years, roboticists have developed a wide range of systems designed to accomplish a variety of real-world tasks, from completing household chores to delivering packages or searching for target objects in bounded environments.
A key goal in this area has been to develop algorithms that enable the reliable transfer of specific skills between robots with different bodies and characteristics, which would help to quickly train robots for new tasks, thereby expanding their capabilities.
Researchers at UC Berkeley have developed RoVi-Aug, a new computational framework designed to augment robotic data and facilitate the transfer of skills between different robots. The approach, described in a paper pre-published on arXiv and scheduled to be presented at the 2024 Conference on Robot Learning (CoRL), uses cutting-edge generative models to augment image data, producing synthetic visual demonstrations of tasks performed by different robots and seen from different camera views.
“The success of modern machine learning systems, particularly generative models, demonstrates impressive generalizability and motivates robotics researchers to explore how to achieve similar generalizability in robotics,” Lawrence Chen (PhD candidate, AUTOLab, EECS & IEOR, BAIR, UC Berkeley) and Chenfeng Xu (PhD candidate, Pallas Lab & MSC Lab, EECS & ME, BAIR, UC Berkeley), told Tech Xplore.
“We have been studying the generalization problem of cross-viewpoints and robots since the beginning of this year.”
In their previous research, Chen, Xu and their colleagues identified some of the challenges of generalizing learning across different robots. Specifically, they found that when the scenes in robotic datasets are unevenly distributed, for example, dominated by the visuals of specific robots and camera angles, the datasets become less effective for teaching the same skills to different robots.
Interestingly, researchers have found that many existing robot training datasets are unbalanced, including some of the most well-established. For example, even the Open-X Embodiment (OXE) dataset, which is widely used for training robotic algorithms and contains demonstrations of different robots performing various tasks, includes disproportionately more data for some robots, such as the Franka and xArm manipulators, than for others.
“Such biases in the dataset make the robot policy model tend to overfit to specific robot types and viewpoints,” Chen and Xu said.
“To alleviate this problem, in February 2024, we proposed a test-time adaptation algorithm, Mirage, which uses cross-painting to transform an unseen target robot into the source robot seen during training, creating the illusion that the source robot is performing the task at test time.”
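The cross-painting idea behind Mirage can be sketched schematically. The snippet below is an illustrative toy, not the authors' code: images are modeled as pixel-to-label dictionaries, and all function names and bodies are hypothetical stand-ins for the real segmentation and rendering steps.

```python
# Hypothetical sketch of Mirage-style "cross-painting" at test time.
# A toy "image" is a dict mapping pixel coordinates to labels.

def segment_target_robot(image):
    """Stub: return the pixel mask of the (unseen) target robot.
    Mirage assumes an accurate robot model and camera calibration here."""
    return {px for px, label in image.items() if label == "target"}

def render_source_robot(mask):
    """Stub: render the source robot in the same configuration,
    covering the same pixels for simplicity."""
    return {px: "source" for px in mask}

def cross_paint(image, mask, rendered_source):
    """Mask out the target robot, then overlay the rendered source robot,
    so the policy 'sees' the robot it was trained on."""
    painted = {px: v for px, v in image.items() if px not in mask}
    painted.update(rendered_source)
    return painted

image = {(0, 0): "target", (0, 1): "target", (1, 0): "table", (1, 1): "cup"}
mask = segment_target_robot(image)
out = cross_paint(image, mask, render_source_robot(mask))
assert "target" not in out.values()  # target robot no longer visible
```

The point of the sketch is the data flow: segmentation and rendering happen at every test-time step, which is why Mirage depends on accurate robot models and calibrated cameras.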
Mirage, the algorithm the researchers presented in their previous paper, was found to enable zero-shot skill transfer to unseen target robots. However, the model was also found to have various limitations.
First, to work well, Mirage requires accurate robot models and camera matrices. Additionally, the algorithm does not support fine-tuning of robot policies and is limited to small changes in camera pose, as it is prone to errors when reprojecting image depth.
“In our latest work, we present an alternative algorithm called RoVi-Aug,” said Chen and Xu. “The goal of this algorithm is to overcome the limitations of Mirage by improving the robustness and generalizability of policies during training, focusing on handling diverse robot visuals and camera poses rather than relying on the cross-painting approach at test time, which makes strict assumptions about known camera poses and robot URDFs (unified robot description formats).”
RoVi-Aug, the new robotic data augmentation framework introduced by the researchers, is based on state-of-the-art diffusion models. These are computer models capable of augmenting images of a robot’s trajectories, generating synthetic images showing different robots performing tasks, seen from different points of view.
The researchers used their framework to compile a dataset containing a wide range of synthetic robot demonstrations, then trained robot policies on this dataset. This in turn allows skills to be transferred to new robots that have not previously been exposed to the task included in the demonstrations, a capability known as zero-shot transfer.
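The overall recipe can be sketched as offline data augmentation followed by ordinary policy training. The code below is a schematic sketch, not the authors' implementation: the frame format, robot names, and augmentation functions are all illustrative stubs standing in for the generative models.

```python
# Schematic sketch of an augment-then-train workflow (illustrative only):
# source-robot demonstrations are expanded offline with robot and
# viewpoint augmentations, and a policy is trained on the union.
import random

def robot_augment(frame):
    """Stub for robot augmentation: repaint the source robot as another robot."""
    return {**frame, "robot": random.choice(["ur5", "jaco", "kinova"])}

def viewpoint_augment(frame):
    """Stub for viewpoint augmentation: re-render from a perturbed camera pose."""
    return {**frame, "cam_yaw_deg": frame["cam_yaw_deg"] + random.uniform(-30, 30)}

def augment_dataset(demos, n_aug=4):
    """Keep the original demos and add n_aug augmented copies per frame."""
    out = list(demos)
    for frame in demos:
        for _ in range(n_aug):
            out.append(viewpoint_augment(robot_augment(frame)))
    return out

demos = [{"robot": "franka", "cam_yaw_deg": 0.0, "action": a} for a in range(10)]
dataset = augment_dataset(demos)
# A policy would then be trained on `dataset` with a standard imitation loss.
```

Because the augmentation happens entirely before training, deployment requires no extra processing, which is the contrast the researchers draw with test-time methods like Mirage.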
Notably, the robot policies can also be fine-tuned to achieve even better performance on a given task. Additionally, unlike the Mirage model presented in the team’s previous paper, the new algorithm can support drastic changes in camera angles.
“Unlike test-time adaptation methods like Mirage, RoVi-Aug does not require any additional processing during deployment, does not rely on knowing camera angles in advance, and supports policy fine-tuning,” explained Chen and Xu. “It also goes beyond traditional co-training on multi-robot, multi-task datasets by actively encouraging the model to learn the full range of robots and skills across the datasets.”
The RoVi-Aug model has two distinct components, namely the robot augmentation (Ro-Aug) and viewpoint augmentation (Vi-Aug) modules. The first of these components is designed to synthesize demonstration data featuring different robotic systems, while the second can produce demonstrations viewed from different angles.
“Ro-Aug has two key features: a fine-tuned SAM model to segment the robot and a fine-tuned ControlNet to replace the original robot with another,” Chen and Xu said. “Meanwhile, Vi-Aug leverages ZeroNVS, a state-of-the-art novel view synthesis model, to generate new perspectives of the scene, making the model adaptable to different camera viewpoints.”
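The per-frame pipeline described above can be sketched as three stages. The stage names (SAM, ControlNet, ZeroNVS) come from the researchers' description; the function bodies below are placeholder stubs operating on a toy one-dimensional "image", not the real models.

```python
# Illustrative Ro-Aug / Vi-Aug pipeline on a toy 1-D image,
# where "R" marks source-robot pixels and "." marks background.

def sam_segment(frame):
    """Stub for the fine-tuned SAM: indices of pixels belonging to the robot."""
    return [i for i, px in enumerate(frame) if px == "R"]

def controlnet_repaint(frame, mask, target_px):
    """Stub for the fine-tuned ControlNet: inpaint the masked region
    with the target robot in the same pose."""
    return [target_px if i in mask else px for i, px in enumerate(frame)]

def zeronvs_view(frame, shift):
    """Stub for ZeroNVS novel-view synthesis, reduced here to a pixel shift."""
    k = shift % len(frame)
    return frame[k:] + frame[:k]

frame = list("RR..cup..")
mask = sam_segment(frame)                   # segment the source robot
ro = controlnet_repaint(frame, mask, "U")   # Ro-Aug: swap in the target robot
rovi = zeronvs_view(ro, 2)                  # Vi-Aug: render a new viewpoint
assert "R" not in ro and ro.count("U") == len(mask)
```

The design point is that the two modules compose: robot augmentation and viewpoint augmentation can be applied independently or in sequence to the same demonstration frame.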
As part of their study, the researchers used their model to produce an augmented robot dataset, then tested the effectiveness of this dataset for training policies and transferring skills between different robots. Their initial findings highlight the potential of RoVi-Aug, as the algorithm was found to enable the training of policies that generalize well to different robots and camera configurations.
“Its key innovation lies in applying generative models, such as image-to-image generation and novel view synthesis, to the challenge of cross-embodiment robot learning,” Chen and Xu explained.
“While previous work has used generative augmentation to improve the robustness of policies to distracting objects and backgrounds, RoVi-Aug is the first to show how this approach can facilitate the transfer of skills between different robots.”
This recent work by Chen and Xu could contribute to the advancement of robotics, helping researchers easily expand the skill sets of their systems. In the future, it could be used by other teams to transfer skills between different robots or to develop more effective generalist robot policies.
“For example, imagine a scenario where a researcher has spent considerable effort collecting data and training a policy on a Franka robot to perform a task, but you only have a UR5 robot,” Chen and Xu said.
“RoVi-Aug allows you to reuse the Franka data and deploy the policy on the UR5 robot without additional training. This is particularly useful because robot policies are often sensitive to changes in camera viewpoint, and setting up identical camera angles on different robots is a challenge. RoVi-Aug eliminates the need for such precise configurations.”
Since collecting large quantities of real-world robot demonstrations can be very expensive and time-consuming, RoVi-Aug could provide a cost-effective alternative for easily compiling reliable robot training datasets.
Even though the images in these datasets would be synthetic (i.e., generated by AI), they could still prove useful for producing reliable robot policies. The researchers are currently working with colleagues at the Toyota Research Institute and other institutes to apply and extend their approach to other robotics datasets.
“We now aim to further refine RoVi-Aug by integrating recent developments in generative modeling techniques, such as video generation in place of image generation,” added Chen and Xu.
“We also plan to apply RoVi-Aug to existing datasets such as the Open-X Embodiment (OXE) dataset, and we are excited about the potential to improve the performance of general-purpose robotic policies trained on this data. Expanding RoVi-Aug’s capabilities could significantly improve the flexibility and robustness of these policies across a wider range of robots and tasks.”
More information:
Lawrence Yunliang Chen et al, RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning, arXiv (2024). DOI: 10.48550/arxiv.2409.03403
© 2024 Science X Network
Quote: A new data augmentation algorithm could facilitate the transfer of skills between robots (October 10, 2024)