Submitted: 04 November 2024
Posted: 05 November 2024
Abstract
Keywords:
1. Introduction
- Time per grasp. We use all the available CPU threads to optimise the grasp generation process, making it around 18 times faster than their approach without losing grasp accuracy.
- Similarity to manually defined grasp poses. The grasps estimated by our system are more similar to a manually defined ground truth than those of their approach.
- System requirements. Our system runs on the CPU and adapts to the number of threads the CPU provides. It does not need a dedicated RTX-enabled GPU.
2. System Architecture
2.1. GPD Integration
- Initial grasp sampler service: This service receives the object model (CAD), as well as information about the gripper and its fingers, that is, the height, width and depth of the fingers, the distance between them and the grasp modes the gripper allows (inner grasps, outer grasps or both). With this information, it produces the grasping point candidates, each including the 6D pose of the object with respect to the gripper. GPD originally returned the pose of the gripper in the object’s coordinate system, but we invert this pose to obtain the pose of the object in gripper coordinates, which eases the validation process in the simulation. Further details on the initial candidate estimation are given later in this section.
- File generation service: This service receives as input a list of 6D grasping poses validated in the simulator, together with the list of poses "discovered" during the validation process. It generates an XML file containing the validated grasps and outputs a confirmation message. The term "discovered" grasping poses is further explained in Section 3.
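The pose inversion performed by the initial grasp sampler service follows the standard rule for rigid transforms: if the gripper pose in the object frame has rotation R and translation t, the object pose in the gripper frame has rotation Rᵀ and translation −Rᵀt. The plain-Python sketch below illustrates only that step; the (rotation matrix, translation) representation and the function names are assumptions for illustration, not the service’s actual interface.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]
Vector3 = List[float]
Pose = Tuple[Matrix3, Vector3]  # hypothetical representation: (rotation matrix R, translation t)


def transpose(r: Matrix3) -> Matrix3:
    """Transpose a 3x3 matrix; for a rotation matrix this is its inverse."""
    return [[r[j][i] for j in range(3)] for i in range(3)]


def mat_vec(r: Matrix3, v: Vector3) -> Vector3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def invert_pose(pose: Pose) -> Pose:
    """Invert a rigid transform: (R, t) -> (R^T, -R^T t).

    Given the gripper pose expressed in the object frame, this returns the
    object pose expressed in the gripper frame.
    """
    r, t = pose
    r_inv = transpose(r)
    t_inv = [-x for x in mat_vec(r_inv, t)]
    return r_inv, t_inv
```

For example, for a gripper rotated 90 degrees about z and offset 5 cm along x in the object frame, `invert_pose` returns the transposed rotation and the translation (0, 0.05, 0).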
2.2. Simulation Environment
- All tools have been oriented and translated so that their closed fingertips touch the origin of the world coordinate system, as seen in Figure 4.
- Each gripper features a MuJoCo body called "g_ref", defined as a child of the gripper base, to provide a reference point for the gripper with a standardised name.
- The finger actuator of each gripper has been renamed to "finger_actuator" and its control range remapped to 0 to 255, so that it can be controlled via script without adapting the simulation to each actuator.
- Each tool now includes an additional joint and actuator (with standardised names) that lift the gripper 10 cm vertically. The joint and actuator are based on the "lift" modelled in the "Hello Robot Stretch 3", also available in the menagerie.
- Headless Single-Thread (HST) mode: The simulation runs entirely on a single thread, as fast as that thread allows, since it runs in headless mode. This mode also offers the option to record videos of the validation process, which is useful for debugging. Video recording renders and saves the frames of each iteration, consuming RAM and computing power and notably reducing the speed of the validation process, although it still runs faster than real time (FTR).
- Headless Multi-Thread (HMT) mode: The simulation uses all the available threads in parallel. To achieve parallelisation, the system divides the list of grasp candidates into as many groups as the CPU has threads. Then, using Python’s multiprocessing library, each group is assigned to its own worker process. The information for each grasp candidate is stored in a queue and finally merged into a single list. As this mode is intended to make the validation process as fast as possible, the parallelised culling phase does not support video generation. This is the go-to mode for fast grasp validation, as it achieves the fastest FTR times.
- Real-Time Interactive mode: This is the only mode that runs in real time, and it does not implement the video saving tool. Because it runs in real time, it is not suited to validating a large number of grasps: for reference, validating 20k grasp candidates in this mode could take up to approximately 16 hours and 36 minutes. This mode should only be used for debugging.
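The HMT strategy (split the candidates into one group per CPU thread, validate each group in its own worker, collect the results through a queue and merge them) can be sketched with Python’s standard `multiprocessing` module as follows. This is an illustration under assumptions, not the authors’ code: `validate_candidate` is a hypothetical placeholder for the MuJoCo validation of one candidate, and the function names and data layout are invented for the example.

```python
import multiprocessing as mp
import os
from typing import Any, Callable, List, Optional, Sequence, Tuple


def split_into_groups(items: Sequence[Any], n_groups: int) -> List[List[Any]]:
    """Split items into at most n_groups contiguous groups whose sizes differ by at most one."""
    n_groups = max(1, min(n_groups, len(items)))
    base, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(list(items[start:start + size]))
        start += size
    return groups


def _validate_group(group: List[Any], validate: Callable[[Any], float], queue) -> None:
    """Worker body: validate every candidate of one group and push (candidate, score) pairs."""
    queue.put([(candidate, validate(candidate)) for candidate in group])


def validate_in_parallel(
    candidates: Sequence[Any],
    validate: Callable[[Any], float],
    n_workers: Optional[int] = None,
) -> List[Tuple[Any, float]]:
    """Validate candidates with one worker process per CPU thread (by default) and merge the results.

    The merged list follows the order in which workers finish, not the input order.
    """
    n_workers = n_workers or os.cpu_count() or 1
    groups = split_into_groups(candidates, n_workers)
    queue = mp.Queue()
    workers = [mp.Process(target=_validate_group, args=(g, validate, queue)) for g in groups]
    for worker in workers:
        worker.start()
    merged: List[Tuple[Any, float]] = []
    # Drain the queue before joining: a child that still has queued data may not exit.
    for _ in workers:
        merged.extend(queue.get())
    for worker in workers:
        worker.join()
    return merged


def validate_candidate(candidate: float) -> float:
    """Hypothetical placeholder for the MuJoCo validation of a single grasp candidate."""
    return 1.0 if candidate < 0.5 else 0.0
```

On platforms whose default start method is spawn, `validate_in_parallel(candidates, validate_candidate)` should be called from inside an `if __name__ == "__main__":` block, and the validation function must live at module level so it can be pickled and sent to the workers.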
3. Validation Method
3.1. The Validation
- Falling: The piece falls. In this case, the iteration is terminated as soon as the piece moves a certain distance away from the gripper, and the candidate is assigned a score of 0.
- Bad grasp: The piece does not fall, but it moves enough within the grasp to receive a grasp quality score below the threshold. In this case the grasp is culled from the candidates.
- Good grasp: The candidate not only keeps the piece from falling but also achieves a score above the threshold.
- Discovery grasp: During the development of the system, we observed that some grasps started with a lot of movement but ended completely still and stable. For example, if a rectangular piece is picked from the side with a certain inclination, it rotates until it is parallel to the gripper. The initial grasp was incorrect, but the resulting pose is actually a good grasp. To find these cases, we also compute the score of the object using only the poses recorded from the halfway point of the validation run onwards, and apply a stricter threshold to this score. If the grasp passes this threshold, we record the pose of the object with respect to the gripper at the last step and add it to the list of grasps that passed the culling phase.
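The four outcomes above can be summarised as a small decision procedure. The sketch below is illustrative only: the threshold values and `fall_distance` are hypothetical, the fall check is applied after the fact (the real system stops the iteration as soon as the piece falls), and it assumes the discovery check is only reached when the full-run score misses the threshold. The full-run and second-half scores are taken as inputs; Section 3.2 describes how such scores are computed.

```python
from enum import Enum
from typing import Sequence, Tuple


class Outcome(Enum):
    FALLING = "falling"
    BAD = "bad"
    GOOD = "good"
    DISCOVERY = "discovery"


def classify_candidate(
    distances_to_gripper: Sequence[float],
    full_score: float,
    second_half_score: float,
    fall_distance: float = 0.05,       # hypothetical: metres before the piece counts as fallen
    threshold: float = 0.8,            # hypothetical culling threshold
    discovery_threshold: float = 0.9,  # hypothetical, stricter than `threshold`
) -> Tuple[Outcome, float]:
    """Classify one validated candidate and return (outcome, score)."""
    if any(d > fall_distance for d in distances_to_gripper):
        return Outcome.FALLING, 0.0  # falling candidates are scored 0
    if full_score >= threshold:
        return Outcome.GOOD, full_score
    if second_half_score >= discovery_threshold:
        # The object's final pose in the gripper frame would be stored as a new grasp.
        return Outcome.DISCOVERY, second_half_score
    return Outcome.BAD, full_score  # culled
```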
3.2. The Metric
- Translation weight: The weight applied to the translation score, bounded between 0 and 1 inclusive.
- Translation score: The score calculated for the translation.
- Rotation weight: The weight applied to the rotation score, bounded between 0 and 1 inclusive.
- Rotation score: The score calculated for the rotation.
- Mean translation: The mean translation between all consecutive recorded poses.
- Consecutive position vectors: Two consecutive vectors containing the Euclidean coordinates of the grasp pose.
- n: The number of poses stored.
- Translation upper bound: The upper bound of the translation. Values greater than this result in a score of zero. This parameter was tuned iteratively by trial and analysis of the results.
- Mean rotation: The mean of the orientation distances between all consecutive recorded quaternions.
- Consecutive quaternions: Two consecutive quaternions storing the rotational information of their respective grasp poses.
- n: The number of poses stored.
- Rotation upper bound: The upper bound of the rotation. Values greater than this result in a score of zero. This parameter was tuned iteratively by trial and analysis of the results.
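The quantities listed above could be combined into a single score as in the sketch below, which rests on assumptions not confirmed by the text: the final score is taken as the weighted sum of the translation and rotation scores; each partial score is assumed to fall linearly from 1 (no motion) to 0 at its upper bound; the means are taken over the n - 1 consecutive pairs; and the orientation distance between unit quaternions is assumed to be 1 - ⟨q₁, q₂⟩², one common choice that lies in [0, 1]. The default weights and bounds are placeholders, not the values used by the authors.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit-norm


def mean_translation(positions: Sequence[Vec3]) -> float:
    """Mean Euclidean distance between consecutive recorded positions."""
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    return sum(steps) / len(steps) if steps else 0.0


def quat_distance(q1: Quat, q2: Quat) -> float:
    """Assumed orientation distance in [0, 1]: 0 for the same rotation (q or -q), 1 for 180 degrees apart."""
    dot = sum(a * b for a, b in zip(q1, q2))
    return 1.0 - dot * dot


def mean_rotation(quats: Sequence[Quat]) -> float:
    """Mean orientation distance between consecutive recorded quaternions."""
    steps = [quat_distance(a, b) for a, b in zip(quats, quats[1:])]
    return sum(steps) / len(steps) if steps else 0.0


def bounded_score(mean_value: float, upper_bound: float) -> float:
    """Assumed linear mapping: 1 for no motion, 0 at or beyond the upper bound."""
    return max(0.0, 1.0 - mean_value / upper_bound)


def grasp_score(
    positions: Sequence[Vec3],
    quats: Sequence[Quat],
    w_t: float = 0.5,     # placeholder translation weight
    w_r: float = 0.5,     # placeholder rotation weight
    t_max: float = 0.001, # placeholder translation upper bound (metres)
    r_max: float = 0.01,  # placeholder rotation upper bound
) -> float:
    """Weighted combination of the translation and rotation scores."""
    s_t = bounded_score(mean_translation(positions), t_max)
    s_r = bounded_score(mean_rotation(quats), r_max)
    return w_t * s_t + w_r * s_r
```

Under these assumptions, a piece that stays perfectly still scores 1, and a piece that slides far but does not rotate keeps only the rotation share of the score.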
4. Experimentation
4.1. MultiGripperGrasp
4.2. The Objects
4.3. Modeling the Experiments
- Validation on the real system: The most important part is to evaluate the quality of the grasps based on the capability of the robot to pick the objects from those grasping points. To validate the grasps we used the Robotiq 2F-85 two-finger gripper, as no Franka Panda Hand was available. Our system did not require any adjustment for this validation, as we estimate the grasps with respect to the gripper’s closed fingertips. For MGG, however, we took into account the difference between the dimensions of the Robotiq 2F-85 and the Franka Panda Hand and modified its grasping points accordingly. This transformation is required because the validation application estimates the grasping point with respect to the tip of the fingers, while MGG returns the pose with respect to the gripper base. The grasping points were not modified further. Then, we programmed a script on the UR10 that, given an object, finds it in the scene and picks it from the defined grasping point. As validating every proposed grasping point for each object is not viable, we randomly sampled 10 grasps per object and validated each of them. Notably, for the MGG dataset we only sampled from the grasps that were held for 3 seconds or more, that is, their theoretically best grasps. This adds up to 200 grasps in total, 100 from each approach. As both systems estimated the grasping poses for isolated objects, whenever the robot could not find a collision-free trajectory to the object because it would collide with the table, we helped the robot by raising the object, thus isolating it from the environment. The experiment itself is straightforward: the robot moves to the defined grasping pose, grasps the object and lifts it, holding it in place for 3 seconds. We then evaluate whether the piece has been picked correctly, marking as successful only the grasps that keep the piece held during the entire 3-second period.
- Comparison with manually defined grasps: We also compare the grasps obtained from both methods with a ground truth of manually defined grasps, to quantify the distance between the ground truth and each system’s output. We argue that grasps defined manually by operators tend to be near-optimal for each object, as they are validated on the spot. Moreover, the manually defined grasps cover the entire graspable area of the object, so comparing the obtained grasps with the ground truth also reveals how sparse the grasps proposed by each system are.
- Grasp centering: We measure whether the grasps are actually centered on the piece (see Section 5.3), as in some applications it is critical that the grasp is centered on the object so as not to hit or move objects other than the one being picked.
- Time per grasp: We measure how much time each system needs to obtain a valid grasp.
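The adjustment of MGG’s grasping points described above amounts to moving a pose given for the gripper base to a frame at the closed fingertips, which for a parallel-jaw gripper is a fixed offset along the approach axis. The sketch below assumes that the approach axis is the gripper’s local z axis and takes the base-to-fingertip length as a parameter; the example offset is hypothetical, since the actual Franka Panda Hand and Robotiq 2F-85 dimensions used in the experiments are not given in the text.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]
Vector3 = List[float]


def base_to_fingertip(
    rotation: Matrix3, position: Vector3, tip_offset: float
) -> Tuple[Matrix3, Vector3]:
    """Shift a gripper-base pose to the closed-fingertip frame.

    The orientation is unchanged; the origin moves `tip_offset` metres along
    the gripper's own z axis (assumed approach axis), which is the third
    column of the rotation matrix.
    """
    approach = [rotation[i][2] for i in range(3)]
    tip_position = [position[i] + tip_offset * approach[i] for i in range(3)]
    return rotation, tip_position
```

For instance, with a gripper pointing straight down (rotated 180 degrees about x) and a hypothetical 0.25 m base-to-tip length, a base at height 0.75 m places the fingertips at 0.5 m.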
4.4. Hardware
5. Results
5.1. Real Robot Validation
5.2. Similarity with Real World Grasps
- S: The similarity between two grasps.
- Translation weight: The weight given to the displacement (translation) similarity. In this case it is 0.7.
- Rotation weight: The weight assigned to the rotational similarity. In this case it is 0.3.
- Translation distance: The translation distance between two poses, calculated as the Euclidean distance.
- Maximum distance: The maximum Euclidean distance between two poses. In this case it is 8 cm (the maximum size an object can have).
- Rotational distance: The rotational distance between two poses. It compares two quaternions (footnote 9) and takes a value between 0 and 1, where 0 corresponds to the same quaternion and 1 to the opposite one.
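The text gives the weights (0.7 and 0.3) and the 8 cm maximum distance but not the exact combination. Assuming the similarity is a weighted sum of the two distances, each turned into a similarity in [0, 1], it could be computed as S = 0.7 (1 − min(d_t, d_max) / d_max) + 0.3 (1 − d_r). The sketch below implements that assumed form; the quaternion distance 1 − ⟨q₁, q₂⟩² is one common choice matching the stated 0 to 1 range, and the function names are hypothetical.

```python
import math
from typing import Sequence

W_TRANSLATION = 0.7  # weight of the translation similarity (from the text)
W_ROTATION = 0.3     # weight of the rotational similarity (from the text)
D_MAX = 0.08         # maximum Euclidean distance in metres, 8 cm (from the text)


def quat_distance(q1: Sequence[float], q2: Sequence[float]) -> float:
    """Assumed rotational distance in [0, 1] for unit quaternions: 1 - <q1, q2>^2."""
    dot = sum(a * b for a, b in zip(q1, q2))
    return 1.0 - dot * dot


def similarity(
    p1: Sequence[float], q1: Sequence[float], p2: Sequence[float], q2: Sequence[float]
) -> float:
    """Assumed similarity between two grasps: 1 for identical poses, 0 for maximally different ones."""
    d_t = min(math.dist(p1, p2), D_MAX) / D_MAX  # normalised translation distance
    d_r = quat_distance(q1, q2)
    return W_TRANSLATION * (1.0 - d_t) + W_ROTATION * (1.0 - d_r)
```

Capping the translation distance at 8 cm keeps the translation term non-negative even for poses farther apart than the largest object.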
5.3. Grasp Centering
5.4. Time per Grasp
6. Conclusions and Further Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kleeberger, K.; Bormann, R.; Kraus, W.; Huber, M.F. A survey on learning-based robotic grasping. Current Robotics Reports 2020, 1, 239–249.
- Xie, Z.; Liang, X.; Roberto, C. Learning-based robotic grasping: A review. Frontiers in Robotics and AI 2023, 10, 1038658.
- Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A physics engine for model-based control. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.
- Mittal, M.; Yu, C.; Yu, Q.; Liu, J.; Rudin, N.; Hoeller, D.; Yuan, J.L.; Singh, R.; Guo, Y.; Mazhar, H.; Mandlekar, A.; Babich, B.; State, G.; Hutter, M.; Garg, A. Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments. IEEE Robotics and Automation Letters 2023, 8, 3740–3747.
- Robotics, U. Unity Robotics Hub. https://github.com/Unity-Technologies/Unity-Robotics-Hub, 2022.
- Coumans, E.; Bai, Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016–2021.
- Zhang, L.; Bai, K.; Li, Q.; Chen, Z.; Zhang, J. A Collision-Aware Cable Grasping Method in Cluttered Environment. 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 2126–2132.
- Ni, P.; Zhang, W.; Zhu, X.; Cao, Q. Pointnet++ grasping: Learning an end-to-end spatial grasp generation algorithm from sparse point clouds. 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3619–3625.
- Tian, H.; Song, K.; Li, S.; Ma, S.; Xu, J.; Yan, Y. Data-driven robotic visual grasping detection for unknown objects: A problem-oriented review. Expert Systems with Applications 2023, 211, 118624.
- Zhai, D.H.; Yu, S.; Xia, Y. FANet: fast and accurate robotic grasp detection based on keypoints. IEEE Transactions on Automation Science and Engineering 2023.
- Lenz, I.; Lee, H.; Saxena, A. Deep learning for detecting robotic grasps. The International Journal of Robotics Research 2015, 34, 705–724.
- Depierre, A.; Dellandréa, E.; Chen, L. Jacquard: A large scale dataset for robotic grasp detection. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 3511–3516.
- Zheng, L.; Ma, W.; Cai, Y.; Lu, T.; Wang, S. GPDAN: Grasp pose domain adaptation network for sim-to-real 6-DoF object grasping. IEEE Robotics and Automation Letters 2023, 8, 4585–4592.
- Fang, H.S.; Wang, C.; Gou, M.; Lu, C. Graspnet-1billion: A large-scale benchmark for general object grasping. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11444–11453.
- Eppner, C.; Mousavian, A.; Fox, D. ACRONYM: A Large-Scale Grasp Dataset Based on Simulation. Under Review at ICRA 2021, 2020.
- Deng, X.; Xiang, Y.; Mousavian, A.; Eppner, C.; Bretl, T.; Fox, D. Self-supervised 6d object pose estimation for robot manipulation. 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3665–3671.
- Eppner, C.; Mousavian, A.; Fox, D. A billion ways to grasp: An evaluation of grasp sampling schemes on a dense, physics-based grasp data set. The International Symposium of Robotics Research. Springer, 2019, pp. 890–905.
- Kleeberger, K.; Völk, M.; Moosmann, M.; Thiessenhusen, E.; Roth, F.; Bormann, R.; Huber, M.F. Transferring experience from simulation to the real world for precise pick-and-place tasks in highly cluttered scenes. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 9681–9688.
- Rohmer, E.; Singh, S.P.; Freese, M. V-REP: A versatile and scalable robot simulation framework. 2013 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2013, pp. 1321–1326.
- Bauza, M.; Bronars, A.; Hou, Y.; Taylor, I.; Chavan-Dafle, N.; Rodriguez, A. SimPLE, a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects. Science Robotics 2024, 9, eadi8808.
- Eppner, C.; Mousavian, A.; Fox, D. ACRONYM: A Large-Scale Grasp Dataset Based on Simulation. 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 6222–6227.
- Casas, L.F.; Khargonkar, N.; Prabhakaran, B.; Xiang, Y. MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
- Savva, M.; Chang, A.X.; Hanrahan, P. Semantically-enriched 3D models for common-sense knowledge. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 24–31.
- Macklin, M.; Müller, M.; Chentanez, N.; Kim, T.Y. Unified particle physics for real-time applications. ACM Transactions on Graphics (TOG) 2014, 33, 1–12.
- Downs, L.; Francis, A.; Koenig, N.; Kinman, B.; Hickman, R.; Reymann, K.; McHugh, T.B.; Vanhoucke, V. Google scanned objects: A high-quality dataset of 3d scanned household items. 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2553–2560.
- Calli, B.; Singh, A.; Bruce, J.; Walsman, A.; Konolige, K.; Srinivasa, S.; Abbeel, P.; Dollar, A.M. Yale-CMU-Berkeley dataset for robotic manipulation research. The International Journal of Robotics Research 2017, 36, 261–268.
- Miller, A.T.; Allen, P.K. Graspit! a versatile simulator for robotic grasping. IEEE Robotics & Automation Magazine 2004, 11, 110–122.
- Corporation, N. NVIDIA Isaac Sim, 2024.
- Ten Pas, A.; Gualtieri, M.; Saenko, K.; Platt, R. Grasp pose detection in point clouds. The International Journal of Robotics Research 2017, 36, 1455–1473.
- Morrison, D.; Corke, P.; Leitner, J. Egad! an evolved grasping analysis dataset for diversity and reproducibility in robotic manipulation. IEEE Robotics and Automation Letters 2020, 5, 4368–4375.
- Mahler, J.; Liang, J.; Niyaz, S.; Laskey, M.; Doan, R.; Liu, X.; Ojea, J.A.; Goldberg, K. Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. arXiv preprint, 2017; arXiv:1703.09312.
- Nielsen, J. Usability engineering; Morgan Kaufmann, 1994.
- Collins, J.; Brown, R.; Leitner, J.; Howard, D. Traversing the Reality Gap via Simulator Tuning. Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2021). Australian Robotics and Automation Association (ARAA), 2021, pp. 1–10.
| # | Note |
|---|---|
| 1 | Version 3.1.5 of MuJoCo. |
| 2 | V-HACD accessible on GitHub here. |
| 3 | CoACD accessible on GitHub here. |
| 4 | MuJoCo menagerie accessible on GitHub here. |
| 5 | For opening grasp candidates, the order is inverted: first the fingers are closed, then the piece is set into position and finally the command to open the fingers is sent. |
| 6 | The distance has been implemented following this approach. |
| 7 | NVIDIA Omniverse documentation page accessible here. |
| 8 | Quoting the NVIDIA documentation: "More RAM and VRAM is recommended for advanced usage of IsaacSim." |
| 9 | The comparison between two quaternions is described here. |
| 10 | GraspIt! performs the estimation for all objects at the same time, while their IsaacSim algorithm validates the objects sequentially using multiple stations. |

| Method | Grasps (#) | Time (s) | Time per grasp (s/grasp) |
|---|---|---|---|
| MGG | 8712 | 22970 | 2.6372 |
| Ours | 10180 | 1514 | 0.1487 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).