Submitted:
22 May 2026
Posted:
25 May 2026
Abstract
Keywords:
1. Introduction
- 1) Strip-Swift Pyramid Pooling Module (SSPPM): We propose an SSPPM for the effective multiscale representation of elongated branch structures. It replaces conventional parallel pyramid designs with a progressive pooling strategy while introducing a dedicated strip pooling mechanism for anisotropic context modeling. This design captures long-range directional continuity and improves computational efficiency by reducing GPU load imbalance, thus yielding consistent accuracy improvements across datasets.
- 2) Deployment-Friendly Backbone with Edge Refinement: We develop a high-efficiency backbone that integrates reparameterized golden cudgel blocks (GCBlocks) with a boundary-optimization module (BOM). The architecture leverages multibranch training for enhanced representation while collapsing into a streamlined single-path structure during inference, thus achieving improved boundary precision for slender targets without additional deployment overhead.
- 3) Topology-Guided Grasp Localization Framework: We propose a topology-guided grasp localization pipeline that bridges semantic perception and interaction-oriented geometric reasoning. By performing topology-aware skeleton pruning to remove structurally unstable regions and applying directional kernel-based extraction to identify geometrically consistent branch segments, the framework generates feasible grasp candidates aligned with stable interaction orientations. This design improves the structural reliability and temporal stability of grasp localization for onboard UAV branch interaction.
2. Existing Studies
- 1) Single-branch Architectures: Models such as STDC [31] and Fast-SCNN [32] prioritize inference efficiency through lightweight backbones and aggressive downsampling strategies. Although these architectures achieve high frame rates, they typically sacrifice spatial continuity in deeper layers, thus resulting in fragmented predictions for slender branch structures.
- 2) Multi-branch Architectures: Methods such as BiSeNet [33] and DDRNet [34] employ bilateral pathways to simultaneously preserve spatial details and semantic context. While these architectures improve boundary quality, their frequent feature-fusion operations and complex memory-access patterns may introduce additional computational overhead during deployment.
- 1) Inadequate multiscale representation for anisotropic branch structures – Existing lightweight segmentation models rely on isotropic square receptive fields, thus resulting in fragmented predictions for slender branches.
- 2) Boundary precision vs. deployment efficiency in dual-path architectures – Current dual-path models improve context modeling but require frequent fusion operations and incur memory overheads, which makes them an inefficient route to boundary refinement for slender targets.
- 3) Inadequate topology-guided reasoning for grasp localization under dynamic flight – Most pipelines end at semantic recognition, so structural stability (e.g., at bifurcations and curved segments) and temporal consistency are disregarded, thereby resulting in jittery and infeasible grasp candidates during UAV operations.
3. System Architecture and Workflow
- 1) Perception and Feature Refinement: To accommodate the slender and multiscale nature of tree branches, the system utilizes a lightweight backbone enhanced by the SSPPM. This stage yields a high-fidelity semantic mask that captures long-range directional context, while a BOM ensures sharp edge definition, thus providing a clean input for subsequent structural analysis.
- 2) Topology-Aware Structural Analysis: Because branch junctions and overlapping segments result in unstable perching, this module evaluates the skeletonized mask to identify potential collision or slip risks. By analyzing the local connectivity and transition density, the system removes complex topological singularities and forms “safety buffers,” thereby retaining only skeleton segments that offer a stable contact geometry.
- 3) Physical Scale-Based Candidate Extraction: This stage bridges topological features with executable grasping. A set of directional kernels designed based on the gripper’s physical dimensions is applied to the safe skeleton to extract continuous branch segments. By projecting pixel-level candidates into three-dimensional space using depth information, the system filters segments based on real-world metric constraints (e.g., length and width), thus yielding a set of physically feasible grasp boxes.
- 4) Spatiotemporal Decision Optimization: To mitigate perception fluctuations caused by UAV ego-motion and environmental noise, we introduce a temporal stabilization mechanism. It employs a SORT-inspired tracking module with a hybrid IoU–Euclidean metric to maintain consistent identity for grasp candidates. A distance-dependent scoring strategy is further implemented to adaptively transition between long-term stability and precise spatial alignment as the UAV approaches the target. Finally, an inertia-based switching constraint is applied to prevent jitter, thus ensuring a smooth and reliable grasp selection during dynamic flight.
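The hybrid IoU–Euclidean association mentioned in stage 4 can be illustrated with a minimal sketch. It is not the authors' implementation: the cost form, the weight `alpha`, the normalization distance `d_max`, the gate `max_cost`, and the use of axis-aligned boxes `(x1, y1, x2, y2)` are all assumptions made for illustration, and a simple greedy assignment stands in for the Hungarian matching used in the original SORT tracker.

```python
import math


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centre(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def hybrid_cost(track_box, det_box, alpha=0.5, d_max=100.0):
    """Assumed cost: weighted sum of (1 - IoU) and a clipped, normalized centre distance."""
    d = math.dist(centre(track_box), centre(det_box))
    return alpha * (1.0 - iou(track_box, det_box)) + (1.0 - alpha) * min(d / d_max, 1.0)


def greedy_match(tracks, detections, max_cost=0.7):
    """Greedy lowest-cost-first assignment; returns {track_index: detection_index}."""
    pairs = sorted(
        (hybrid_cost(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    matches, used_dets = {}, set()
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # all remaining pairs are even worse
        if ti in matches or di in used_dets:
            continue
        matches[ti] = di
        used_dets.add(di)
    return matches


# Two existing tracks and two new detections listed in swapped order.
tracks = [(0, 0, 10, 10), (100, 100, 120, 120)]
detections = [(102, 101, 122, 121), (1, 0, 11, 10)]
matches = greedy_match(tracks, detections)
```

Mixing a distance term into the cost keeps an identity alive when a thin grasp box shifts enough between frames that its IoU drops to zero, which is a plausible reason to prefer a hybrid metric for slender targets.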
4. Improved DDRNet
4.1. GCBlock
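The Introduction states that GCBlocks are trained with multiple branches and collapse into a single path for inference. The sketch below illustrates only the generic principle of structural reparameterization and makes no claim about the actual GCBlock layout: it assumes a single-channel 3×3 branch in parallel with a 1×1 branch, with no bias or normalization, and shows that folding the 1×1 weight into the centre of the 3×3 kernel gives the same output. All function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # square kernel with odd side length
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    return out


def merge_branches(k3, k1):
    """Fold a scalar 1x1 branch weight into the centre tap of a 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1
    return merged


image = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [2.0, 1.0, 4.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, 0.2], [-0.1, 0.0, 0.1]]
k1 = 0.25

# Training-time form: two parallel branches whose outputs are summed.
branch3 = conv2d_same(image, k3)
branch1 = conv2d_same(image, [[k1]])
two_branch = [[a + b for a, b in zip(r3, r1)] for r3, r1 in zip(branch3, branch1)]

# Inference-time form: one convolution with the merged kernel.
single_path = conv2d_same(image, merge_branches(k3, k1))

max_diff = max(
    abs(a - b) for ra, rb in zip(two_branch, single_path) for a, b in zip(ra, rb)
)
```

Because convolution is linear, the merged kernel reproduces the two-branch output up to floating-point rounding, so the extra branch costs nothing at deployment time.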
4.2. SSPPM
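The Introduction attributes the SSPPM's long-range directional context to strip pooling. As a minimal sketch of the pooling step alone (without the convolutions, gating, or progressive pyramid that a full module would include), the code below averages a small feature map along full rows and full columns and compares the result with a 3×3 square average at a gap in a horizontal branch. The toy map and all function names are assumptions for illustration.

```python
def strip_pool(feature):
    """Return per-row means (horizontal strips) and per-column means (vertical strips)."""
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[y][x] for y in range(h)) / h for x in range(w)]
    return row_means, col_means


def strip_context(feature):
    """Broadcast both strip descriptors back to H x W and sum them."""
    row_means, col_means = strip_pool(feature)
    return [[rm + cm for cm in col_means] for rm in row_means]


def square_mean(feature, y, x, radius=1):
    """Zero-padded square average around (y, x), as an isotropic baseline."""
    h, w = len(feature), len(feature[0])
    total = 0.0
    for yy in range(y - radius, y + radius + 1):
        for xx in range(x - radius, x + radius + 1):
            if 0 <= yy < h and 0 <= xx < w:
                total += feature[yy][xx]
    return total / (2 * radius + 1) ** 2


# A horizontal branch response with a one-pixel gap in the middle.
feature = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

context = strip_context(feature)
gap_strip = context[1][2]                 # strip-pooled context at the gap
gap_square = square_mean(feature, 1, 2)   # 3x3 average at the gap
```

At the gap, the row strip still sees four of five active pixels, while the square window sees only two of nine, which is the intuition behind using anisotropic pooling for elongated structures.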
4.3. BOM
5. Topology-Guided Grasp Detection and Stabilization
5.1. Topology-Aware Structural Analysis
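The workflow overview describes removing topological singularities from the skeleton and leaving "safety buffers" around them. One common way to do this, sketched here under stated assumptions, is to flag skeleton pixels whose crossing number (the count of 0→1 transitions around the 8-neighbourhood) is at least three, then clear a square window around each flagged pixel. The threshold, the buffer radius, and the function names are hypothetical, and the authors' criterion based on connectivity and transition density may differ.

```python
# Neighbour offsets in circular order: N, NE, E, SE, S, SW, W, NW.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def pixel(skel, y, x):
    """Skeleton value at (y, x), treating out-of-bounds positions as background."""
    if 0 <= y < len(skel) and 0 <= x < len(skel[0]):
        return skel[y][x]
    return 0


def crossing_number(skel, y, x):
    """Number of 0 -> 1 transitions when walking once around the 8-neighbourhood."""
    ring = [pixel(skel, y + dy, x + dx) for dy, dx in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def find_junctions(skel):
    """Skeleton pixels where three or more branches meet."""
    return [
        (y, x)
        for y, row in enumerate(skel)
        for x, value in enumerate(row)
        if value == 1 and crossing_number(skel, y, x) >= 3
    ]


def prune_with_buffer(skel, radius=1):
    """Clear a (2 * radius + 1)-wide square around every junction."""
    pruned = [row[:] for row in skel]
    for jy, jx in find_junctions(skel):
        for y in range(jy - radius, jy + radius + 1):
            for x in range(jx - radius, jx + radius + 1):
                if 0 <= y < len(skel) and 0 <= x < len(skel[0]):
                    pruned[y][x] = 0
    return pruned


# A T-shaped skeleton: a vertical twig meeting a horizontal branch.
skeleton = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

junctions = find_junctions(skeleton)
safe_skeleton = prune_with_buffer(skeleton, radius=1)
```

After pruning, the skeleton splits into disconnected segments that no longer touch the junction, so later stages only see candidates away from the bifurcation.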
5.2. Physical Scale-Based Candidate Extraction
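The workflow overview states that pixel-level candidates are projected into three-dimensional space with depth information and filtered by metric length and width. A minimal sketch of that check under a standard pinhole camera model follows. The intrinsics, the thresholds, the small-angle width approximation, and the function names are assumptions chosen for illustration, not values from the paper.

```python
import math


def back_project(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth depth_m (metres) to camera coordinates."""
    return ((u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m)


def metric_size(p0, p1, width_px, intrinsics):
    """Metric length and approximate width of a segment.

    p0, p1: (u, v, depth_m) endpoints; intrinsics: (fx, fy, cx, cy).
    """
    a = back_project(*p0, *intrinsics)
    b = back_project(*p1, *intrinsics)
    length_m = math.dist(a, b)
    mean_depth = 0.5 * (p0[2] + p1[2])
    # Approximation: pixel width scaled by depth over the horizontal focal length.
    width_m = width_px * mean_depth / intrinsics[0]
    return length_m, width_m


def is_feasible(length_m, width_m, min_length_m=0.15, width_range_m=(0.01, 0.04)):
    """Hypothetical gripper constraints: long enough to grasp, thin enough to enclose."""
    return length_m >= min_length_m and width_range_m[0] <= width_m <= width_range_m[1]


intrinsics = (500.0, 500.0, 320.0, 240.0)  # assumed fx, fy, cx, cy in pixels

# A 100-pixel-long, 10-pixel-wide horizontal segment observed at 1 m.
length_m, width_m = metric_size((270, 240, 1.0), (370, 240, 1.0), 10, intrinsics)
feasible = is_feasible(length_m, width_m)

# The same pixel footprint observed at 3 m corresponds to a much thicker branch.
far_length_m, far_width_m = metric_size((270, 240, 3.0), (370, 240, 3.0), 10, intrinsics)
far_feasible = is_feasible(far_length_m, far_width_m)
```

Filtering in metric units rather than pixels means the same gripper limits apply at any stand-off distance, which pixel-space thresholds cannot guarantee.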
5.3. Spatio-Temporal Decision Optimization
- 1) Stable mode (large d): The system prioritizes and to lock onto large, consistently observed branches.
- 2) Centering mode (small d): The weight is increased to emphasize spatial centering, thus minimizing the landing offset for a precise grasp.
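The two modes above, together with the inertia-based switching constraint mentioned in the workflow overview, can be sketched as a distance-weighted score followed by a hysteresis rule. The original weight symbols are not available here, so the feature names (`size`, `age`, `centering`, each assumed normalized to [0, 1]), the linear blend between `d_near` and `d_far`, the weight range, and the switching `margin` are all hypothetical.

```python
def blend(d, d_near=1.0, d_far=4.0):
    """0 at or below d_near (centering mode), 1 at or beyond d_far (stable mode)."""
    t = (d - d_near) / (d_far - d_near)
    return max(0.0, min(1.0, t))


def score(candidate, d):
    """Distance-dependent score: stability dominates when far, centering when near."""
    s = blend(d)
    w_center = 0.2 + 0.6 * (1.0 - s)  # rises from 0.2 (far) to 0.8 (near)
    w_stable = 1.0 - w_center
    stability = 0.5 * (candidate["size"] + candidate["age"])
    return w_stable * stability + w_center * candidate["centering"]


def select_with_inertia(current_id, candidates, d, margin=0.1):
    """Keep the current target unless a challenger beats it by at least margin."""
    scores = {cid: score(c, d) for cid, c in candidates.items()}
    best_id = max(scores, key=scores.get)
    if current_id in scores and scores[best_id] - scores[current_id] < margin:
        return current_id
    return best_id


candidates = {
    "A": {"size": 0.9, "age": 0.9, "centering": 0.30},   # large, long-tracked, off-centre
    "B": {"size": 0.4, "age": 0.5, "centering": 0.95},   # smaller, newer, well centred
}

far_choice = select_with_inertia(None, candidates, d=5.0)
mid_choice = select_with_inertia(far_choice, candidates, d=2.7)
near_choice = select_with_inertia(mid_choice, candidates, d=0.5)
```

In this toy run the selection stays on the large branch while the UAV is far away, holds it at mid-range even though the challenger scores slightly higher, and switches only when the centering advantage becomes decisive close to the target.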
6. Experiment
6.1. Experimental Setup and Evaluation Metrics
6.1.1. Dataset Description
6.1.2. Experimental Setup and Implementation Details
6.1.3. Evaluation Metrics
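The ablation table reports mIoU, mAcc, and aAcc. Assuming these follow the usual semantic-segmentation definitions (mean per-class intersection over union, mean per-class pixel accuracy, and overall pixel accuracy), they can be computed from a confusion matrix as sketched below. The matrix values are made up for the example and are unrelated to the reported results.

```python
def segmentation_metrics(conf):
    """Compute (mIoU, mAcc, aAcc) from a confusion matrix.

    conf[i][j] is the number of pixels of true class i predicted as class j.
    Classes with an empty denominator are skipped when averaging.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true class differs
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            accs.append(tp / (tp + fn))
    miou = sum(ious) / len(ious)
    macc = sum(accs) / len(accs)
    aacc = sum(conf[c][c] for c in range(n)) / total
    return miou, macc, aacc


# Toy two-class example (background, branch).
confusion = [
    [90, 10],
    [5, 45],
]
miou, macc, aacc = segmentation_metrics(confusion)
```

Because thin branches occupy few pixels, aAcc is dominated by the background class, which is why mIoU is usually the more informative figure for this kind of target.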
6.2. Ablation and Comparative Experiments
6.2.1. Ablation Experiment
6.2.2. Model Comparison Experiment
6.3. Edge-Side Visual Validation
7. Conclusion and Future Endeavors
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wu, D.; Yuan, X.; Guan, L. UAV intelligent forest inspection system based on computer vision. In Proceedings of the 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), Beijing, China, 29–31 January 2023; pp. 1150–1154.
- Li, Q.; Fu, Y.; Qu, S. Research On Forest Resource Supervision Technology Based on Digital Twin UAV. In Proceedings of the 2024 IEEE 7th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 8–10 March 2024; pp. 1327–1330.
- Bi, Z.; Chi, J.; Zhang, W.J.; et al. A Proposal to Decouple Aerial Manipulation by Multi-Functional End-Effector. In Proceedings of the 2025 IEEE International Conference on Robotics and Biomimetics (ROBIO), Location, Date 2025; pp. 1751–1756.
- Nekoo, S.R.; Sanchez-Laulhe, E.; Durán, R.G.; et al. Increasing repeatability of the perching on branch for flapping-wing flying robot. In Proceedings of the 2024 International Conference on Unmanned Aircraft Systems (ICUAS), Chania, Greece, 4–7 June 2024; pp. 618–623.
- Hu, J.; Chen, P.; Xie, F.; et al. Design and experiment of a sloth-inspired UAV perching climbing grasping mechanism. In Proceedings of the 2022 IEEE International Conference on Robotics and Biomimetics (ROBIO), Jinghong, China, 5–9 December 2022; pp. 1283–1288.
- von Frankenberg, F.; Nokleby, S. Detection of long narrow landing features for autonomous UAV perching. In Proceedings of the 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 4–7 November 2020; pp. 0565–0570.
- Liu, Y.; Shen, J.; Zhai, C.; et al. A retinal vessel segmentation network with dual-stage network and vessel pixel emendation. IEEE Trans. Instrum. Meas. 2024, 74, 1–17.
- Fernandes, M.; et al. Grapevine Winter Pruning Automation: On Potential Pruning Points Detection through 2D Plant Modeling using Grapevine Segmentation. In Proceedings of the 2021 IEEE 11th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Jiaxing, China, 29 July–2 August 2021; pp. 13–18.
- Tong, S.; Zhang, J.; Li, W.; et al. An image-based system for locating pruning points in apple trees using instance segmentation and RGB-D images. Biosyst. Eng. 2023, 236, 277–286.
- Li, W.; Zhang, J.; Li, J.; et al. Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map. Vis. Comput. 2025, 41, 1273–1291.
- Sun, L.; Yang, Y.; Yang, Z.; et al. DUCTNet: An effective road crack segmentation method in UAV remote sensing images under complex scenes. IEEE Trans. Intell. Transp. Syst. 2024, 25, 12682–12695.
- Cao, H.; Shen, J.; Zhang, Y.; et al. Proximal cooperative aerial manipulation with vertically stacked drones. Nature 2025, 646, 576–583.
- Kaya, Y.F.; Orr, L.; Kocer, B.B.; et al. Aerial additive manufacturing: Toward on-site building construction with aerial robots. Sci. Robot. 2025, 10, eado6251.
- Muthusamy, P.K.; Mohiuddin, M.B.; Peringal, A.; et al. Aerial manipulation of long objects using adaptive neuro-fuzzy controller under battery variability. Sci. Rep. 2025, 15, 10941.
- Kim, D.; Chang, D.E. An Onboard Integrated Perception and Control Framework for Autonomous Quadrotor UAV Perching on Markerless Hurdles. Drones 2026, 10, 270.
- Yin, X.; Wen, S.; Xie, J.; et al. Helical morphology-inspired bistable gripper for UAV upward perching and grasping in field environment. Bioinspir. Biomim. 2026, 21, 016015.
- Hamelin, P.; Dandurand, P.; Parkison, S.A.; et al. Shared Autonomy for Safe and Efficient Drone Landing and Takeoff on Power Lines. IEEE Trans. Field Robot. 2026, in press.
- Chen, C.; Yang, M.; Pu, H. Bionic bird claws enable UAV perching and landing. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2025; Volume 3032, p. 012045.
- Li, H.; Zhao, Z.; Wu, Z.; et al. Tendon-driven Grasper Design for Aerial Robot Perching on Tree Branches. In Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Munich, Germany, 18–24 October 2025; pp. 21182–21188.
- Kariyawasam, S.A.; Saikot, M.H.; Cheng, B.; et al. A Hybrid Perching Mechanism for Aerial Robots. IEEE Robot. Autom. Lett. 2026, in press.
- D’Antonio, D.S.; Wu, T.; Bhattacharya, S.; et al. From Hitch to Lift: Autonomous Cable Interlacing by Multi-UAV Teams for Aerial Grasping and Transportation. IEEE Trans. Robot. 2026, in press.
- Yadav, R.D.; Jones, B.; Gupta, S.; et al. An integrated approach to aerial grasping: Combining a bistable gripper with adaptive control. IEEE/ASME Trans. Mechatron. 2025, in press.
- Albaroudi, M.; Alahmad, R.; Alraie, H.; et al. Estimation of Branch Geometry and Hierarchy in Orchard Trees for Robotic Pruning. J. Robot. Mechatron. 2026, 38, 495–512.
- Sun, H.; Wu, G.; Xu, H.; et al. Real-time detection and characterization of trunks and upright branches of pear trees for automatic dormant pruning. Precis. Agric. 2026, 27, 23.
- Kefalas, A.; Kalampokas, T.; Vrochidou, E.; et al. A vision-based pruning algorithm for cherry tree structure elements segmentation and exact pruning points determination. Comput. Electron. Agric. 2025, 237, 110735.
- Dukić, J.; Pejić, P.; Vidović, I.; et al. Towards Robotic Pruning: Automated Annotation and Prediction of Branches for Pruning on Trees Reconstructed Using RGB-D Images. Sensors 2025, 25, 5648.
- Tong, S.; Wang, J.; Zhang, J.; et al. An apple tree pruning robot system based on branch segmentation and decision-making control. Artif. Intell. Agric. 2026, in press.
- Li, Y.; Han, J.; Li, H.; et al. Branch-YOLO: An efficient object detector for thin structure objects like pantograph. Digit. Signal Process. 2025, 162, 105121.
- Fathi, N. EDFNet: Early Fusion of Edge and Depth for Thin-Obstacle Segmentation in UAV Navigation. arXiv 2026, arXiv:2604.09694.
- Bilal, H.; Bendechache, M.; Direkoglu, C. Optimized KiU-Net: A Convolutional Autoencoder for Retinal Vessel Segmentation in Medical Images. IEEE Access 2025, 14, 2784–2799.
- Fan, M.; Lai, S.; Huang, J.; et al. Rethinking BiSeNet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9716–9725.
- Poudel, R.P.K.; Liwicki, S.; Cipolla, R. Fast-SCNN: Fast semantic segmentation network. arXiv 2019, arXiv:1902.04502.
- Yu, C.; Wang, J.; Peng, C.; et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341.
- Pan, H.; Hong, Y.; Sun, W.; et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 2022, 24, 3448–3460.
- Hou, Q.; Zhang, L.; Cheng, M.M.; et al. Strip pooling: Rethinking spatial pooling for scene parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4003–4012.
- Yang, G.; Wang, Y.; Shi, D.; et al. Golden cudgel network for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Location, Date 2025; pp. 25367–25376.
- Zhen, M.; Wang, J.; Zhou, L.; et al. Joint semantic segmentation and boundary detection using iterative pyramid contexts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13666–13675.
- Lin, T.Y.; Goyal, P.; Girshick, R.; et al. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Yang, T.; Zhou, S.; Huang, Z.; Xu, A.; Ye, J.; Yin, J. Tree Dataset of Urban Street: Branch. Available online: https://ytt917251944.github.io/dataset_jekyll/ (accessed on 12 May 2026).
- Chen, L.C.; Papandreou, G.; Schroff, F.; et al. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Howard, A.; Sandler, M.; Chu, G.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324.
- Yu, C.; Gao, C.; Wang, J.; et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.








| Model Training Workstation | | | |
|---|---|---|---|
| CPU | Intel Core i9-14900K | Optimizer | SGD |
| GPU | NVIDIA RTX 4080 | Batch Size | 4 |
| CUDA | 12.1 | PyTorch | 2.5.1 |
| Python | 3.10 | Epochs | 800 |
| Edge Inference Platform (UAV Onboard) | | | |
| Device | Jetson Orin Nano | CUDA | 11.4 |
| JetPack | 5.1.3 | PyTorch | 2.1 |
| Python | 3.8 | Precision | FP16 |

| Model | | | Efficiency | | | Drone-Branch | | | UrbanStreet | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| a | b | c | Params (T/D) / M | FPS (PC) | Latency (Jet) / ms | mIoU | mAcc | aAcc | mIoU | mAcc | aAcc |
| | | | 5.73 / 5.73 | 127.9 | 14.135 | 87.64 | 94.57 | 99.00 | 85.00 | 91.70 | 97.12 |
| ✓ | | | 20.69 / 9.21 | 149.6 | 13.734 | 87.58 | 94.74 | 98.97 | 84.91 | 91.61 | 96.83 |
| | ✓ | | 5.61 / 5.61 | 186.9 | 13.877 | 89.11 | 95.53 | 99.13 | 85.83 | 92.38 | 97.29 |
| | | ✓ | 5.73 / 5.73 | 126.4 | 14.15 | 87.91 | 95.44 | 98.99 | 89.20 | 95.74 | 97.95 |
| ✓ | ✓ | | 20.57 / 9.09 | 214.4 | 13.752 | 88.43 | 95.49 | 99.05 | 84.95 | 92.07 | 96.94 |
| | ✓ | ✓ | 5.61 / 5.61 | 188.3 | 13.953 | 89.96 | 95.73 | 99.12 | 89.89 | 95.96 | 98.11 |
| ✓ | ✓ | ✓ | 20.57 / 9.09 | 221.4 | 13.769 | 89.94 | 95.62 | 99.05 | 89.81 | 95.91 | 98.09 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).