Submitted: 03 June 2026
Posted: 03 June 2026
Abstract
Keywords:
1. Introduction
- We propose CC-MBS, a cooperative compensatory multimodal sample selection framework for UAV swarm perception under modality missingness and dynamic environments. It unifies modality quality modeling, cross-UAV collaboration, and sample selection, improving robustness and adaptability;
- We develop a unified modality confidence modeling method that characterizes modality reliability from multiple aspects, including modality missingness, signal degradation, and cross-modal asynchrony, providing an interpretable and computable basis for both cooperative compensation and sample valuation;
- We design a cooperative compensatory modality-balanced sample selection strategy that alleviates performance degradation and catastrophic forgetting under incomplete modalities. Experiments on small-scale UAV swarms (2–5 agents) show that the proposed method remains stable and effective across a range of modality missing rates.
2. Related Work
2.1. Cooperative Learning in UAV Swarms
2.2. Modality Missingness and Degradation in Multimodal Perception
2.3. Existing Methods for Robustness to Modality Missingness
3. Basic Model
3.1. Multimodal Incremental Learning in UAV Swarms
3.2. Modality Missingness and Degradation Modeling
3.2.1. Modality Missingness Rate
3.2.2. Modality Degradation Level
3.2.3. Modality Asynchrony Rate
3.2.4. Modality Confidence Vector
4. Main Method
4.1. Multimodal Collaborative Compensation Mechanism Under UAV Swarms
| Algorithm 1: Multimodal Collaborative Compensation Mechanism |
| Input: local modality confidence; neighbor set; thresholds; compensation strength; constant. |
| Output: compensated confidence vector. |
| Steps: |
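The steps of Algorithm 1 are not reproduced here, so the following Python sketch is only one plausible reading of its listed inputs and output, not the authors' procedure. It assumes two thresholds (`tau_low` triggers compensation for a weak local modality, `tau_high` marks a neighbor as reliable), a compensation strength `strength`, and a small constant `eps` for numerical stability. All names, default values, and the update rule itself are hypothetical.

```python
from typing import Dict, List

Confidence = Dict[str, float]  # modality name -> confidence in [0, 1]


def compensate_confidence(
    local: Confidence,
    neighbors: List[Confidence],
    tau_low: float = 0.5,    # assumed: compensate only below this local confidence
    tau_high: float = 0.7,   # assumed: a neighbor is reliable at or above this value
    strength: float = 0.5,   # assumed compensation strength in [0, 1]
    eps: float = 1e-8,       # assumed stabilizing constant
) -> Confidence:
    """Return a compensated confidence vector (illustrative rule only)."""
    compensated: Confidence = {}
    for modality, c_local in local.items():
        # Collect neighbor confidences that pass the reliability threshold.
        reliable = [n[modality] for n in neighbors
                    if n.get(modality, 0.0) >= tau_high]
        if c_local >= tau_low or not reliable:
            # Local modality is good enough, or nobody can help: keep as is.
            compensated[modality] = c_local
            continue
        # Confidence-weighted mean of the reliable neighbors.
        support = sum(c * c for c in reliable) / (sum(reliable) + eps)
        # Move the local confidence part of the way toward the neighbor support.
        compensated[modality] = min(1.0, c_local + strength * (support - c_local))
    return compensated
```

With a local vector `{"audio": 0.2, "visual": 0.9}` and one neighbor at `{"audio": 0.8, "visual": 0.9}`, only the audio entry is raised under these assumptions; the visual entry is left untouched because it already exceeds `tau_low`.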
4.2. Compensatory Collaborative Modality-Balanced Sample Selection (CC-MBS)
| Algorithm 2: Compensatory Collaboration Modality-Balanced Sample Selection |
| Input: batch data; replay buffer with capacity; modality encoders; class prototypes; compensated confidence vectors; constant. |
| Output: updated replay buffer. |
| Steps: |
5. Evaluation
5.1. Experimental Setup
5.1.1. Hardware, System Setup, and Datasets
5.1.2. Experimental Scenarios
5.2. Experiment A: Collaborative Compensation Under Dual-UAV Setting
5.2.1. Baseline Performance Under Single-Modality Missingness
- Pretraining stage (20 epochs): Used to compute the MBS score for each sample, simulating early-stage estimation of modality balance;
- Formal training stage (100 epochs): Conducted using the selected samples to allow the model to approach convergence.
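The two stages above amount to scoring every sample once and then training on a pruned subset. The sketch below illustrates only the selection step between the stages. It assumes that a higher score marks a more valuable sample and that a fixed fraction `prune_ratio` of the lowest-scoring samples is dropped; the function name, the parameter, and the ranking direction are assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def select_samples(scores: Sequence[float], prune_ratio: float = 0.2) -> List[int]:
    """Return the indices of samples kept for the formal training stage.

    Assumption: higher score = more valuable sample, so the lowest-scoring
    `prune_ratio` fraction is discarded.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_drop = int(round(prune_ratio * len(scores)))
    n_keep = len(scores) - n_drop
    # Rank indices from highest to lowest score and keep the top n_keep.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return kept indices in their original order for stable data loading.
    return sorted(ranked[:n_keep])
```

For example, `select_samples([0.1, 0.9, 0.5, 0.7, 0.3], prune_ratio=0.2)` drops the single lowest-scoring sample and keeps the other four.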
5.2.2. Evaluation of Collaborative Compensation Under Dual-UAV Setting
- Modality missing ratios are set to 10%, 30%, 50%, and 70% for both audio and visual modalities;
- One UAV is assigned modality missingness, with the missing samples distributed across the time window;
- The other UAV maintains complete modalities and serves as the collaborative reference;
- Both UAVs are assigned identical modality asynchrony factors to simulate realistic sensing and communication delays.
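The missingness settings above can be simulated with a per-sample mask. The following sketch assumes that missing samples are drawn uniformly at random over the time window and that the UAV with missingness and the reference UAV are distinguished only by their ratio (the reference uses a ratio of 0). The function name, the seeding scheme, and the uniform placement are illustrative assumptions.

```python
import random
from typing import List


def missing_mask(n_samples: int, ratio: float, seed: int = 0) -> List[bool]:
    """Return a mask where True marks a sample whose modality is missing.

    Assumption: missing positions are spread uniformly at random over the
    time window of `n_samples` steps.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)            # local RNG keeps runs reproducible
    n_missing = round(ratio * n_samples)
    missing = set(rng.sample(range(n_samples), n_missing))
    return [i in missing for i in range(n_samples)]


# One mask per missing ratio used in the experiment; the reference UAV keeps all samples.
masks = {r: missing_mask(100, r) for r in (0.1, 0.3, 0.5, 0.7)}
reference_mask = missing_mask(100, 0.0)
```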
5.3. Experiment B: Collaborative Compensation Under Multi-UAV Setting
5.3.1. Baseline Performance of CC-MBS Under Multi-UAV Setting
- Sample pruning ratio is fixed at 20%;
- Modality missing ratios are set to 30% and 50% on the dominant modality;
- Missingness is applied to a single UAV, while the other neighboring UAVs are assumed to have complete modalities.
- As the neighborhood expands, the modality information from different UAVs may differ in quality and distribution, introducing potential noise accumulation or redundant interference that weakens the compensation effectiveness;
- In the current collaborative mechanism, the contribution from each node is weighted primarily by modality confidence. When the neighborhood size grows large, weight distribution may become more diffuse, diluting the contribution of high-quality nodes and reducing the precision of compensation.
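The dilution effect in the second point can be illustrated numerically. The sketch below assumes that each node's weight is its modality confidence divided by the sum of all confidences; the source states only that contributions are weighted primarily by confidence, so this normalization and the example confidence values are assumptions.

```python
from typing import List


def confidence_weights(confidences: List[float]) -> List[float]:
    """Normalize confidences so that the weights sum to 1 (assumed rule)."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one confidence must be positive")
    return [c / total for c in confidences]


def best_node_share(n_nodes: int, best: float = 0.9, others: float = 0.6) -> float:
    """Weight given to one high-quality node among n_nodes contributors."""
    confidences = [best] + [others] * (n_nodes - 1)
    return confidence_weights(confidences)[0]
```

With these illustrative values, the high-quality node receives a weight of about 0.60 among two nodes but only about 0.27 among five, even though its own confidence is unchanged.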
5.3.2. Performance Comparison of Modality Missingness Compensation
- AVG aggregation: Average the model parameters from multiple nodes to obtain the global model for performance evaluation;
- PFM aggregation: Weight the aggregation based on each node's local performance and compute a weighted average for the global model;
- POW aggregation: Assign higher aggregation weights to nodes with stronger local performance (e.g., 60%/40% split for 2 UAVs) when calculating the global model.
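The three aggregation baselines differ only in how the per-node weights are chosen. The sketch below represents each model as a dictionary of flat parameter lists, which is a simplification for illustration. For POW it assumes that preset shares (such as 0.6 and 0.4) are assigned by performance rank; the source gives the 60%/40% example but not the assignment rule, so that part is hypothetical, as are all function names.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # hypothetical flat parameter container


def weighted_average(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Average parameters with weights normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]
    return {
        name: [sum(w * m[name][i] for w, m in zip(norm, models))
               for i in range(len(models[0][name]))]
        for name in models[0]
    }


def avg_aggregate(models: Sequence[Params]) -> Params:
    """AVG: every node contributes equally."""
    return weighted_average(models, [1.0] * len(models))


def pfm_aggregate(models: Sequence[Params], performance: Sequence[float]) -> Params:
    """PFM: weights proportional to each node's local performance."""
    return weighted_average(models, performance)


def pow_aggregate(models: Sequence[Params], performance: Sequence[float],
                  shares: Sequence[float]) -> Params:
    """POW (assumed rule): the best node gets shares[0], the next shares[1], ..."""
    order = sorted(range(len(models)), key=lambda i: performance[i], reverse=True)
    weights = [0.0] * len(models)
    for rank, node in enumerate(order):
        weights[node] = shares[rank]
    return weighted_average(models, weights)
```

For two UAVs, `pow_aggregate(models, performance, shares=[0.6, 0.4])` reproduces the 60%/40% split mentioned above, with the larger share going to whichever node reports the stronger local performance.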
5.4. Experiment C: Evaluation of Sample Selection
5.5. Comprehensive Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| UAV | Unmanned aerial vehicle |
| MBS | Modality Balance Score |
| CC-MBS | Compensatory Collaboration Modality-Balanced (Sample) Selection framework |
| AVG | Average Aggregation Method |
| PFM | Performance-based Aggregation Method |
| POW | Power-based Aggregation Method |
References
- Xu, Y.; Chen, B.; Hu, F.; et al. MBS: A Modality-Balanced Strategy for Multimodal Sample Selection[J]. Mach. Learn. Knowl. Extr. 2026, 8(1), 17.
- Zhang, H.; Hanzo, L. Federated learning assisted multi-UAV networks[J]. IEEE Trans. Veh. Technol. 2020, 69(11), 14104–14109.
- He, G.; Li, C.; Song, M.; et al. A hierarchical federated learning incentive mechanism in UAV-assisted edge computing environment[J]. Ad Hoc Netw. 2023, 149, 103249.
- Tong, Z.; Wang, J.; Hou, X.; et al. Blockchain-based trustworthy and efficient hierarchical federated learning for UAV-enabled IoT networks[J]. IEEE Internet Things J. 2024, 11(21), 34270–34282.
- Wang, Z.; Cheng, P.; Chen, M.; et al. Drones help drones: A collaborative framework for multi-drone object trajectory prediction and beyond[J]. Adv. Neural Inf. Process. Syst. 2024, 37, 64604–64628.
- Lin, Z.; Chen, W.; Jin, X.; et al. MCOP: Multi-UAV Collaborative Occupancy Prediction[C]. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025; pp. 27242–27251.
- Zhao, B.; Huo, M.; Li, Z.; et al. Graph-based multi-agent reinforcement learning for collaborative search and tracking of multiple UAVs[J]. Chin. J. Aeronaut. 2025, 38(3), 103214.
- Havaei, M.; Guizard, N.; Chapados, N.; et al. Hemis: Hetero-modal image segmentation[C]//International conference on medical image computing and computer-assisted intervention; Springer International Publishing: Cham, 2016; pp. 469–477.
- Ma, M.; Ren, J.; Zhao, L.; et al. Smil: Multimodal learning with severely missing modality[C]//Proceedings of the AAAI conference on artificial intelligence. 2021, 35(3), 2302–2310.
- Zhao, J.; Li, R.; Jin, Q. Missing modality imagination network for emotion recognition with uncertain missing modalities[C]. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021, Volume 1, 2608–2618.
- Poklukar, P.; Vasco, M.; Yin, H.; et al. Geometric multimodal contrastive representation learning[C]//International Conference on Machine Learning. PMLR 2022, 17782–17800.
- Lee, K.; Lee, S.; Hahn, S.; et al. Learning missing modal electronic health records with unified multi-modal data embedding and modality-aware attention[C]//Machine Learning for Healthcare Conference. PMLR 2023, 423–442.
- Liu, H.; Wei, D.; Lu, D.; et al. M3AE: multimodal representation learning for brain tumor segmentation with missing modalities[C]//Proceedings of the AAAI conference on artificial intelligence. 2023, 37(2), 1657–1665.
- Lin, R.; Hu, H. Missmodal: Increasing robustness to missing modality in multimodal sentiment analysis[J]. Trans. Assoc. Comput. Linguist. 2023, 11, 1686–1702.
- Li, M.; Yang, D.; Liu, Y.; et al. Toward robust incomplete multimodal sentiment analysis via hierarchical representation learning[J]. Adv. Neural Inf. Process. Syst. 2024, 37, 28515–28536.
- Wang, H.; Chen, Y.; Ma, C.; et al. Multi-modal learning with missing modality via shared-specific feature modelling[C]. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023; pp. 15878–15887.
- Lee, Y. L.; Tsai, Y. H.; Chiu, W. C.; et al. Multimodal prompting with missing modalities for visual recognition[C]. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 14943–14952.
- Hu, L.; Shi, T.; Feng, W.; et al. Deep correlated prompting for visual recognition with missing modalities[J]. Adv. Neural Inf. Process. Syst. 2024, 37, 67446–67466.
- Li, M.; Yang, D.; Lei, Y.; et al. A unified self-distillation framework for multimodal sentiment analysis with uncertain missing modalities[C]//Proceedings of the AAAI conference on artificial intelligence. 2024, 38(9), 10074–10082.
- Cao, H.; Cooper, D. G.; Keutmann, M. K.; et al. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset[J]. IEEE Trans. Affect. Comput. 2014, 5(4), 377–390.
- Tian, Y.; Shi, J.; Li, B.; et al. Audio-Visual Event Localization in Unconstrained Videos[C]//Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, 2018; pp. 247–263.
- Paul, M.; Ganguli, S.; Dziugaite, G. K. Deep learning on a data diet: Finding important examples early in training[C]. Adv. Neural Inf. Process. Syst. (NeurIPS) 2021, 34, 20596–20607.

| Comparison Dimension | ShaSpec | CC-MBS |
| Core Idea | Shared-specific feature decomposition | Modality confidence-driven collaborative compensation |
| Information Source | Single node internal | Across UAV nodes |
| Handling Modality Missingness | Relies on shared features to fill gaps | Compensation via neighboring UAVs |
| Compensation Layer | Feature representation layer | Data selection layer + cross-node information layer |
| Modality Quality Consideration | No (implicit modeling) | Yes (explicit modeling of modality confidence) |
| Sensitivity to Noise | Moderate | Reduced (through sample selection) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).