Submitted:
05 October 2025
Posted:
07 October 2025
Abstract
Keywords:
1. Introduction
- We introduce Distance-Measured Data Mixing (DM2) Active Learning, a novel deep active learning framework that estimates sample uncertainty through distance-weighted mixing of data samples. By exploiting inter-sample relationships and distributional structure, this method selects informative instances across the data manifold, including near-boundary regions, thereby enhancing the diversity of queried samples.
- To address noise susceptibility in challenging scenarios, we augment Distance-Measured Data Mixing with adversarial training (DM2-AT). We generate fast-gradient adversarial samples for selected near-boundary instances and train the model jointly on these adversarial samples and the original data, improving robustness and generalization under complex data distributions.
- Comprehensive experiments across diverse tasks, model architectures, and data modalities demonstrate that our method achieves superior performance while significantly reducing labeling requirements compared to existing approaches.
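The distance-weighted mixing idea in the first contribution can be sketched in a few lines. The snippet below is an illustrative instantiation, not the paper's exact Algorithm 1: the nearest-anchor choice, the exponential distance weighting, and the confidence-drop score are assumptions made for this example.

```python
import numpy as np

def dm2_scores(unlabeled_feats, labeled_feats, predict_proba, alpha=0.3):
    """Score unlabeled samples via distance-weighted feature mixing (sketch).

    Each unlabeled feature is mixed with its nearest labeled anchor, with a
    mixing weight that decays with their distance; the drop in the
    classifier's confidence on the mixed feature serves as the uncertainty
    score. Large drops suggest the sample lies near a decision boundary.
    `predict_proba` maps an (n, d) feature array to (n, k) class probabilities.
    """
    scores = []
    for z_u in unlabeled_feats:
        # distance from this candidate to every labeled feature
        dists = np.linalg.norm(labeled_feats - z_u, axis=1)
        j = np.argmin(dists)                      # nearest labeled anchor
        # distance-dependent mixing weight: closer anchors mix more strongly
        w = alpha * np.exp(-dists[j])
        z_mix = (1.0 - w) * z_u + w * labeled_feats[j]
        p_orig = predict_proba(z_u[None, :])[0]
        p_mix = predict_proba(z_mix[None, :])[0]
        # uncertainty score: confidence drop on the predicted class
        scores.append(p_orig.max() - p_mix[np.argmax(p_orig)])
    return np.array(scores)
```

In an active learning round, the top-b samples by this score would be sent for labeling; the exact distance measure and mixing coefficient are treated in Sections 3.3, 5.6, and 5.8.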
2. Related Work
2.1. Uncertainty Estimation
2.2. Diversity Improvement
3. Distance-Measured Data Mixing
3.1. Formal Definition
3.2. Feature Extraction
3.3. Distance Measurement
3.4. Linear Data Mixing
Algorithm 1: Distance-Measured Data Mixing Active Learning
4. Adversarial Training for Boundary Data Feature Fusion
4.1. FGSM Adversarial Training
4.2. Adversarial Training for Sample Selection
Algorithm 2: Distance-Measured Data Mixing with Adversarial Training
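The FGSM step used in DM2-AT perturbs a selected near-boundary input in the direction that increases the loss, x_adv = x + ε·sign(∇ₓL). The sketch below applies this to a toy softmax linear classifier, where the input gradient has a closed form; the toy model and its parameter names are assumptions for illustration, not the paper's networks.

```python
import numpy as np

def fgsm_perturb(x, y, W, b, eps=0.1):
    """Fast Gradient Sign Method on a softmax linear classifier (sketch).

    Computes x_adv = x + eps * sign(dL/dx) for the cross-entropy loss of a
    linear model with weights W (d, k) and bias b (k,). For this model,
    dL/dlogits = p - onehot(y), so dL/dx = W @ (p - onehot(y)).
    """
    logits = x @ W + b
    e = np.exp(logits - logits.max())
    p = e / e.sum()                      # softmax probabilities
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W @ (p - onehot)            # closed-form input gradient
    return x + eps * np.sign(grad_x)
```

In DM2-AT the adversarial copies of the queried near-boundary samples are added to the training set alongside the originals (Algorithm 2).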
5. Theoretical Analysis
5.1. Notation and Setup
5.2. Geometric Rationale for Distance-Weighted Mixing
5.3. Uncertainty Amplification Near Class Boundaries
5.4. Diversity Preservation via Distance Coupling
5.5. Stability of Mixed Confidence Under Neighbor Noise
5.6. Choice of the Combined Distance
5.7. Acquisition Optimality Under a Localized Fisher Criterion
5.8. On the Mixing Coefficient
5.9. Considerations and Summary
6. Experimental Results
6.1. Efficiency of Distance-Measured Data Mixing
6.2. Robustness of Adversarial Training
6.3. Convergence Analysis
6.4. Time Efficiency
6.5. Ablation Study
7. Conclusion
Acknowledgments
References
- Parvaneh, A.; Abbasnejad, E.; Teney, D.; Haffari, G.R.; Van Den Hengel, A.; Shi, J.Q. Active learning by feature mixing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12237–12246.
- Munjal, P.; Hayat, N.; Hayat, M.; Sourati, J.; Khan, S. Towards robust and reproducible active learning using neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 223–232.
- Settles, B. Active learning literature survey. Technical report, 2009.
- Gal, Y.; Islam, R.; Ghahramani, Z. Deep Bayesian active learning with image data. In Proceedings of the International Conference on Machine Learning, PMLR, 2017, pp. 1183–1192.
- Lewis, D.D. A sequential algorithm for training text classifiers: Corrigendum and additional data. ACM SIGIR Forum 1995, 29, 13–19.
- Kremer, J.; Steenstrup Pedersen, K.; Igel, C. Active learning with support vector machines. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2014, 4, 313–326.
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, PMLR, 2016, pp. 1050–1059.
- Freund, Y.; Seung, H.S.; Shamir, E.; Tishby, N. Selective sampling using the query by committee algorithm. Machine Learning 1997, 28, 133–168.
- Gorriz, M.; Carlier, A.; Faure, E.; Giro-i Nieto, X. Cost-effective active learning for melanoma segmentation. arXiv preprint arXiv:1711.09168, 2017.
- Ducoffe, M.; Precioso, F. Adversarial active learning for deep networks: A margin based approach. arXiv preprint arXiv:1802.09841, 2018.
- Mayer, C.; Timofte, R. Adversarial sampling for active learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 3071–3079.
- Sener, O.; Savarese, S. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017.
- Caramalau, R.; Bhattarai, B.; Kim, T.K. Sequential graph convolutional network for active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9583–9592.
- Yoo, D.; Kweon, I.S. Learning loss for active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 93–102.
- Liu, Z.; Ding, H.; Zhong, H.; Li, W.; Dai, J.; He, C. Influence selection for active learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9274–9283.
- Wang, T.; Li, X.; Yang, P.; Hu, G.; Zeng, X.; Huang, S.; Xu, C.Z.; Xu, M. Boosting active learning via improving test performance. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022, Vol. 36, pp. 8566–8574.
- Koh, P.W.; Liang, P. Understanding black-box predictions via influence functions. In Proceedings of the International Conference on Machine Learning, PMLR, 2017, pp. 1885–1894.
- Sinha, S.; Ebrahimi, S.; Darrell, T. Variational adversarial active learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5972–5981.
- Ash, J.T.; Zhang, C.; Krishnamurthy, A.; Langford, J.; Agarwal, A. Deep batch active learning by diverse, uncertain gradient lower bounds. arXiv preprint arXiv:1906.03671, 2019.
- Li, X.; Yang, P.; Gu, Y.; Zhan, X.; Wang, T.; Xu, M.; Xu, C. Deep active learning with noise stability. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024, Vol. 38, pp. 13655–13663.
- Krizhevsky, A.; Hinton, G.; et al. Learning multiple layers of features from tiny images. Technical report, Toronto, ON, Canada, 2009.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
- Wang, D.; Shang, Y. A new active labeling method for deep learning. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), IEEE, 2014, pp. 112–119.
- Kirsch, A.; Van Amersfoort, J.; Gal, Y. BatchBALD: Efficient and diverse batch acquisition for deep Bayesian active learning. Advances in Neural Information Processing Systems 2019, 32.
- Houlsby, N.; Huszár, F.; Ghahramani, Z.; Lengyel, M. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745, 2011.
- Loquercio, A.; Segu, M.; Scaramuzza, D. A general framework for uncertainty estimation in deep learning. IEEE Robotics and Automation Letters 2020, 5, 3153–3160.
- Kuo, W.; Häne, C.; Yuh, E.; Mukherjee, P.; Malik, J. Cost-sensitive active learning for intracranial hemorrhage detection. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018: 21st International Conference, Granada, Spain, September 16–20, 2018, Proceedings, Part III. Springer, 2018, pp. 715–723.
- Beluch, W.H.; Genewein, T.; Nürnberger, A.; Köhler, J.M. The power of ensembles for active learning in image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9368–9377.
- Yehuda, O.; Dekel, A.; Hacohen, G.; Weinshall, D. Active learning through a covering lens. Advances in Neural Information Processing Systems 2022, 35, 22354–22367.
- Elhamifar, E.; Sapiro, G.; Yang, A.; Sastry, S.S. A convex optimization framework for active learning. In Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 209–216.
- Hasan, M.; Roy-Chowdhury, A.K. Context aware active learning of activity recognition models. In Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4543–4551.
- Citovsky, G.; DeSalvo, G.; Gentile, C.; Karydas, L.; Rajagopalan, A.; Rostamizadeh, A.; Kumar, S. Batch active learning at scale. Advances in Neural Information Processing Systems 2021, 34, 11933–11944.
- Yang, C.; Wu, Q.; Li, H.; Chen, Y. Generative poisoning attack method against neural networks. arXiv preprint arXiv:1703.01340, 2017.
- Guo, R.; Chen, Q.; Liu, H.; Wang, W. Adversarial robustness enhancement for deep learning-based soft sensors: An adversarial training strategy using historical gradients and domain adaptation. Sensors 2024, 24, 3909.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998, 86, 2278–2324.
- Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, 2011, p. 4.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch, 2017.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010: 19th International Conference on Computational Statistics, Paris, France, August 22–27, 2010.
- Zhao, X.; Chen, F.; Hu, S.; Cho, J.H. Uncertainty aware semi-supervised learning on graph data. Advances in Neural Information Processing Systems 2020, 33, 12827–12836.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.




| Methods | MNIST | CIFAR-10 | CIFAR-10s | SVHN | Avg |
|---|---|---|---|---|---|
| Entropy[23] (IJCNN14) | 92.78±0.14 | 84.00±0.13 | 64.00±1.42 | 70.91±9.01 | 77.92 |
| Margin[6] (WIREs14) | 93.42±0.13 | 83.62±0.07 | 63.47±0.37 | 69.64±4.84 | 77.53 |
| Least Confidence[5] | 92.98±0.13 | 83.51±0.04 | 63.89±1.79 | 70.52±2.25 | 77.73 |
| Random[3] | 87.78±0.32 | 81.62±0.19 | 62.94±0.01 | 69.82±3.42 | 75.54 |
| EntropyBayesian[7] (ICML16) | 92.05±0.65 | 83.62±0.25 | 61.82±0.56 | 71.21±4.13 | 77.18 |
| CoreSet[12] | 89.45±1.12 | 83.04±0.14 | 63.78±0.72 | 71.41±1.94 | 76.85 |
| UncertainGCN[40] (NIPS20) | 87.56±2.47 | 83.51±0.10 | 61.40±4.43 | 70.65±0.07 | 75.78 |
| ProbCover[29] (NIPS22) | 88.75±0.68 | 81.63±0.17 | 64.13±1.88 | 69.04±12.58 | 75.89 |
| BALD[4] (ICML17) | 92.55±1.22 | 82.03±0.22 | 63.62±1.81 | 70.36±3.12 | 77.14 |
| BADGE[19] | 92.35±0.06 | 83.83±0.05 | 62.46±1.69 | 71.94±2.35 | 77.65 |
| Alpha-Mix[1] (CVPR22) | 93.10±0.46 | 84.22±0.05 | 62.58±0.72 | 69.43±0.28 | 77.25 |
| NoiseStability[20] (IJCAI24) | 92.63±0.37 | 83.87±0.04 | 62.55±0.36 | 69.87±19.85 | 77.23 |
| DM2(Ours) | 93.11±0.34 | 84.29±0.01 | 64.69±3.92 | 72.29±2.69 | 78.60 |
| Methods | SVHN (ResNet18) | CIFAR-10 (VGG16) | Avg |
|---|---|---|---|
| Entropy[23] | 92.94±0.04 | 75.97±5.56 | 84.46 |
| Margin[6] | 92.99±0.01 | 79.51±0.24 | 86.25 |
| Least Confidence[5] | 93.02±0.01 | 79.57±0.18 | 86.30 |
| Random[3] | 91.32±0.09 | 77.84±0.08 | 84.58 |
| EntropyBayesian[7] | 91.38±0.09 | 77.12±3.34 | 84.27 |
| CoreSet[12] | 92.48±0.01 | 77.10±0.08 | 84.79 |
| UncertainGCN[40] | 91.57±0.06 | 78.82±0.05 | 85.20 |
| ProbCover[29] | 90.88±0.19 | 77.62±0.17 | 84.25 |
| BALD[4] | 92.67±0.02 | 75.51±1.2 | 84.09 |
| BADGE[19] | 93.04±0.06 | 79.20±0.76 | 86.12 |
| Alpha-Mix[1] | 92.69±0.03 | 79.06±0.34 | 85.88 |
| NoiseStability[20] | 92.91±0.04 | 79.10±0.42 | 86.01 |
| DM2(Ours) | 93.05±0.05 | 79.67±0.07 | 86.36 |
| Methods | MNIST-C (CNN) | SVHN-C (MobileNet) | CIFAR10-C (MobileNet) | SVHN-C (ResNet18) | CIFAR10-C (ResNet34) |
|---|---|---|---|---|---|
| Entropy[23] | 84.74±2.54 | 72.51±1.25 | 67.21±0.54 | 79.53±0.89 | 74.25±2.56 |
| Least Confidence[5] | 85.62±1.67 | 73.66±0.72 | 68.35±1.85 | 81.20±2.56 | 75.61±1.87 |
| Margin[6] | 84.83±1.34 | 73.24±1.54 | 67.04±1.78 | 79.98±2.28 | 74.32±1.43 |
| Random[3] | 82.61±3.78 | 70.28±2.66 | 63.27±0.28 | 78.61±3.95 | 73.01±1.22 |
| EntropyBayesian[7] | 83.34±1.32 | 71.64±2.84 | 63.59±2.72 | 78.91±0.31 | 73.20±0.18 |
| CoreSet[12] | 84.52±2.54 | 72.91±1.52 | 62.18±3.36 | 79.01±1.02 | 73.61±2.45 |
| UncertainGCN[40] | 83.11±1.98 | 71.37±1.73 | 64.36±1.84 | 78.23±2.43 | 72.93±1.95 |
| ProbCover[29] | 80.13±2.75 | 70.01±3.43 | 65.23±0.76 | 79.54±3.61 | 73.81±3.68 |
| BALD[4] | 84.55±1.43 | 72.29±2.85 | 64.87±2.05 | 79.02±0.88 | 72.77±2.84 |
| BADGE[19] | 85.78±0.74 | 73.35±1.94 | 65.79±1.96 | 80.11±1.96 | 74.32±1.29 |
| Alpha-Mix[1] | 85.91±0.32 | 73.59±1.22 | 67.91±2.67 | 81.32±1.61 | 75.29±2.71 |
| NoiseStability[20] | 85.26±1.22 | 73.03±2.51 | 66.52±1.43 | 80.89±2.27 | 73.04±1.43 |
| DM2-AT(Ours) | 86.62±0.89 | 74.58±2.13 | 69.04±1.69 | 82.03±2.44 | 76.61±1.76 |
| Method | MNIST (s) | CIFAR-10 (min) |
|---|---|---|
| BADGE | 2 | 2.07 |
| EntropyBayesian | 4 | 3.25 |
| BALD | 16 | 4.58 |
| NoiseStability | 20 | 4.21 |
| Ours | 3 | 2.07 |
| Datasets | -Distance | -Fusion Ratio | -Perturbation | DM2-AT |
|---|---|---|---|---|
| MNIST-C (CNN) | | | | |
| CIFAR10-C (MobileNet) | | | | |
| SVHN-C (MobileNet) | | | | |
| SVHN-C (ResNet18) | | | | |
| CIFAR10-C (ResNet34) | | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).