Submitted:
19 December 2025
Posted:
22 December 2025
Abstract
Keywords:
1. Introduction
Contribution
- Novel Framework for OOD Detection: We propose a two-stage framework for OOD detection and malware classification that exploits the variation in embedding space between variants of the same malware family. The first stage uses spherical decision boundaries defined through Gaussian discriminant analysis to evaluate the likelihood of a test sample being in-distribution, while the second stage integrates this information into a deep neural network for final prediction.
- Distance-Based Confidence Modeling: Our method uses statistical measures, including the z-score and the coefficient of variation, to determine how far a test sample lies from the centroid of a class distribution. A key insight is the use of multiple z-score comparisons across class boundaries: a sample is considered in-distribution only if its z-score falls within the chosen threshold range relative to at least one centroid. This multi-boundary approach accounts for the multidimensional nature of malware distributions.
- Empirical Validation: We provide a comprehensive evaluation of our model using malware datasets containing 25 known malware families and multiple out-of-distribution variants. Our model outperforms the baseline methods in OOD detection, reaching an AUC of 0.906 against 0.846 for the strongest baseline (BERL), and an AUC of 0.911 on novel out-of-distribution malware.
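
As a rough illustration of the first stage described above, the following minimal sketch fits one isotropic (spherical) Gaussian per class, in the spirit of Gaussian discriminant analysis, and accepts a sample as in-distribution if it falls inside at least one class sphere. The function names, the radius rule `k * sqrt(dim * var)`, and the default `k = 2.0` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def fit_spherical_gda(embeddings, labels):
    """Fit one isotropic Gaussian (centroid + single variance) per class."""
    params = {}
    for c in np.unique(labels):
        pts = embeddings[labels == c]
        centroid = pts.mean(axis=0)
        # Isotropic variance: mean squared deviation over samples and dimensions.
        var = ((pts - centroid) ** 2).mean()
        params[int(c)] = (centroid, var)
    return params


def log_likelihoods(x, params):
    """Log density of x under each class's spherical Gaussian."""
    dim = x.shape[0]
    return {
        c: -0.5 * dim * np.log(2 * np.pi * var) - np.sum((x - mu) ** 2) / (2 * var)
        for c, (mu, var) in params.items()
    }


def inside_any_sphere(x, params, k=2.0):
    """True if x lies within k * sqrt(dim * var) of at least one centroid.

    The radius rule and k are assumptions chosen for illustration only.
    """
    dim = x.shape[0]
    return any(
        np.linalg.norm(x - mu) <= k * np.sqrt(dim * var)
        for mu, var in params.values()
    )


# Synthetic 2-D "embeddings" for two hypothetical families.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
gda = fit_spherical_gda(X, y)

print(inside_any_sphere(np.array([0.5, 0.5]), gda))  # close to family 0
print(inside_any_sphere(np.array([5.0, 5.0]), gda))  # far from both centroids
```

In a two-stage design such as the one outlined here, the per-class log-likelihoods or the inside/outside decision could be passed to the second-stage network as additional input features. How the paper actually wires this signal into the network is not specified in this section.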
2. Background and Related Work
3. Research Methodology
3.1. Spherical Boundary Modeling for Malware Classification
3.2. Cluster-Aided Malware Detection and Final Prediction Network

| Malware Family | AUC Score | Malware Family | AUC Score |
|---|---|---|---|
| Adposhel | 0.0059 | InstallCore | 0.539 |
| Agent | 0.105 | MultiPlug | 0.576 |
| Allaple | 0.113 | Neoreklami | 0.629 |
| Amonetize | 0.166 | Neshta | 0.585 |
| Androm | 0.266 | VBA | 0.619 |
| BrowseFox | 0.257 | Sality | 0.573 |
| Elex | 0.305 | Snarasite | 0.830 |
| Expiro | 0.360 | Stantinko | 0.846 |
| Fasong | 0.383 | Out-of-Dist. (Novel) | 0.911 |
| HackKMS | 0.424 | VBKrypt | 0.916 |
| Hlux | 0.463 | Vilsel | 0.993 |
| Injector | 0.499 | | |

Model Comparison and Evaluation

| Method | AUC | AP-Id | AP-OOD | FPR | AR-OOD | ACC |
|---|---|---|---|---|---|---|
| MSP | 0.611 | 0.464 | 0.322 | 0.613 | 0.526 | 53.82 |
| OE | 0.247 | 0.634 | 0.709 | 0.751 | 0.592 | 0.407 |
| EnergyOE | 0.651 | 0.660 | 0.792 | 0.736 | 0.808 | 0.682 |
| OCL | 0.637 | 0.529 | 0.558 | 0.771 | 0.690 | 0.625 |
| PASCL | 0.209 | 0.405 | 0.392 | 0.229 | 0.393 | 0.592 |
| OS | 0.692 | 0.442 | 0.842 | 0.793 | 0.712 | 0.827 |
| Class Prio | 0.596 | 0.376 | 0.816 | 0.728 | 0.693 | 0.606 |
| BERL | 0.846 | 0.572 | 0.561 | 0.807 | 0.737 | 0.812 |
| MAD-OOD | 0.906 | 0.951 | 0.861 | 0.940 | 0.817 | 0.907 |

Proposed Model Generalization

| OOD Malware Dataset | AUC | AP-Id | AP-OOD | TPR | AR-OOD | ACC |
|---|---|---|---|---|---|---|
| MaleVis | 0.911 | 0.864 | 0.822 | 0.813 | 0.926 | 93.82 |
| BODMAS | 0.847 | 0.834 | 0.809 | 0.851 | 0.792 | 0.907 |
| Virus-MNIST | 0.851 | 0.860 | 0.792 | 0.936 | 0.908 | 0.882 |
| Stamina | 0.837 | 0.929 | 0.858 | 0.871 | 0.890 | 0.925 |
| MalImg | 0.906 | 0.951 | 0.861 | 0.940 | 0.817 | 0.907 |
4. Metric Evaluation

$$z = \frac{x - \mu}{\sigma}$$

where

- $z$ is the z-score (the number of standard deviations a data point lies from the mean)
- $x$ is the observed value
- $\mu$ is the mean of the data points within the cluster
- $\sigma$ is the standard deviation of the data points within the cluster, measured about its centroid
- Computation of the Mean and Standard Deviation: Calculate the mean ($\mu$) and standard deviation ($\sigma$) of the selected feature across a dataset of benign and potentially malicious samples.
- Z-score computation: Determine the z-score of the sample relative to each family, with each family represented by its centroid.
- Pre-Determine in-distribution Threshold: Common thresholds for outlier detection are:
  - $z > 1$ (or $z < -1$) → possible outlier
  - $z > 2$ → highly suspicious
- Initial Input Classification: If a sample has a z-score beyond the chosen threshold, it is flagged as an outlier and is initially classified as a possible out-of-distribution sample.
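
The four steps above can be sketched as follows for a single scalar feature. This is a minimal sketch under stated assumptions: the family names and feature values are hypothetical, the thresholds are applied symmetrically to the absolute z-score, and a sample is labeled by its smallest absolute z-score across families, which is one reading of the "at least one centroid" rule.

```python
import numpy as np


def fit_family_stats(feature_by_family):
    """Step 1: mean and standard deviation of the feature for each family."""
    return {
        fam: (float(np.mean(vals)), float(np.std(vals)))
        for fam, vals in feature_by_family.items()
    }


def z_scores(x, stats):
    """Step 2: z-score of x relative to each family's (mu, sigma)."""
    return {fam: (x - mu) / sigma for fam, (mu, sigma) in stats.items()}


def initial_label(x, stats, possible=1.0, suspicious=2.0):
    """Steps 3-4: label x by its smallest |z| across all families.

    Using |z| for both thresholds is an assumption made for symmetry.
    """
    z_min = min(abs(z) for z in z_scores(x, stats).values())
    if z_min > suspicious:
        return "highly suspicious"
    if z_min > possible:
        return "possible outlier"
    return "in-distribution"


# Hypothetical feature values for two hypothetical families.
stats = fit_family_stats({
    "family_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "family_b": [10.0, 12.0, 14.0],
})

for sample in (3.5, 5.0, 7.5):
    print(sample, initial_label(sample, stats))
```

A sample labeled "possible outlier" or "highly suspicious" here would only be an initial out-of-distribution candidate; the final decision is left to the downstream prediction network.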
4.1. Training a Computer Vision-Based Deep Learning Classifier for Initial Family Prediction
4.2. Training a Computer Vision-Based Deep Learning Classifier for Final Prediction
5. Contributions over Prior Work

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.