Submitted:
15 August 2025
Posted:
20 August 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Background
3.1. Convolutional Neural Networks (CNN)
- Input Image: The network begins with an input image; in this example, an image of a dog is used.
- Convolutional Layer: The first layer performs convolution operations on the input image [17]. It extracts features such as edges, textures, and patterns by applying various filters or kernels, producing a set of feature maps.
- Pooling Layer: This layer reduces the spatial dimensions of the feature maps, which lowers computational complexity and mitigates overfitting through dimensionality reduction [17].
- Activation Function Layer: Following convolution, a nonlinear activation function is applied, typically the rectified linear unit (ReLU) [17].
- Fully Connected Layer: After a series of convolutional, activation, and pooling layers, the feature maps are flattened into a one-dimensional vector and passed through fully connected layers. These layers perform the final classification based on learned features [15].
- Output Layer: The final layer produces class probabilities. In the illustrated architecture, the network predicts whether the image corresponds to “Dog” or “Not Dog,” selecting the class with the highest probability as the output.
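The layer sequence described above can be illustrated with a minimal pure-Python forward pass. This is only a sketch: the 6×6 toy image, the single vertical-edge kernel, and the fully connected weights are hypothetical, untrained values chosen for illustration, and the order used here (convolution, ReLU, pooling, flatten, fully connected, softmax) is one common arrangement.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    h, w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(w)]
            for i in range(h)]

def dense(vec, weights, bias):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax producing class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy input: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1]] * 3                   # simple vertical-edge filter

features = conv2d(image, kernel)            # 4x4 feature map
pooled = max_pool(relu(features))           # 2x2 after pooling
flat = [v for row in pooled for v in row]   # flatten to a 1D vector

classes = ["Dog", "Not Dog"]
weights = [[0.5] * 4, [-0.5] * 4]           # untrained, illustrative weights
probs = softmax(dense(flat, weights, [0.0, 0.0]))
label = classes[probs.index(max(probs))]    # class with the highest probability
print(label, probs)
```

In a trained network the kernels and fully connected weights are learned from data rather than set by hand, and many filters are applied in parallel at each convolutional layer.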


3.1.1. MobileNet
- Depthwise convolution: Applies a single filter to each input channel separately.
- Pointwise convolution: Combines the outputs across channels using a 1×1 convolution.
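The two steps can be sketched in plain Python, together with the parameter count that motivates the factorization. The function names and the toy tensors below are illustrative assumptions, and bias terms are ignored in the counts.

```python
def conv2d_single(channel, kernel):
    """Valid 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(channel[0]) - kw + 1)]
            for i in range(len(channel) - kh + 1)]

def depthwise(channels, kernels):
    """Depthwise step: one kernel per input channel, no mixing across channels."""
    return [conv2d_single(c, k) for c, k in zip(channels, kernels)]

def pointwise(channels, weights):
    """Pointwise (1x1) step: each output channel is a per-pixel weighted
    sum of all input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[[sum(wt * ch[i][j] for wt, ch in zip(row, channels))
              for j in range(w)]
             for i in range(h)]
            for row in weights]

def param_counts(k, c_in, c_out):
    """Weights in a standard k x k convolution vs. the depthwise separable form."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable

# Toy input: two 3x3 channels, filtered with two 2x2 kernels.
x = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]
dw = depthwise(x, [[[1, 1], [1, 1]], [[1, 0], [0, 0]]])
pw = pointwise(dw, [[1, 1], [1, -1], [0, 0.5]])   # 2 channels in, 3 out

print(param_counts(3, 32, 64))   # (18432, 2336)
```

For a 3×3 kernel with 32 input and 64 output channels, the separable form needs 2,336 weights instead of 18,432, roughly an eightfold reduction.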
3.1.2. ResNet
3.1.3. DenseNet
3.2. Robust Hashing
4. Methodology
4.1. Dataset
4.2. Dataset Obfuscation
- Salting Ratio: 90% of the final salted sample is derived from the primary malware, while 10% is the adversarial sample, resulting in a salting ratio of 0.1.
- Image Size: The combined binary data is reshaped into a grayscale image of 256×256 pixels (65,536 bytes, one byte per pixel).
- Majority and Minority Bytes: Byte segments are proportionally extracted from the primary and secondary malware samples.
- Salting Process: Each primary malware file is randomly paired with a file from a different family. 90% of the data is taken from the primary file, while 10% is taken from the secondary. The minority portion is randomly positioned to avoid padded regions commonly found at the start of binary files.
- Output: The resulting salted sample is saved as a grayscale PNG image.
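A minimal sketch of the salting step is given below. Several details are assumptions rather than the authors' stated procedure: the truncate-or-zero-pad policy in `fit`, reading the minority segment from a random offset of the secondary file, splicing it at a random position within the majority bytes, and the 256-byte row width. Writing the PNG file is omitted because it requires an imaging library.

```python
import random

IMAGE_BYTES = 65_536  # total bytes per salted sample, as stated in the text

def fit(data: bytes, length: int) -> bytes:
    """Truncate or zero-pad to exactly `length` bytes (assumed policy)."""
    return data[:length].ljust(length, b"\x00")

def salt(primary: bytes, secondary: bytes, ratio: float = 0.1, rng=None) -> bytes:
    """Combine majority bytes from `primary` with minority bytes from `secondary`."""
    rng = rng or random.Random()
    n_minor = int(round(IMAGE_BYTES * ratio))
    n_major = IMAGE_BYTES - n_minor
    major = fit(primary, n_major)

    # Assumption: read the minority segment from a random offset, so it is
    # less likely to consist of padding found at the start of the file.
    start = rng.randint(0, max(0, len(secondary) - n_minor))
    minor = fit(secondary[start:start + n_minor], n_minor)

    # Assumption: splice the minority segment at a random position.
    pos = rng.randint(0, n_major)
    return major[:pos] + minor + major[pos:]

def to_rows(data: bytes, width: int = 256):
    """Reshape the flat byte string into rows of grayscale pixel values."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]

# Hypothetical stand-ins for two malware binaries from different families.
primary = bytes([1]) * 70_000
secondary = bytes([2]) * 20_000

salted = salt(primary, secondary, ratio=0.1, rng=random.Random(0))
pixels = to_rows(salted)
print(len(salted), salted.count(2) / len(salted))
```

Passing a seeded `random.Random` makes the pairing and positioning reproducible across runs.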
4.3. Robust Hashing for Malware Classification
- Input: Raw binary malware files from the dataset.
- Block-wise Histogram Calculation: Byte frequencies (0–255) are computed for each fixed-size block of the binary file.
- Matrix Formation: Histograms from all blocks are stacked to form a 2D matrix.
- Normalization: Each row of the matrix is scaled to sum to 1.
- Resizing: The histogram matrix is resized to a 256×256 image to standardize input for CNN models.
- Gamma Correction: Applied to enhance contrast and bring out subtle visual features (gamma = 0.4).
- Output: Final PNG images representing robust hashes, ready for classification.
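The pipeline above can be sketched as follows. The block size of 1,024 bytes, the nearest-neighbour resampling along the block axis, scaling by the global maximum before gamma correction, and the convention output = input^gamma are all assumptions made for this illustration; the text specifies only the steps and gamma = 0.4. Saving the result as a PNG is left out because it requires an imaging library.

```python
def block_histograms(data: bytes, block_size: int):
    """Per-block byte histograms, each normalized so the row sums to 1."""
    rows = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        hist = [0] * 256
        for b in block:
            hist[b] += 1
        rows.append([count / len(block) for count in hist])
    return rows

def resize_rows(matrix, target_rows: int = 256):
    """Nearest-neighbour resampling of the block axis to `target_rows`.
    The histogram axis already has 256 columns (byte values 0-255)."""
    n = len(matrix)
    return [matrix[min(n - 1, i * n // target_rows)] for i in range(target_rows)]

def gamma_to_uint8(matrix, gamma: float = 0.4):
    """Scale to [0, 1] by the global peak, apply gamma, map to 0-255."""
    peak = max(max(row) for row in matrix) or 1.0
    return [[round(255 * (v / peak) ** gamma) for v in row] for row in matrix]

def robust_hash_image(data: bytes, block_size: int = 1024, gamma: float = 0.4):
    """Return a 256x256 grid of grayscale values for one binary file."""
    if not data:
        raise ValueError("empty input")
    return gamma_to_uint8(resize_rows(block_histograms(data, block_size)), gamma)

# Hypothetical input: one block of 0x00 bytes followed by one block of 0xFF bytes.
sample = b"\x00" * 1024 + b"\xff" * 1024
image = robust_hash_image(sample)
print(len(image), len(image[0]), image[0][0], image[255][255])
```

With gamma below 1, small normalized frequencies are raised toward brighter values, which is what makes rarely occurring byte values visible in the resulting image.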
5. Experiments
- CNN evaluation on non-obfuscated data without hashing.
- CNN evaluation on non-obfuscated data with robust hashing.
- Experiments on obfuscated datasets using both hashed and standard image conversions.
- Application of five salting ratios: 0%, 10%, 20%, 30%, and 40%.
- Testing across MobileNet, ResNet, and DenseNet models.
- Each configuration repeated twice for validation and reliability.
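One way to read the list above is as a full factorial grid, sketched below. Treating the non-obfuscated experiments as the 0% salting ratio and crossing every factor with every other are assumptions; the dictionary keys are hypothetical names.

```python
from itertools import product

MODELS = ("MobileNet", "ResNet", "DenseNet")
SALTING_RATIOS = (0.0, 0.1, 0.2, 0.3, 0.4)        # 0.0 = non-obfuscated data
REPRESENTATIONS = ("standard image", "robust hash")
REPEATS = 2

# Enumerate every combination of model, ratio, representation, and repeat.
runs = [
    {"model": m, "salting_ratio": r, "representation": rep, "repeat": k}
    for m, r, rep, k in product(
        MODELS, SALTING_RATIOS, REPRESENTATIONS, range(1, REPEATS + 1)
    )
]

print(len(runs))  # 60
```

Under these assumptions the study comprises 3 × 5 × 2 × 2 = 60 training runs, of which 12 use non-obfuscated data.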
6. Results
7. Conclusion
Funding
Data Availability Statement
Conflicts of Interest
References
- Kramer, S.; Bradfield, J.C. A general definition of malware. Journal in Computer Virology 2010, 6, 105–114.
- O’Kane, P.; Sezer, S.; McLaughlin, K. Obfuscation: The hidden malware. IEEE Security & Privacy 2011, 9, 41–47.
- J, M.P.; C D, A.; AS, A.; B. R, S.P.; S.M, R. Malware Detection using Machine Learning. In Proceedings of the 2024 Second International Conference on Advances in Information Technology (ICAIT), 2024, Vol. 1, pp. 1–5.
- Salloum, S.A.; Alshurideh, M.; Elnagar, A.; Shaalan, K. Machine learning and deep learning techniques for cybersecurity: a review. In Proceedings of the International Conference on Artificial Intelligence and Computer Vision. Springer, 2020, pp. 50–57.
- Huang, W.C.; Di Troia, F.; Stamp, M. Robust Hashing for Image-based Malware Classification. In Proceedings of the ICETE (1); 2018; pp. 617–625.
- Bokolo, B.; Jinad, R.; Liu, Q. A Comparison Study to Detect Malware using Deep Learning and Machine learning Techniques. In Proceedings of the 2023 IEEE 6th International Conference on Big Data and Artificial Intelligence (BDAI); 2023; pp. 1–6.
- Sethi, K.; Kumar, R.; Sethi, L.; Bera, P.; Patra, P.K. A Novel Machine Learning Based Malware Detection and Classification Framework. In Proceedings of the 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security); 2019; pp. 1–4.
- Abhesa, R.A.; Hendrawan; Ismail, S.J.I. Classification of Malware Using Machine Learning Based on Image Processing. In Proceedings of the 2021 15th International Conference on Telecommunication Systems, Services, and Applications (TSSA); 2021; pp. 1–4.
- Okubo, S.; Kimura, T.; Cheng, J. Entropy-Based Malware Detection Using One Dimensional CNN. In Proceedings of the 2024 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan); 2024; pp. 763–764.
- Hebish, M.W.; Awni, M. CNN-Based Malware Family Classification and Evaluation. In Proceedings of the 2024 14th International Conference on Electrical Engineering (ICEENG). IEEE; 2024; pp. 219–224.
- Chang, H.Y.; Huang, Y.C.; Chang, T.S.; Chang, Y.C. A Malware Detection and Classification System Based on Lightweight CNN Model. In Proceedings of the 2025 IEEE International Conference on Consumer Electronics (ICCE); 2025; pp. 1–4.
- Yajamanam, S.; Selvin, V.R.S.; Di Troia, F.; Stamp, M. Deep Learning versus Gist Descriptors for Image-based Malware Classification. In Proceedings of the ICISSP; 2018; pp. 553–561.
- Bhodia, N.; Prajapati, P.; Di Troia, F.; Stamp, M. Transfer learning for image-based malware classification. arXiv preprint arXiv:1903.11551, 2019.
- Tran, K.; Di Troia, F.; Stamp, M. Robustness of image-based malware analysis. In Proceedings of the Silicon Valley Cybersecurity Conference. Springer; 2022; pp. 3–21.
- Wang, Z.J.; Turko, R.; Shaikh, O.; Park, H.; Das, N.; Hohman, F.; Kahng, M.; Chau, D.H.P. CNN explainer: learning convolutional neural networks with interactive visualization. IEEE Transactions on Visualization and Computer Graphics 2020, 27, 1396–1406.
- Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data 2021, 8, 1–74.
- Sinha, D.; El-Sharkawy, M. Thin MobileNet: An Enhanced MobileNet Architecture. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON); 2019; pp. 0280–0285.
- He, F.; Liu, T.; Tao, D. Why resnet works? residuals generalize. IEEE Transactions on Neural Networks and Learning Systems 2020, 31, 5349–5362.
- Abdulazeez, F.A.; Ahmed, I.T.; Hammad, B.T. Examining the performance of various pretrained convolutional neural network models in malware detection. Applied Sciences 2024, 14, 2614.
- Martínez, S.; Gérard, S.; Cabot, J. Robust hashing for models. In Proceedings of the 21th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, 2018, pp.
- Ramakrishna, M. Hashing practice: analysis of hashing and universal hashing. ACM SIGMOD Record 1988, 17, 191–199.
- Albeshri, A. An image hashing-based authentication and secure group communication scheme for IoT-enabled MANETs. Future Internet 2021, 13, 166.
- Venkatesan, R.; Koon, S.M.; Jakubowski, M.; Moulin, P. Robust image hashing. In Proceedings of the Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101), Vol. 3; 2000; pp. 664–666.
- Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intelligent Data Analysis 2002, 6, 429–449.
- Bakken, D.E.; Rarameswaran, R.; Blough, D.M.; Franz, A.A.; Palmer, T.J. Data obfuscation: Anonymity and desensitization of usable data sets. IEEE Security & Privacy 2004, 2, 34–41.
- Maiorca, D.; Ariu, D.; Corona, I.; Aresu, M.; Giacinto, G. Stealth attacks: An extended insight into the obfuscation effects on android malware. Computers & Security 2015, 51, 16–31.








| No. | Malware Family | Number of Images |
|---|---|---|
| 1 | VBInject | 1214 |
| 2 | Winwebsec | 1202 |
| 3 | Renos | 850 |
| 4 | BHO | 733 |
| 5 | Startpage | 685 |
| 6 | Adload | 654 |
| 7 | OnLineGames | 636 |
| 8 | Vobfus | 581 |
| 9 | FakeRean | 544 |
| 10 | Zbot | 538 |
| 11 | Allaple.A | 510 |
| 12 | Wintrim.BX | 505 |
| 13 | CeeInject | 471 |
| 14 | Cycbot.G | 458 |
| 15 | VB | 451 |
| 16 | Vundo | 433 |
| 17 | Agent | 430 |
| 18 | Toga!rfn | 393 |
| 19 | Rimecud.A | 386 |
| 20 | Obfuscator | 314 |

| Model | Robust Hashing | Without Robust Hashing | Accuracy Increment |
|---|---|---|---|
| MobileNet | 87.50% | 65.50% | +22.00% |
| ResNet | 88.00% | 67.50% | +20.50% |
| DenseNet | 89.50% | 68.00% | +21.50% |

| Model | Robust Hashing | Without Robust Hashing | Accuracy Increment |
|---|---|---|---|
| MobileNet | 83.00% | 62.50% | +20.50% |
| ResNet | 85.50% | 65.00% | +20.50% |
| DenseNet | 86.00% | 64.50% | +21.50% |

| Model | Robust Hashing | Without Robust Hashing | Accuracy Increment |
|---|---|---|---|
| MobileNet | 72.00% | 55.00% | +17.00% |
| ResNet | 72.00% | 56.50% | +15.50% |
| DenseNet | 73.50% | 57.00% | +16.50% |

| Model | Robust Hashing | Without Robust Hashing | Accuracy Increment |
|---|---|---|---|
| MobileNet | 54.50% | 43.00% | +11.50% |
| ResNet | 57.00% | 44.50% | +12.50% |
| DenseNet | 58.00% | 47.00% | +11.00% |

| Model | Robust Hashing | Without Robust Hashing | Accuracy Increment |
|---|---|---|---|
| MobileNet | 48.00% | 39.00% | +9.00% |
| ResNet | 48.17% | 41.50% | +6.67% |
| DenseNet | 50.13% | 43.00% | +7.13% |

| Model | Robust Hashing | Without Robust Hashing | Accuracy Increment |
|---|---|---|---|
| MobileNet | 69.00% | 53.00% | +16.00% |
| ResNet | 70.13% | 55.00% | +15.13% |
| DenseNet | 71.43% | 55.90% | +15.53% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).