Submitted: 22 January 2025
Posted: 23 January 2025
Abstract
Keywords:
Introduction
Literature Review
Background
Proposed Approach
Input Processing
Attention Layer
Custom Capsule Layer and Convolutional Layers
LSTM Layers
Feature Extraction and Classification Pipelines

Equations
A. Input Processing
- The input, denoted as $X \in \mathbb{R}^{N \times F}$, where N is the number of samples and F is the number of features, is processed through dense layers and batch normalization: $H = \sigma(XW + b)$, where W and b are the weights and biases of the dense layer and $\sigma$ is the activation function.
- Batch normalization: $\hat{H} = \frac{H - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$ normalizes H to stabilize and accelerate training.
- N: number of samples (individual files being analyzed)
- F: number of features extracted from each file
- $\mathbb{R}$: real number values
- W: weight matrix
- b: bias vector
- $\sigma$: activation function (ReLU)
- H: dense layer output
- $\mu_B$: batch mean
- $\sigma_B^2$: batch variance
- $\epsilon$: small constant for numerical stability
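The dense layer followed by batch normalization can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `dense_relu` and `batch_norm`, the toy values of X, W, and b, and the default value of the stability constant are all hypothetical, and the learnable scale and shift that batch normalization layers usually include are omitted.

```python
import math

def dense_relu(X, W, b):
    """Dense layer with ReLU: X is N x F, W is F x U, b has U entries."""
    return [
        [max(0.0, sum(x * W[k][j] for k, x in enumerate(row)) + b[j])
         for j in range(len(b))]
        for row in X
    ]

def batch_norm(H, eps=1e-5):
    """Normalize each column of H using its batch mean and batch variance."""
    n = len(H)
    cols = list(zip(*H))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [(v - m) / math.sqrt(var + eps) for v, m, var in zip(row, means, variances)]
        for row in H
    ]

# Toy input (hypothetical): N = 3 samples, F = 2 features, 2 hidden units.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[0.5, -1.0], [0.25, 1.0]]
b = [0.0, 0.5]

H = dense_relu(X, W, b)   # dense layer output
H_norm = batch_norm(H)    # normalized output, column by column
```

In this toy example the second hidden unit outputs the same value for every sample, so its batch variance is zero; the small constant in the denominator is what keeps the division well defined in that case.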
B. Data Processing
- Global max pooling keeps the maximum value of each feature channel across all positions, reducing dimensionality while preserving the most important features.
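As a small illustration of global max pooling, the sketch below collapses a feature map to one value per channel. The function name `global_max_pool`, the positions-by-channels layout, and the activation values are assumptions chosen for the example.

```python
def global_max_pool(feature_map):
    """Reduce a (positions x channels) feature map to one maximum per channel."""
    return [max(channel) for channel in zip(*feature_map)]

# 4 positions x 3 channels (hypothetical activations).
feature_map = [
    [0.1, 0.7, 0.0],
    [0.9, 0.2, 0.3],
    [0.4, 0.8, 0.1],
    [0.2, 0.5, 0.6],
]

pooled = global_max_pool(feature_map)  # one value per channel
```

The output length depends only on the number of channels, so inputs of different lengths map to a fixed-size vector.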
C. Attention Layer
- The model incorporates self-attention and multi-head attention layers to capture long-range dependencies:
- Self-attention: $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ enhances focus on relevant parts of the input sequence.
- Multi-head attention runs several attention heads in parallel, so that different parts of the sequence are weighted differently.
- Q: Query matrix (learned representation of input)
- K: Key matrix (learned representation for matching)
- V: Value matrix (learned representation for output)
- $d_k$: Dimension of key vectors
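Scaled dot-product attention, the usual formulation in terms of Q, K, V and the key dimension, can be sketched as follows. This is an assumption about the form used, not a statement of the authors' code: the helper names are hypothetical, and the example feeds the same toy matrix as Q, K, and V instead of applying learned projections.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for list-of-lists matrices."""
    d_k = len(K[0])
    output, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        output.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return output, weights

# Toy sequence of 3 positions with 2 features each (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = attention(X, X, X)  # self-attention: Q = K = V = X
```

Each row of `weights` sums to 1 and shows how strongly one position attends to the others. A multi-head variant would run several such computations on different learned projections and concatenate the results.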
D. Capsule Layer
- A capsule layer maintains spatial relationships: $v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|}$ preserves structure in the input data.
- $s_j$: input vector to capsule
- $v_j$: output vector
- $\|s_j\|$: magnitude of input vector
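The quantities listed here (capsule input vector, output vector, and input magnitude) match the squashing nonlinearity commonly used in capsule networks. The sketch below assumes that function; the name `squash` and the small `eps` guard against division by zero are additions for illustration.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    # Scale factor: ||s||^2 / (1 + ||s||^2), divided by ||s|| to normalize direction.
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]

v = squash([3.0, 4.0])                      # input magnitude 5
v_length = math.sqrt(sum(x * x for x in v))  # close to 25 / 26
```

Long input vectors are mapped to outputs with length just under 1 and short ones to lengths near 0, so the output length can be read as a confidence value while the direction is unchanged.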
E. LSTM Layers
- LSTM layers capture sequential dependencies: they model sequential patterns and update the cell state and hidden state at each time step.
- $i_t$: input gate
- $f_t$: forget gate
- $o_t$: output gate
- $c_t$: cell state
- $h_t$: hidden state
- ⊙: element-wise multiplication
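A single step of a standard LSTM cell, written for one hidden unit and a scalar input, illustrates how the gates update the cell and hidden states. The standard cell equations are assumed here; the parameter names (`w_*`, `u_*`, `b_*`) and their values are hypothetical, and with one unit the element-wise product reduces to ordinary multiplication.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step for a single hidden unit with a scalar input."""
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])      # output gate
    c_hat = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat   # keep part of the old state, add new content
    h_t = o_t * math.tanh(c_t)         # expose part of the cell state
    return h_t, c_t

# Hypothetical parameters for illustration only.
params = {
    "w_i": 0.5, "u_i": 0.1, "b_i": 0.0,
    "w_f": 0.3, "u_f": 0.2, "b_f": 1.0,
    "w_o": 0.4, "u_o": 0.1, "b_o": 0.0,
    "w_c": 0.6, "u_c": 0.2, "b_c": 0.0,
}

h_t, c_t = 0.0, 0.0
for x_t in [1.0, -0.5, 0.25]:  # toy input sequence
    h_t, c_t = lstm_step(x_t, h_t, c_t, params)
```

The forget gate controls how much of the previous cell state survives, and the input gate controls how much of the candidate state is written, which is what lets the layer carry information across many time steps.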
F. Feature Extraction and Classification Pipelines
- The neural network outputs a feature vector for each sample.
- These features are then passed to different classifiers, each wrapped in its own pipeline built with FunctionTransformer.
Experimental Setup
Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where:
- $TP$ is the number of true positives (correctly classified malware),
- $TN$ is the number of true negatives (correctly classified goodware),
- $FP$ is the number of false positives (benign files incorrectly classified as malware),
- $FN$ is the number of false negatives (malicious files incorrectly classified as benign).
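Recall
$$\text{Recall} = \frac{TP}{TP + FN}$$
Precision
$$\text{Precision} = \frac{TP}{TP + FP}$$
F1-Score
$$\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$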
Confusion Matrix
$$\begin{bmatrix} TP & FN \\ FP & TN \end{bmatrix}$$
where:
- $TP$ (True Positive) is the number of correctly classified malware,
- $FN$ (False Negative) is the number of malware incorrectly classified as benign,
- $FP$ (False Positive) is the number of benign files incorrectly classified as malware,
- $TN$ (True Negative) is the number of correctly classified benign files.
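The four confusion-matrix counts are enough to compute every metric named in this section. The sketch below uses the standard definitions; the function name `classification_metrics` and the counts are hypothetical and are not taken from the reported results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 100 malware and 100 benign files.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

With these counts, recall (0.9) is higher than precision (about 0.857), which reflects a detector that misses little malware at the cost of more false alarms on benign files.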
Results
Validation Results

Test Results
| Algorithm | Accuracy | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|
| MLP | 0.96 | 0.967 | 0.96 | 0.98 |
Conclusions
Future Work

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).