Submitted: 23 April 2026
Posted: 24 April 2026
Abstract
Keywords:
1. Introduction
2. Novelty and Main Contributions
1. TransForge: a purpose-built, category-agnostic benchmarking framework for endpoint detection robustness evaluation across six execution artifact categories and four programming languages, with an evolutionary core that is architecturally invariant across all artifact types.
2. Modular transformation pipeline: language-aware mutation operators with handler-based artifact routing, preserving a consistent evolutionary core across all artifact categories.
3. Multi-stage fitness evaluation pipeline: integration of static and behavioral detection signals under a multi-objective formulation balancing detection variance, transformation efficiency, and functional correctness.
4. Category-specific functionality validation: per-category correctness protocols ensuring fitness-driven selection does not favor non-functional variants.
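The multi-objective formulation in the third contribution can be illustrated with a small scoring sketch. This is not the paper's fitness function: the field names, the weights, and the use of alert fractions as the detection terms are all assumptions made for illustration. The one property taken from the text is that a variant failing functionality validation must not be rewarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSignals:
    """Signals collected for one transformed variant (hypothetical fields)."""
    static_alerts: int      # alerts raised by static scanning
    behavioral_alerts: int  # alerts raised during sandboxed execution
    engines_total: int      # engines consulted per stage (assumed equal)
    mutations_applied: int  # transformations used to build the variant
    mutations_max: int      # assumed upper bound on chromosome length
    functional: bool        # outcome of the category-specific validation


def fitness(s: VariantSignals,
            w_static: float = 0.4,
            w_behavioral: float = 0.4,
            w_efficiency: float = 0.2) -> float:
    """Weighted-sum score in [0, 1]; the weights are illustrative assumptions."""
    if not s.functional:
        # Gate on correctness so selection cannot favor broken variants.
        return 0.0
    static_score = 1.0 - s.static_alerts / s.engines_total
    behavioral_score = 1.0 - s.behavioral_alerts / s.engines_total
    efficiency = 1.0 - s.mutations_applied / s.mutations_max
    return (w_static * static_score
            + w_behavioral * behavioral_score
            + w_efficiency * efficiency)
```

Gating on `functional` before any weighting is one simple way to keep the correctness objective from being traded away against the other two; a penalty term would be an alternative design.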
3. Related Work
3.1. Single-Category Coupling
3.2. Static-Only Optimization
3.3. Functionality Preservation
3.4. Evasion Effectiveness Across Adaptive Approaches
4. Methodology
4.1. Framework Overview
4.2. Execution Corpus
4.3. Chromosome Representation and Transformation Pipeline
4.4. Fitness Evaluation Pipeline
4.5. Functionality Validation
4.6. Evolutionary Configuration and Experimental Design
4.7. Design Rationale and Limitations
4.8. Threat Model and Evaluation Scope
5. Hypotheses
6. Results
6.1. Evaluation of H1: Detection Outcome Differences
6.2. Evaluation of H2: Functionality Preservation
6.3. Evaluation of H3: Consistency Under Repeated Execution
6.4. Ablation Analysis
7. Discussion
7.1. Category-Specific Detection Behavior
7.2. Implications for Detection Robustness Evaluation
7.3. Limitations
8. Ethical Considerations
9. Conclusions
Author Contributions
Funding
Acknowledgments
Data Availability Statement
DURC Statement
Conflicts of Interest
Abbreviations
| AV | Antivirus |
| EDR | Endpoint Detection and Response |
| GA | Genetic Algorithm |
| GP | Genetic Programming |
| RL | Reinforcement Learning |
| LLM | Large Language Model |
| PE | Portable Executable |
| APK | Android Package Kit |
| XSS | Cross-Site Scripting |
| WAF | Web Application Firewall |
| CAPE | Cuckoo Automated Payload Execution |
| VT | VirusTotal |
| HA | Hybrid Analysis |
| SHA | Secure Hash Algorithm |
| API | Application Programming Interface |
| SSH | Secure Shell |
| DNS | Domain Name System |
| DOM | Document Object Model |
| PHP | Hypertext Preprocessor |
| XOR | Exclusive Or |
| HTTP | Hypertext Transfer Protocol |
| URL | Uniform Resource Locator |
| CSV | Comma-Separated Values |
References
1. Le Faou, A. Antivirus and EDR Bypass Techniques Explained. Vaadata Blog. 2024. Available online: https://www.vaadata.com/blog/antivirus-and-edr-bypass-techniques/ (accessed on 26 August 2025).
2. Traoré, A.; Le Faou, A. Red Teaming: Methodology and Scope of a Red Team Operation. Vaadata Blog. 2024. Available online: https://www.vaadata.com/blog/what-is-red-teaming-methodology-and-scope-of-a-red-team-operation/ (accessed on 20 October 2025).
3. Mandvi, K. Threat Actors Exploit AV/EDR Evasion Framework to Deploy Malware in the Wild. Cyber Security News. 2025. Available online: https://cyberpress.org/threat-actors-exploit-av-edr-evasion-framework/ (accessed on 26 August 2025).
4. K V, A.; P M, B.; Nagamani, S.N.S.; Patil, H. AV Evasion Techniques: A Practical Evaluation of Payload Obfuscation. Int. J. Sci. Res. Arch. 2025, 16, 1504–1511.
5. Cirkovic, S.; Mladenovic, V.; Tomic, S.; Drljaca, D.; Ristic, O. Utilizing Fine-Tuning of Large Language Models for Generating Synthetic Payloads. Comput. Mater. Contin. 2025, 82, 4409–4430.
6. Khan, S. LL-XSS: End-to-End Generative Model-Based XSS Payload Creation. In Proceedings of the 21st Learning and Technology Conference, 2024; pp. 121–126.
7. Kingful, F.; Ahene, E.; Appiah, B.; Frimpong, B.; Osei, I.; Hammond, E. Dynamic Programming-Based Adversarial Windows Payload Generator. Research Square. 2023.
8. Anderson, H.S.; Kharkar, A.; Filar, B.; Evans, D.; Roth, P. Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. arXiv 2018, arXiv:1801.08917.
9. Domico, K.; Ferrand, J.C.; Sheatsley, R.; Pauley, E.; Hanna, J.; McDaniel, P. Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning. arXiv 2025, arXiv:2503.01734.
10. Castro, R.L.; Schmitt, C.; Dreo, G. AIMED: Evolving Malware with Genetic Programming to Evade Detection. In Proceedings of the 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), 2019; pp. 240–247.
11. Liu, Z.; Fang, Y.; Huang, C.; Xu, Y. GAXSS: Effective Payload Generation Method to Detect XSS Vulnerabilities Based on Genetic Algorithm. Secur. Commun. Netw. 2022, 2031924.
12. Rathore, H.; B, P.; Iyengar, S.S.; Sahay, S.K. Breaking the Anti-malware: EvoAAttack Based on Genetic Algorithm Against Android Malware Detection Systems. In Proceedings of Computational Science – ICCS 2023: 23rd International Conference, Prague, Czech Republic, 3–5 July 2023; Proceedings, Part V; pp. 535–550.
13. Faruki, P.; Bhan, R.; Jain, V.; Bhatia, S.; El Madhoun, N.; Pamula, R. A Survey and Evaluation of Android-Based Malware Evasion Techniques and Detection Frameworks. Information 2023, 14.
14. Berger, H.; Hajaj, C.; Dvir, A. Evasion Is Not Enough: A Case Study of Android Malware. In Cyber Security Cryptography and Machine Learning; Springer International Publishing: Cham, Switzerland, 2020; pp. 167–174.
15. D’Elia, D.C.; Coppa, E.; Palmaro, F.; Cavallaro, L. On the Dissection of Evasive Malware. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2750–2765.
16. Minja, A.R.; Ndibwile, J.D. ShellForge: A Genetic Algorithm-Based Reverse Shell Generator for AV/EDR Evasion. In Proceedings of the IEEE, 2026; accepted, in press.
17. Xu, Y.; Fang, Y.; Xu, Y.; Wang, Z. Automatic Optimization for Generating Adversarial Malware Based on Prioritized Evolutionary Computing. Appl. Soft Comput. 2025, 173, 112933.
18. Digregorio, G.; Maccarrone, S.; D’Onghia, M.; Gallo, L.; Carminati, M.; Polino, M.; Zanero, S. Tarallo: Evading Behavioral Malware Detectors in the Problem Space. arXiv 2024, arXiv:2506.02660.
19. Yuste, J.; Pardo, E.G.; Tapiador, J. Optimization of Code Caves in Malware Binaries to Evade Machine Learning Detectors. Comput. Secur. 2022, 116, 102643.
20. Lan, T.; Naït-Abdesselam, F. LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors. arXiv 2025, arXiv:2512.21404.
21. Anderson, H.S.; Kharkar, A.; Filar, B. Evading Machine Learning Malware Detection.

| Study | Approach | Format | Evasion Rate | Scenario |
|---|---|---|---|---|
| Anderson et al. [8] | RL (ACER) | PE Binary | 10–24% | Black-box |
| Anderson et al. [21] | RL (Deep Q) | PE Binary | 16% | Black-box |
| AIMED [10] | Genetic Programming | PE Binary | 24% | Black-box |
| Yuste et al. [19] | GA + Code Caves | PE Binary | 97.99% | Black-box |
| EvoAAttack [12] | GA (permissions) | Android APK | 97.48% | Grey-box |
| Tarallo [18] | PS-FGSM | PE Binary | 99% | White+Black-box |
| LAMLAD [20] | Dual-agent LLM | Android APK | 97% | Black-box |
| Domico et al. [9] | RL (PPO) | Mixed | +17% AEs | Black-box |
| ShellForge [16] | GA (source-level) | Python Script | 99.99% | Black-box |

| Study | Multi-Category | Behavioral Evaluation | Functionality Validated | Cross-Language |
|---|---|---|---|---|
| AIMED [10] | × | × | Partial | × |
| GAXSS [11] | × | × | × | × |
| EvoAAttack [12] | × | × | × | × |
| Anderson et al. [8] | × | × | × | × |
| Tarallo [18] | × | ✓ | Partial | × |
| ShellForge [16] | × | ✓ | ✓ | × |
| TransForge | ✓ | ✓ | ✓ | ✓ |

| Category | Source | Languages | Count |
|---|---|---|---|
| Remote Execution Agents | Metasploit Framework | Python, PHP | 16 |
| Web Execution Interfaces | GitHub (WebShell Collection) | PHP | 26 |
| Staged Downloaders | GitHub | PowerShell, Python | 8 |
| Cross-Site Execution | GitHub | JavaScript | 9 |
| Surveillance Components | Custom | Python | 9 |
| Credential Access Routines | GitHub | Python, PowerShell | 7 |
| Total | | | 75 |
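As a quick consistency check, the corpus table can be encoded and tallied. The sketch below only restates the rows above; the `CORPUS` name and the helper functions are illustrative, not part of the framework.

```python
# Corpus rows as printed in the table: category -> (languages, sample count).
CORPUS = {
    "Remote Execution Agents": (("Python", "PHP"), 16),
    "Web Execution Interfaces": (("PHP",), 26),
    "Staged Downloaders": (("PowerShell", "Python"), 8),
    "Cross-Site Execution": (("JavaScript",), 9),
    "Surveillance Components": (("Python",), 9),
    "Credential Access Routines": (("Python", "PowerShell"), 7),
}


def total_samples(corpus):
    """Sum the per-category counts."""
    return sum(count for _, count in corpus.values())


def categories_by_language(corpus):
    """Map each language to the categories in which it appears."""
    result = {}
    for category, (languages, _) in corpus.items():
        for language in languages:
            result.setdefault(language, []).append(category)
    return result


print(total_samples(CORPUS))  # 75, matching the Total row
```

Per-language sample counts cannot be derived this way, because the table does not split the mixed-language categories by language.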

| Mutation Type | Python | PHP | PowerShell | JavaScript |
|---|---|---|---|---|
| Variable Renaming | ✓ | ✓ | ✓ | ✓ |
| Base64 Encoding | ✓ | ✓ | ✓ | ✓ |
| String Concatenation | ✓ | ✓ | ✓ | ✓ |
| Comment Injection | ✓ | ✓ | ✓ | ✓ |
| Whitespace Randomization | ✓ | ✓ | ✓ | ✓ |
| Function Obfuscation | ✓ | ✓ | ✓ | ✓ |
| ROT13 / Hex Encoding | ✓ | ✓ | ✓ | ✓ |
| XOR Encoding | ✓ | × | ✓ | × |
| Case Randomization | × | × | ✓ | × |
| Alias Substitution | × | × | ✓ | × |
| DOM Manipulation | × | × | × | ✓ |
| Execution Delays | ✓ | ✓ | ✓ | ✓ |
| Debugger Detection | ✓ | × | ✓ | ✓ |
| Process Environment Checks | ✓ | × | ✓ | × |
| DNS / Connection Timing | ✓ | ✓ | ✓ | × |
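The support matrix above lends itself to a lookup table that a language-aware router could consult before selecting operators. The following is a minimal sketch of such a lookup, holding operator names only; the `SUPPORT` structure and `operators_for` helper are hypothetical and say nothing about how TransForge implements its handlers.

```python
ALL = frozenset({"Python", "PHP", "PowerShell", "JavaScript"})

# Operator name -> languages marked as supported in the table.
SUPPORT = {
    "Variable Renaming": ALL,
    "Base64 Encoding": ALL,
    "String Concatenation": ALL,
    "Comment Injection": ALL,
    "Whitespace Randomization": ALL,
    "Function Obfuscation": ALL,
    "ROT13 / Hex Encoding": ALL,
    "XOR Encoding": frozenset({"Python", "PowerShell"}),
    "Case Randomization": frozenset({"PowerShell"}),
    "Alias Substitution": frozenset({"PowerShell"}),
    "DOM Manipulation": frozenset({"JavaScript"}),
    "Execution Delays": ALL,
    "Debugger Detection": frozenset({"Python", "PowerShell", "JavaScript"}),
    "Process Environment Checks": frozenset({"Python", "PowerShell"}),
    "DNS / Connection Timing": frozenset({"Python", "PHP", "PowerShell"}),
}


def operators_for(language):
    """Return the operator names available for one language, in table order."""
    if language not in ALL:
        raise ValueError(f"unsupported language: {language}")
    return [name for name, langs in SUPPORT.items() if language in langs]


for lang in sorted(ALL):
    print(lang, len(operators_for(lang)))
```

Counting the rows this way gives 14 operators for PowerShell, 12 for Python, 10 for JavaScript, and 9 for PHP.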

| Category | Validation Protocol | Timeout |
|---|---|---|
| Remote Execution Agents | Connection callback + shell response confirmation | 180 s |
| Web Execution Interfaces | Reverse connection + command execution check | 60 s |
| Staged Downloaders | Download completion + file integrity check | 120 s |
| Credential Access Routines | Exfiltration confirmation | 120 s |
| Surveillance Components | Keystroke capture confirmation | 60 s |
| Cross-Site Execution | Alert-trigger verification | 30 s |
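One way to apply the per-category timeouts is a polling loop that gives a validation probe until its deadline to report success. The sketch below assumes the probe is an opaque, caller-supplied callable returning a boolean; the `validate` function, its polling interval, and the injectable clock are illustrative choices rather than the framework's interface.

```python
import time

# Timeouts in seconds, as listed in the validation table.
TIMEOUTS_S = {
    "Remote Execution Agents": 180,
    "Web Execution Interfaces": 60,
    "Staged Downloaders": 120,
    "Credential Access Routines": 120,
    "Surveillance Components": 60,
    "Cross-Site Execution": 30,
}


def validate(category, probe, poll_interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it succeeds or the category timeout elapses.

    `probe` is a hypothetical zero-argument callable that returns True once
    the category's protocol has been confirmed. `clock` and `sleep` are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + TIMEOUTS_S[category]
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

A variant that never confirms within its window is reported as non-functional, which is the outcome a fitness gate on correctness would consume.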

| Category | n | Zero-Detection Variants | Min Alerts | Max Alerts | Mean Alerts |
|---|---|---|---|---|---|
| Remote Execution Agents | 16 | 16/16 | 0 | 0 | 0.00 |
| Web Execution Interfaces | 26 | 16/26 | 0 | 4 | 0.92 |
| Staged Downloaders | 8 | 0/8 | 7 | 10 | 8.75 |
| Credential Access Routines | 7 | 7/7 | 0 | 0 | 0.00 |
| Surveillance Components | 9 | 9/9 | 0 | 0 | 0.00 |
| Cross-Site Execution | 9 | 9/9 | 0 | 0 | 0.00 |
| Overall | 71 | 53/71 | 0 | 10 | 1.04 |
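Aggregates over the per-category rows can be recomputed with a sample-weighted mean, as sketched below. The Overall row is reported over 71 samples, whereas the per-category rows as printed sum to 75, so this recomputation is not expected to reproduce that row exactly. The `ROWS` and `aggregate` names are illustrative.

```python
# (category, n, zero_detection, min_alerts, max_alerts, mean_alerts)
ROWS = [
    ("Remote Execution Agents", 16, 16, 0, 0, 0.00),
    ("Web Execution Interfaces", 26, 16, 0, 4, 0.92),
    ("Staged Downloaders", 8, 0, 7, 10, 8.75),
    ("Credential Access Routines", 7, 7, 0, 0, 0.00),
    ("Surveillance Components", 9, 9, 0, 0, 0.00),
    ("Cross-Site Execution", 9, 9, 0, 0, 0.00),
]


def aggregate(rows):
    """Combine per-category rows; the mean is weighted by sample count."""
    total_n = sum(r[1] for r in rows)
    return {
        "n": total_n,
        "zero_detection": sum(r[2] for r in rows),
        "min": min(r[3] for r in rows),
        "max": max(r[4] for r in rows),
        "mean_alerts": sum(r[1] * r[5] for r in rows) / total_n,
    }


print(aggregate(ROWS))
```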

| Category | Functional Success | Mean VT Alerts | Avg Fitness | Best Mutation |
|---|---|---|---|---|
| Remote Execution Agents | 100% [16] | 0.00 | 0.906 | xor + base64 + behavioral |
| Web Execution Interfaces | 100% (26/26) | 0.92 | 0.925 | beh_php_input_json |
| Staged Downloaders | 50% (4/8) | 8.75 | 0.855 | ps_webrequest_swap |
| Credential Access Routines | 100% (6/6) | 0.00 | 0.955 | rot_encode |
| Surveillance Components | 85% (8/8 structural) | 0.00 | 0.925 | xor_encode |
| Cross-Site Execution | 100% (8/8) | 0.00 | 0.895 | js_var_rename |

| Category | Gen 0 | Gen 1 | Gen 2 | Gen 3 | Gen 4 | Converged |
|---|---|---|---|---|---|---|
| Remote Execution Agents | 0.820 | 0.856 | 0.879 | 0.906 | 0.906 | Gen 4 |
| Web Execution Interfaces | 0.699 | 0.714 | 0.734 | 0.750 | 0.755 | Gen 4 |
| Staged Downloaders | 0.620 | 0.634 | 0.651 | 0.655 | 0.655 | Gen 4 |
| Credential Access Routines | 0.750 | 0.750 | 0.754 | 0.755 | 0.755 | Gen 4 |
| Surveillance Components | 0.698 | 0.725 | 0.725 | 0.722 | 0.725 | Gen 4 |
| Cross-Site Execution | 0.629 | 0.683 | 0.691 | 0.693 | 0.695 | Gen 4 |
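The per-generation values above can be summarized by their total gain from Gen 0 to Gen 4 and by whether each trajectory is non-decreasing. The sketch below uses only the figures in the table; the helper names are illustrative, and it does not attempt to reproduce the paper's convergence criterion.

```python
# Fitness per generation (Gen 0 .. Gen 4), as printed in the table.
FITNESS = {
    "Remote Execution Agents": [0.820, 0.856, 0.879, 0.906, 0.906],
    "Web Execution Interfaces": [0.699, 0.714, 0.734, 0.750, 0.755],
    "Staged Downloaders": [0.620, 0.634, 0.651, 0.655, 0.655],
    "Credential Access Routines": [0.750, 0.750, 0.754, 0.755, 0.755],
    "Surveillance Components": [0.698, 0.725, 0.725, 0.722, 0.725],
    "Cross-Site Execution": [0.629, 0.683, 0.691, 0.693, 0.695],
}


def total_gain(series):
    """Fitness change from the first to the last generation (3 decimals)."""
    return round(series[-1] - series[0], 3)


def is_non_decreasing(series):
    """True when fitness never drops from one generation to the next."""
    return all(later >= earlier for earlier, later in zip(series, series[1:]))


for category, series in FITNESS.items():
    print(category, total_gain(series), is_non_decreasing(series))
```

On these figures the largest gain is for Remote Execution Agents (0.086) and the smallest for Credential Access Routines (0.005); Surveillance Components is the only trajectory with a dip, from 0.725 at Gen 2 to 0.722 at Gen 3.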
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).