Submitted:
14 November 2023
Posted:
14 November 2023
Abstract
Keywords:
I. Introduction
- Callgraph division. First, the program’s callgraph is partitioned into small, closely related subgraphs based on function invocation dependencies, using an existing tree partitioning method. These small subgraphs are then merged into a given number of subgraphs (equal to the number of parallel fuzzing instances) with approximately equal estimated fuzzing workloads, such that every subgraph is easy to explore.
- Subgraph association and seed mapping. The global seed corpus is dynamically divided into sub-corpora according to the total weight of the functions belonging to each subgraph in the function-level trace of every seed. Assigning these sub-corpora to the individual fuzzers then resolves the task division problem in heterogeneous ensemble fuzzing.
- Bitmaps & seeds synchronization and task scheduling. Bitmap and seed synchronization strategies keep the progress of the fuzzers aligned. Specifically, the bitmaps of all parallel fuzzers are periodically synchronized to their union, so that each fuzzer learns the current overall branch coverage of the program under test. New seeds discovered by any fuzzer are immediately allocated to the fuzzer currently working on the corresponding sub-task. In addition, a task scheduling strategy allocates the sub-corpus of each sub-task to the fuzzers cyclically, so that every sub-task benefits from the strengths of all fuzzers in the ensemble.
- We design a scalable callgraph-based task division method for ensemble fuzzing. By mapping the division of the target program’s callgraph onto a division of the global seed corpus, the method solves the problem of dynamic task division in heterogeneous ensemble fuzzing. It is scalable and can be extended to additional heterogeneous fuzzers.
- We implement a prototype of our method and evaluate its efficiency. In the experiments, the ensemble fuzzing system incorporates four heterogeneous fuzzers: AFLFast, MOPT, QSYM, and radamsa. Compared with the collaborative parallel fuzzing of AFLTeam, our system achieves up to 24.04% more branch coverage.
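The seed mapping, bitmap synchronization, and cyclic scheduling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function and parameter names are hypothetical, each function is assumed to count once per seed trace, a bitmap is assumed to be a byte string in which a non-zero byte marks a covered branch, and the rotation rule in `cyclic_assignment` is only one plausible reading of "cyclic scheduling".

```python
from collections import Counter


def map_seed_to_subtask(trace, func_weight, func_subtask):
    """Return the subtask whose functions carry the most weight in a seed's trace.

    trace        -- iterable of function names executed by the seed
    func_weight  -- dict: function name -> estimated fuzzing weight
    func_subtask -- dict: function name -> subtask (subgraph) id
    """
    totals = Counter()
    for func in set(trace):  # assumption: each function counts once per seed
        if func in func_subtask:
            totals[func_subtask[func]] += func_weight.get(func, 0)
    if not totals:
        return None
    # Break ties by the smaller subtask id so the result is deterministic.
    return min(totals, key=lambda s: (-totals[s], s))


def sync_bitmaps(bitmaps):
    """Return the union (byte-wise OR) of equally sized coverage bitmaps."""
    union = bytearray(len(bitmaps[0]))
    for bitmap in bitmaps:
        for i, byte in enumerate(bitmap):
            union[i] |= byte
    return bytes(union)


def cyclic_assignment(num_fuzzers, round_index):
    """Map fuzzer index -> sub-corpus index for one scheduling round."""
    return {f: (f + round_index) % num_fuzzers for f in range(num_fuzzers)}
```

With four fuzzers, calling `cyclic_assignment(4, r)` for `r = 0, 1, 2, 3` lets every fuzzer work on every sub-corpus exactly once.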
II. Background and Related Work
A. Ensemble Fuzzing
B. Task Division in Parallel Fuzzing
C. Challenges in Applying Task Division in Ensemble Fuzzing
III. Concepts and Methods
A. Callgraph Division
- 1) Callgraph partition: The tested program is executed once for each seed in the current global seed corpus. Functions (nodes) and function call relationships (edges) that appear in the resulting execution traces but are missing from the initial callgraph obtained at compile time are added to the callgraph. Subsequently, duplicate edges, nodes whose functions contain no basic blocks, and nodes that cannot be reached from the node of the "main" function are removed from the callgraph. Finally, the pruned callgraph is transformed into a minimum spanning tree and divided into a large number of tightly connected subgraphs using an existing tree partitioning algorithm, Lukes' algorithm [30].
- 2) Subgraph merging: Once the callgraph has been partitioned into many subgraphs, each subgraph that contains no function directly callable by the "main" function is first merged into the subgraph that calls it, so that every resulting subtask can be reached easily from the "main" function. Then, based on the total weight of the functions contained in each subgraph, the subgraphs are merged as evenly as possible into the given number of subtasks, which are marked on the callgraph.
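The balancing part of the subgraph merging step can be sketched as follows. The excerpt does not say which merging procedure is used, so this sketch swaps in a common greedy heuristic: the heaviest subgraphs are placed first, and each goes to the currently lightest subtask. The name `merge_subgraphs` and its parameters are hypothetical. The sketch also skips the preceding step of attaching subgraphs that "main" cannot call directly, and it ignores callgraph adjacency.

```python
import heapq


def merge_subgraphs(subgraph_weights, num_subtasks):
    """Greedily merge weighted subgraphs into num_subtasks balanced groups.

    subgraph_weights -- dict: subgraph id -> total weight of its functions
    Returns a list of (total_weight, [subgraph ids]), one entry per subtask.
    """
    # Heap entries: (current total weight, subtask index, member list).
    # The unique subtask index guarantees the member lists are never compared.
    heap = [(0, i, []) for i in range(num_subtasks)]
    heapq.heapify(heap)
    # Place the heaviest subgraphs first; each goes to the lightest subtask.
    ordered = sorted(subgraph_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for subgraph, weight in ordered:
        total, index, members = heapq.heappop(heap)
        members.append(subgraph)
        heapq.heappush(heap, (total + weight, index, members))
    by_index = sorted(heap, key=lambda entry: entry[1])
    return [(total, members) for total, _, members in by_index]
```

For example, five subgraphs with weights 8, 7, 6, 5, and 4 split into two subtasks of total weight 17 and 13 under this heuristic.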
1). Callgraph Partition
- 1) Callgraph updating. Before the task is divided, the initial callgraph obtained when compiling the tested program (Figure 4a shows an example) must be updated with the fuzzing results so far. Specifically, the tested program is executed once for each seed in the current global seed corpus, and an execution trace is recorded for each run. Functions (nodes) and function call relationships (edges) that appear in the execution traces but not in the initial callgraph are then added to the callgraph. Finally, each function (node) in the callgraph is annotated with its number of basic blocks, branches, and uncovered branches, based on the execution results. A sample of the updated callgraph is shown in Figure 4b.
- 2) Callgraph simplification. After the update, the callgraph is trimmed by deleting duplicate edges, nodes that contain no basic blocks, and nodes that cannot be reached from the "main" function. The simplified callgraph is shown in Figure 4c. The callgraph is then transformed into a minimum spanning tree, as illustrated in Figure 4d.
- 3) Calculation of nodes' weights. To balance the fuzzing workload, each function node of the minimum spanning tree is assigned a weight that estimates its fuzzing workload, as shown in Figure 4d. The principle is twofold: the more branches a function has, the more complex it is and the more fuzzing time it needs; and a function with more uncovered branches has more exploration potential, so it also requires more fuzzing time. Specifically, the weight of a function is the weighted sum of the total number of branches it contains and its number of uncovered branches.
- 4) Callgraph partition. Once every function (node) in the callgraph has a weight, the classic tree partitioning algorithm, Lukes’ algorithm, is used to partition the whole callgraph into a large number of closely related subgraphs, as shown in Figure 4e.
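The simplification and weighting steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code. The names `simplify_callgraph` and `node_weight` are hypothetical, and the coefficients `alpha` and `beta` are placeholders because the actual weighting factors are not given in this text. The excerpt also does not specify edge weights, so the sketch treats all call edges as equal, in which case any spanning tree (here a breadth-first one rooted at "main") is a minimum spanning tree. Lukes' algorithm itself is not reproduced.

```python
from collections import deque


def simplify_callgraph(edges, basic_blocks, root="main"):
    """Drop duplicate edges, empty functions, and nodes unreachable from root.

    edges        -- iterable of (caller, callee) pairs, possibly duplicated
    basic_blocks -- dict: function name -> number of basic blocks
    Returns (reachable nodes, spanning-tree edges) found by BFS from root.
    """
    adjacency = {}
    for caller, callee in set(edges):  # set() removes duplicate edges
        if basic_blocks.get(caller, 0) > 0 and basic_blocks.get(callee, 0) > 0:
            adjacency.setdefault(caller, set()).add(callee)
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        for callee in sorted(adjacency.get(node, ())):  # sorted: deterministic
            if callee not in seen:
                seen.add(callee)
                tree.append((node, callee))
                queue.append(callee)
    return seen, tree


def node_weight(total_branches, uncovered_branches, alpha=1, beta=1):
    """Weighted sum of total and uncovered branches (alpha, beta hypothetical)."""
    return alpha * total_branches + beta * uncovered_branches
```

The resulting tree and integer node weights are the kind of input a tree partitioning routine such as Lukes' algorithm expects.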

2). Subgraph Merging

B. Subgraph Association and Seed Mapping

C. Bitmaps & Seeds Synchronization and Task Scheduling
1). Bitmaps Synchronization
2). Seeds Synchronization
3). Task Scheduling
IV. Fuzzing Processes and Framework
A. Fuzzing Processes
1). Exploration Phase
2). Exploitation Phase
B. Fuzzing Framework
V. Evaluation
A. Experimental Setup
B. Experimental Results and Analysis
VI. Discussion
VII. Conclusions and Prospects
Funding
References
- Manes, V.J.; et al. The Art, Science, and Engineering of Fuzzing: A Survey. IEEE Trans. Softw. Eng. 2019, 47, 2312–2331.
- Boehme, M.; Cadar, C.; Roychoudhury, A. Fuzzing: Challenges and Reflections. IEEE Softw. 2021, 38, 79–86.
- Li, J.; Zhao, B.; Zhang, C. Fuzzing: a survey. Cybersecurity 2018, 1, 1–13.
- Liang, H.; Pei, X.; Jia, X.; Shen, W.; Zhang, J. Fuzzing: State of the Art. IEEE Trans. Reliab. 2018, 67, 1199–1218.
- M. Böhme, V. Pham and A. Roychoudhury, Coverage-based greybox fuzzing as markov chain, Proc. 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 1032-1043.
- C. Lemieux and K. Sen, Fairfuzz: a targeted mutation strategy for increasing greybox fuzz testing coverage, Proc. 33rd ACM/IEEE International Conference on Automated Software Engineering, ACM, 2018, pp. 475-485.
- A. Fioraldi, D. Maier, H. Eißfeldt and M. Heuse, Afl++: combining incremental steps of fuzzing research, Proc. 14th USENIX Workshop on Offensive Technologies (WOOT 20), 2020.
- Pham, V.-T.; Boehme, M.; Santosa, A.E.; Caciulescu, A.R.; Roychoudhury, A. Smart Greybox Fuzzing. IEEE Trans. Softw. Eng. 2019, 47, 1980-1997.
- C. Lyu, et al., Mopt: optimized mutation scheduling for fuzzers, Proc. 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 1949-1966.
- American fuzzy lop. [online]. Available: https://lcamtuf.coredump.cx/afl/. Accessed on: 2023-3-9.
- M. Böhme, V.J. Manès and S.K. Cha, Boosting fuzzer efficiency: an information theoretic perspective, Proc. 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 678-689.
- C. Aschermann, S. Schumilo, T. Blazytko, R. Gawlik and T. Holz, Redqueen: fuzzing with input-to-state correspondence, Proc. Network and Distributed System Security Symposium (NDSS), 2019, pp. 1-15.
- K. Serebryany, Oss-fuzz - Google’s continuous fuzzing service for open source software, Proc. USENIX Security Symposium, USENIX Association, 2017.
- Y.L. Chen, et al., Enfuzz: ensemble fuzzing with seed synchronization among diverse fuzzers, Proc. 28th USENIX Security Symposium, USENIX Association, 2019, pp. 1967-1983.
- J. Liang, Y. Jiang, Y. Chen, M. Wang, C. Zhou and J. Sun, Pafl: extend fuzzing optimizations of single mode to industrial parallel mode, Proc. 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, 2018, pp. 809-814.
- V. Pham, M. Nguyen, Q. Ta, T. Murray and B.I.P. Rubinstein, Towards systematic and dynamic task allocation for collaborative parallel fuzzing, Proc. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2021, pp. 1337-1341.
- M. Sutton, A. Greene and P. Amini, Fuzzing: brute force vulnerability discovery, San Antonio, USA: Pearson Education, 2007, pp. 1-576.
- M. Böhme, V. Pham, M. Nguyen and A. Roychoudhury, Directed greybox fuzzing, Proc. 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2329-2344.
- Z. Du, Y. Li, Y. Liu and B. Mao, Windranger: a directed greybox fuzzer driven by deviation basic blocks, Proc. 44th International Conference on Software Engineering, 2022, pp. 2440-2451.
- M. Nguyen, S. Bardin, R. Bonichon, R. Groz and M. Lemerre, Binary-level directed fuzzing for use-after-free vulnerabilities, Proc. RAID, 2020, pp. 47-62.
- V.J. Manès, S. Kim and S.K. Cha, Ankou: guiding grey-box fuzzing towards combinatorial difference, Proc. ACM/IEEE 42nd International Conference on Software Engineering (ICSE), 2020, pp. 1024-1036.
- Stephens, N.; et al. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. NDSS 2016, 1–16.
- I. Yun, S. Lee, M. Xu, Y. Jang and T. Kim, Qsym: a practical concolic execution engine tailored for hybrid fuzzing, Proc. 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 745-761.
- H. Liang, L. Jiang, L. Ai and J. Wei, Sequence directed hybrid fuzzing, Proc. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2020, pp. 127-137.
- P. Chen and H. Chen, Angora: efficient fuzzing by principled search, Proc. 2018 IEEE Symposium on Security and Privacy (S&P), pp. 711-725.
- S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida and H. Bos, Vuzzer: application-aware evolutionary fuzzing, Proc. Symposium on Network and Distributed System Security (NDSS), 2017, pp. 1-14.
- Song, C.; Zhou, X.; Yin, Q.; He, X.; Zhang, H.; Lu, K. P-Fuzz: A Parallel Grey-Box Fuzzing Framework. Appl. Sci. 2019, 9, 5100.
- Zhou, X.; et al. UltraFuzz: Towards Resource-Saving in Distributed Fuzzing. IEEE Trans. Softw. Eng. 2022.
- E. Güler, et al., Cupid: automatic fuzzer selection for collaborative fuzzing, Proc. 36th Annual Computer Security Applications Conference (ACSAC 2020), Association for Computing Machinery, 2020, pp. 360-372.
- Lukes, J.A. Efficient Algorithm for the Partitioning of Trees. IBM J. Res. Dev. 1974, 18, 217–224.
- Aki helin/radamsa · gitlab. Available online: https://gitlab.com/akihe/radamsa (accessed on 13 May 2023).









| Frameworks | Number of instances | Used fuzzers | Time and memory limit |
|---|---|---|---|
| AFLTeam | 4 | horsefuzz | -t 2000+ -m none |
| OEF | 4 | AFLFast, MOPT, QSYM, and radamsa | -t 2000+ -m none |
| TAEF | 4 | AFLFast, MOPT, QSYM, and radamsa | -t 2000+ -m none |

| Program | Test driver | Input format | Option |
|---|---|---|---|
| libpng | pngimage | PNG | @@ |
| libjpeg-turbo | djpeg | JPEG | @@ |
| jasper | jasper | JP2 | --input @@ --output-format jp2 |
| guetzli | guetzli | JPEG | @@ /dev/null |
| Binutils | readelf | ELF | -agteSdcWw --dyn-syms -D @@ |
| Binutils | nm-new | ELF | -a -C -l --synthetic @@ |

| Target | AFLTeam | OEF | TAEF | TAEF vs. AFLTeam (improvement) | TAEF vs. AFLTeam (significance) | TAEF vs. OEF (improvement) | TAEF vs. OEF (significance) |
|---|---|---|---|---|---|---|---|
| pngimage | 4624 | 4604 | 4636 | 0.26%↑ | ns | 0.70%↑ | * |
| djpeg | 2563 | 2905 | 3015 | 17.64%↑ | **** | 3.79%↑ | ** |
| jasper | 7566 | 7413 | 7901 | 4.43%↑ | * | 6.58%↑ | ** |
| guetzli | 7468 | 7488 | 7570 | 1.37%↑ | ** | 1.10%↑ | ** |
| readelf | 14095 | 15003 | 14895 | 5.68%↑ | *** | -0.72%↓ | ns |
| nm-new | 8115 | 8889 | 10066 | 24.04%↑ | **** | 13.24%↑ | *** |
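
The improvement columns are consistent with a simple relative change of the TAEF value over each baseline. The short check below recomputes three rows from the table. The helper name `relative_improvement` is hypothetical, and rounding to two decimal places is assumed from how the table is printed.

```python
def relative_improvement(taef, baseline):
    """Percentage change of the TAEF value over a baseline, to 2 decimals."""
    return round((taef - baseline) / baseline * 100, 2)


# (AFLTeam, OEF, TAEF) values copied from the table above.
results = {
    "djpeg": (2563, 2905, 3015),
    "readelf": (14095, 15003, 14895),
    "nm-new": (8115, 8889, 10066),
}

for target, (aflteam, oef, taef) in results.items():
    vs_aflteam = relative_improvement(taef, aflteam)
    vs_oef = relative_improvement(taef, oef)
    print(f"{target}: {vs_aflteam:+.2f}% vs. AFLTeam, {vs_oef:+.2f}% vs. OEF")
```

The negative value for readelf against OEF reproduces the single row in which TAEF falls slightly behind.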

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).