Preprint Article, Version 2. Preserved in Portico. This version is not peer-reviewed.

DFSGraph: Data Flow Semantic Model for Intermediate Representation Programs Based on Graph Network

Version 1 : Received: 5 September 2022 / Approved: 5 September 2022 / Online: 5 September 2022 (07:10:33 CEST)
Version 2 : Received: 23 September 2022 / Approved: 23 September 2022 / Online: 23 September 2022 (08:24:19 CEST)

How to cite: Tang, K.; Shan, Z.; Zhang, C.; Xu, L.; Qiao, M.; Liu, F. DFSGraph: Data Flow Semantic Model for Intermediate Representation Programs Based on Graph Network. Preprints 2022, 2022090046 (doi: 10.20944/preprints202209.0046.v2).

Abstract

As awareness of software copyright protection grows, code obfuscation plays a crucial role in protecting key code segments. However, as obfuscation techniques become more complex and diverse, they have also spawned a large number of malware variants that easily evade detection by anti-virus software. Malicious code detection mainly depends on binary code similarity analysis, but existing software analysis techniques struggle to cope with increasingly complex obfuscation. To solve this problem, this paper proposes a new obfuscation-resilient program analysis method based on the data flow transformation relationships of the intermediate representation and a graph network model. In our approach, we first construct a data transformation graph (DTG) from LLVM IR. We then design a novel intermediate-language representation model based on graph networks, named DFSGraph, to learn data flow semantics from the DTG. DFSGraph detects the similarity of obfuscated code by extracting the semantic information of program data flow, without deobfuscation. Extensive experiments show that our approach is more accurate than existing deobfuscation tools when searching for similar functions in obfuscated code.
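
The idea of deriving a graph from data flow in an intermediate representation can be illustrated with a minimal sketch. This is not the authors' DTG construction or the DFSGraph model: the tuple-based instruction format and the function names `build_data_flow_edges` and `opcode_edges` are hypothetical, and the sketch records only def-use edges between toy SSA-style instructions.

```python
# Toy SSA-style instructions: (result, opcode, operands).
# This format is an assumption for illustration, not LLVM's API.
program = [
    ("%1", "load", ("%a",)),
    ("%2", "load", ("%b",)),
    ("%3", "add", ("%1", "%2")),
    ("%4", "mul", ("%3", "%1")),
    (None, "store", ("%4", "%out")),
]

# Same computation with the two independent loads swapped.
reordered = [program[1], program[0]] + program[2:]


def build_data_flow_edges(instructions):
    """Return def-use edges as (producer_index, consumer_index) pairs."""
    defined_at = {}  # value name -> index of the instruction defining it
    edges = set()
    for idx, (result, _opcode, operands) in enumerate(instructions):
        for operand in operands:
            # Values never defined here (e.g. arguments) produce no edge.
            if operand in defined_at:
                edges.add((defined_at[operand], idx))
        if result is not None:
            defined_at[result] = idx
    return edges


def opcode_edges(instructions):
    """Label each edge by opcodes so instruction positions do not matter."""
    return sorted(
        (instructions[src][1], instructions[dst][1])
        for src, dst in build_data_flow_edges(instructions)
    )


if __name__ == "__main__":
    print(sorted(build_data_flow_edges(program)))
    print(opcode_edges(program) == opcode_edges(reordered))
```

In this toy case, swapping the two independent loads changes instruction positions but leaves the opcode-labeled data flow edges unchanged, which hints at why data flow can be a more stable signal than instruction order.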

Keywords

Obfuscation; Deobfuscation; LLVM IR; Graph Network

Subject

MATHEMATICS & COMPUTER SCIENCE, Artificial Intelligence & Robotics

Comments (1)

Comment 1
Received: 23 September 2022
Commenter: Ke Tang
Commenter's Conflict of Interests: Author
Comment: We have revised the abstract, introduction, and conclusion of our manuscript, and expanded the method description and comparative experiments.