Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Structural Syntax Network for Code Classification

Version 1: Received: 11 December 2023 / Approved: 12 December 2023 / Online: 12 December 2023 (05:35:17 CET)

How to cite: Shah, M.; Patel, R.; Terry, A. Structural Syntax Network for Code Classification. Preprints 2023, 2023120805. https://doi.org/10.20944/preprints202312.0805.v1

Abstract

Program classification, a core task in software engineering, supports the understanding and categorization of code in applications such as anomaly detection and code quality assessment. Cross-language program classification additionally opens avenues for efficient code translation between programming languages, helping developers code faster and shorten development cycles. Existing research focuses primarily on semantic code analysis and gives limited attention to cross-language challenges. To address this gap, we introduce a neural network model, CodeSemanticsNN, built on a refined Syntax Structure (SS) approach. The model has two main mechanisms. First, it adopts a novel SS representation that combines sequential and graph-based views of the SS to improve semantic feature capture. Second, it employs a 'unified vocabulary' strategy to bridge the gap between different programming languages and enable efficient cross-language classification. We have also compiled a dataset of 20,000 files spanning five programming languages to serve as a benchmark for cross-language classification. Experiments on this dataset show that CodeSemanticsNN outperforms existing models on four metrics: Precision, Recall, F1-score, and Accuracy.
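The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes Python's built-in `ast` module as a stand-in parser, and the `UNIFIED_VOCAB` table, the token names, and the `unify` and `syntax_views` helpers are all hypothetical. It only shows the general idea of producing a sequential view and a graph view of one syntax tree, with node types mapped into a shared cross-language vocabulary.

```python
import ast

# Hypothetical unified vocabulary: node names from different parsers are
# mapped to shared tokens. The non-Python names are assumed for illustration.
UNIFIED_VOCAB = {
    "FunctionDef": "FUNC_DECL",        # Python ast node
    "MethodDeclaration": "FUNC_DECL",  # assumed name from another language's parser
    "For": "LOOP",
    "While": "LOOP",
    "ForStatement": "LOOP",            # assumed
    "WhileStatement": "LOOP",          # assumed
    "If": "BRANCH",
    "IfStatement": "BRANCH",           # assumed
    "Return": "RETURN",
    "ReturnStatement": "RETURN",       # assumed
}


def unify(node_type):
    """Map a parser-specific node type to a shared token (default: OTHER)."""
    return UNIFIED_VOCAB.get(node_type, "OTHER")


def syntax_views(source):
    """Return (sequence, edges) for a Python snippet.

    sequence: unified tokens in pre-order (the sequential view).
    edges: (parent_index, child_index) pairs over that sequence (the graph view).
    """
    tree = ast.parse(source)
    sequence = []
    edges = []

    def visit(node, parent_index):
        index = len(sequence)
        sequence.append(unify(type(node).__name__))
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in ast.iter_child_nodes(node):
            visit(child, index)

    visit(tree, None)
    return sequence, edges


if __name__ == "__main__":
    seq, edges = syntax_views("def f(x):\n    while x:\n        return x\n")
    print(seq)
    print(edges)
```

In a sketch like this, a sequence model could consume `sequence` while a graph model consumes `edges`. Because both views use the shared tokens, snippets parsed from different languages would land in the same input space, provided each language's parser output is mapped through a comparable table.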

Keywords

code analysis; syntax-based classification; multi-language code parsing; cross-linguistic code analysis

Subject

Computer Science and Mathematics, Computer Networks and Communications
