Submitted: 16 January 2024
Posted: 17 January 2024
Abstract
Keywords:
1. Introduction
- We are the first to explore a statement-grained hierarchy graph for capturing the global hierarchical structure of source code. We integrate this graph with the code token sequence to represent code semantics for source code summarization.
- We propose a novel model, SHT, which incorporates the statement-grained hierarchy graph and the token sequence to generate code summaries. Within the Transformer framework, SHT combines two encoders, one for sequence learning and one for graph learning.
- We evaluate our approach against baseline models on two benchmark datasets for source code summarization. It outperforms previous work and achieves new state-of-the-art results. An ablation study and a human evaluation further validate our strategy for code comprehension in code summarization.
2. Background
2.1. Transformer
2.2. Self-attention
2.3. Relation-aware self-attention
3. Problem Formulation
4. Proposed Approach
4.1. Data Processing
4.1.1. Construction of statement-grained hierarchical graph
4.1.2. Construction of token and syntax sequences
4.2. Statement-grained hierarchy Transformer
4.2.1. Relational attention-based hierarchical graph encoder
4.2.2. Sequence encoder
4.2.3. Decoder with copy mechanism
5. Experiment
5.1. Experimental Setup
5.1.1. Evaluation datasets
5.2. Metrics
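- BLEU measures the n-gram precision of the generated summary against the reference, clipping each n-gram count by its count in the reference and applying a brevity penalty to overly short outputs. It therefore indicates how much of the generated text also appears in the reference.
- METEOR is a recall-oriented metric: it combines unigram precision and recall with more weight on recall, matches stems and synonyms in addition to exact words, and penalizes fragmented matches. It reflects how much of the reference summary is correctly reproduced.
- ROUGE-L is based on the longest common subsequence (LCS) between the reference and the generated summary, combining LCS-based recall and precision into an F-measure that favors recall. Because the LCS rewards in-order matches that need not be contiguous, ROUGE-L provides insights not captured by BLEU alone.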
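To make the BLEU and ROUGE-L definitions above concrete, the following Python sketch scores one tokenized generated summary against one reference. It is not the evaluation script used in this work. BLEU implementations for code summarization differ in smoothing and in sentence- versus corpus-level averaging, so the add-one smoothing on the 2- to 4-gram precisions and the ROUGE-L recall weight `beta = 1.2` are assumptions. METEOR is omitted because it depends on stemming and synonym resources. The example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU; add-one smoothing for n >= 2 is an assumption."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Modified precision: clip each candidate n-gram by its reference count.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if n == 1:
            if overlap == 0:
                return 0.0  # no shared words at all
            p_n = overlap / total
        else:
            p_n = (overlap + 1) / (total + 1)
        log_precision_sum += math.log(p_n)
    # Brevity penalty: punish candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(log_precision_sum / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """LCS-based F-measure; beta > 1 weights recall above precision."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return ((1 + beta ** 2) * precision * recall) / (recall + beta ** 2 * precision)


# Hypothetical reference and generated summaries.
ref = "returns the maximum value in the list".split()
hyp = "return the maximum value of the list".split()
print(f"BLEU-4:  {sentence_bleu(hyp, ref):.3f}")
print(f"ROUGE-L: {rouge_l(hyp, ref):.3f}")  # → 0.714 (LCS of 5 tokens out of 7)
```

Scores in the result tables are these values multiplied by 100 and averaged over the test set.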
5.3. Comparison baselines
- CODE-NN [5]: the first data-driven source code summarization model. It treats source code as plain text and uses an attention-based LSTM to generate source code summaries.
- HDeepCom [29]: a neural machine translation (NMT)-based code summarization model that flattens the AST into a sequence using structure-based traversal (SBT) and feeds this sequence into a sequence-to-sequence model to generate comments.
- ASTattGRU [35]: a GRU-based model that encodes the code token sequence and the flattened AST with separate encoders and attends to both when generating summaries.
- NeuralCodeSum [6]: a Transformer-based code summarization model that uses relative position information among code tokens to improve summary generation.
- CAST [26]: a hybrid code summarization model that hierarchically splits the AST into subtrees, encodes them with a tree-based recursive neural network (RvNN), and combines this with a vanilla code token encoder to capture both code structure and sequence. In the decoding phase, a hybrid mechanism fuses the two inputs to generate descriptive summaries.
- TPTrans [36]: a Transformer-based code summarization model that captures pairwise path information in the AST and integrates path encodings into the Transformer to generate concise summaries.
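To make two of the AST-based input representations above concrete, the following Python sketch shows (i) the structure-based traversal (SBT) that HDeepCom uses to flatten an AST and (ii) a pairwise tree path of the kind TPTrans encodes. The `Node` type and the toy tree, whose labels loosely mimic Java AST node types, are hypothetical. The baselines' own preprocessing differs in details such as attaching token values to terminal nodes, and how TPTrans turns paths into vectors is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Minimal AST node: a label plus ordered children (hypothetical type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def sbt(node: Node) -> List[str]:
    """Structure-based traversal: wrap each subtree in brackets tagged by its root."""
    seq = ["(", node.label]
    for child in node.children:
        seq.extend(sbt(child))
    seq.extend([")", node.label])
    return seq


def path_from_root(root: Node, target: Node) -> Optional[List[Node]]:
    """Nodes from root down to target (identity match), or None if absent."""
    if root is target:
        return [root]
    for child in root.children:
        sub = path_from_root(child, target)
        if sub is not None:
            return [root] + sub
    return None


def pairwise_path(root: Node, a: Node, b: Node) -> List[str]:
    """Labels on the tree path a -> lowest common ancestor (LCA) -> b."""
    pa = path_from_root(root, a)
    pb = path_from_root(root, b)
    # k = length of the shared prefix; pa[k - 1] is the LCA.
    k = 0
    while k < min(len(pa), len(pb)) and pa[k] is pb[k]:
        k += 1
    up = [n.label for n in reversed(pa[k - 1:])]  # a ... LCA
    down = [n.label for n in pb[k:]]              # below LCA ... b
    return up + down


# Toy tree, roughly: int f(int x) { return x + 1; }
name_node = Node("SimpleName")
var = Node("NameExpr")
lit = Node("IntegerLiteral")
binop = Node("BinaryExpr", [var, lit])
ret = Node("ReturnStatement", [binop])
root = Node("MethodDeclaration", [name_node, ret])

print(" ".join(sbt(root)))
# → ( MethodDeclaration ( SimpleName ) SimpleName ( ReturnStatement ( BinaryExpr ( NameExpr ) NameExpr ( IntegerLiteral ) IntegerLiteral ) BinaryExpr ) ReturnStatement ) MethodDeclaration
print(pairwise_path(root, name_node, lit))
# → ['SimpleName', 'MethodDeclaration', 'ReturnStatement', 'BinaryExpr', 'IntegerLiteral']
```

The bracket pairs in SBT make the original tree recoverable from the flat sequence, which is what lets a sequence-to-sequence model consume structure. The pairwise path instead exposes the relationship between two specific nodes, which TPTrans integrates into the Transformer as path encodings.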
5.4. Parameter configurations
5.5. Main Results
5.5.1. Comparisons with baselines
5.5.2. Ablation study
5.5.3. Human evaluation
5.5.4. Qualitative Analysis
6. Related Work
7. Limitation
- Dataset limitations: Although numerous public datasets are available for the code summarization task, we evaluated our model on only two public Java datasets. These datasets may not fully represent other programming languages, which potentially limits the generalizability of our model. In future work, we plan to experiment with larger datasets covering diverse programming languages. We anticipate that our model can be extended with minimal adaptation to any language that can be parsed into ASTs.
- Hyperparameter settings in deep learning: Hyperparameter choices, such as model dimensions, strongly influence the outcomes of a deep learning model. We conducted a limited-range grid search over learning rate and batch size to optimize our model's performance. To mitigate the impact of differing hyperparameter settings among baseline models, we compared our performance against the best results reported for these baselines in previous works.
- Biases of human evaluation: We conducted a human evaluation in which five participants assessed the quality of 50 randomly selected code-summary pairs. The outcomes of human annotation can be influenced by various factors, including the participants' programming experience and their understanding of the evaluation criteria. Recognizing this potential for bias, future iterations of our study will involve a larger number of skilled software developers evaluating an expanded set of code-summary pairs. To further enhance the reliability of our findings, each code-summary pair will be reviewed by at least five participants.
8. Conclusion
Author Contributions
Funding
Conflicts of Interest
References
- Wan, Y.; Zhao, Z.; Yang, M.; Xu, G.; Ying, H.; Wu, J.; Yu, P.S. Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 397–407.
- Xia, X.; Bao, L.; Lo, D.; Xing, Z.; Hassan, A.E.; Li, S. Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering 2017, 44, 951–976.
- Stapleton, S.; Gambhir, Y.; LeClair, A.; Eberhart, Z.; Weimer, W.; Leach, K.; Huang, Y. A human study of comprehension and code summarization. In Proceedings of the 28th International Conference on Program Comprehension, 2020, pp. 2–13.
- Liu, S.; Chen, Y.; Xie, X.; Siow, J.; Liu, Y. Retrieval-augmented generation for code summarization via hybrid GNN. arXiv preprint arXiv:2006.05405 2020.
- Iyer, S.; Konstas, I.; Cheung, A.; Zettlemoyer, L. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2016, pp. 2073–2083.
- Ahmad, W.U.; Chakraborty, S.; Ray, B.; Chang, K.W. A transformer-based approach for source code summarization. arXiv preprint arXiv:2005.00653 2020.
- Allamanis, M.; Barr, E.T.; Devanbu, P.; Sutton, C. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR) 2018, 51, 1–37.
- Shido, Y.; Kobayashi, Y.; Yamamoto, A.; Miyamoto, A.; Matsumura, T. Automatic source code summarization with extended Tree-LSTM. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
- Harer, J.; Reale, C.; Chin, P. Tree-transformer: A transformer-based method for correction of tree-structured data. arXiv preprint arXiv:1908.00449 2019.
- Hu, X.; Li, G.; Xia, X.; Lo, D.; Jin, Z. Deep code comment generation. In Proceedings of the 26th Conference on Program Comprehension, 2018, pp. 200–210.
- Alon, U.; Brody, S.; Levy, O.; Yahav, E. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 2018.
- Allamanis, M.; Brockschmidt, M.; Khademi, M. Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740 2017.
- Fernandes, P.; Allamanis, M.; Brockschmidt, M. Structured neural summarization. arXiv preprint arXiv:1811.01824 2018.
- Cheng, J.; Fostiropoulos, I.; Boehm, B. GN-Transformer: Fusing sequence and graph representation for improved code summarization. arXiv preprint arXiv:2111.08874 2021.
- Hellendoorn, V.J.; Sutton, C.; Singh, R.; Maniatis, P.; Bieber, D. Global relational models of source code. In Proceedings of the International Conference on Learning Representations, 2019.
- Zügner, D.; Kirschstein, T.; Catasta, M.; Leskovec, J.; Günnemann, S. Language-agnostic representation learning of source code from structure and context. arXiv preprint arXiv:2103.11318 2021.
- Zhang, K.; Li, Z.; Jin, Z.; Li, G. Implant global and local hierarchy information to sequence based code representation models. arXiv preprint arXiv:2303.07826 2023.
- Kitaev, N.; Klein, D. Constituency parsing with a self-attentive encoder. arXiv preprint arXiv:1805.01052 2018.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems 2017, 30.
- Song, K.; Wang, K.; Yu, H.; Zhang, Y.; Huang, Z.; Luo, W.; Duan, X.; Zhang, M. Alignment-enhanced transformer for constraining NMT with pre-specified translations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol. 34, pp. 8886–8893.
- Zhao, X.; Wang, L.; He, R.; Yang, T.; Chang, J.; Wang, R. Multiple knowledge syncretic transformer for natural dialogue generation. In Proceedings of The Web Conference 2020, 2020, pp. 752–762.
- Tang, Z.; Shen, X.; Li, C.; Ge, J.; Huang, L.; Zhu, Z.; Luo, B. AST-trans: Code summarization with efficient tree-structured attention. In Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 150–162.
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155 2018.
- Diao, C.; Loynd, R. Relational attention: Generalizing transformers for graph-structured tasks. arXiv preprint arXiv:2210.05062 2022.
- Chai, L.; Li, M. Pyramid attention for source code summarization. Advances in Neural Information Processing Systems 2022, 35, 20421–20433.
- Shi, E.; Wang, Y.; Du, L.; Zhang, H.; Han, S.; Zhang, D.; Sun, H. CAST: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. arXiv preprint arXiv:2108.12987 2021.
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Advances in Neural Information Processing Systems 2015, 28.
- Hu, X.; Li, G.; Xia, X.; Lo, D.; Lu, S.; Jin, Z. Summarizing source code with transferred API knowledge. 2018.
- Hu, X.; Li, G.; Xia, X.; Lo, D.; Jin, Z. Deep code comment generation with hybrid lexical and syntactical information. Empirical Software Engineering 2020, 25, 2179–2217.
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311–318.
- Banerjee, S.; Lavie, A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005, pp. 65–72.
- Lin, C.Y. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, 2004, pp. 74–81.
- Liu, Z.; Xia, X.; Treude, C.; Lo, D.; Li, S. Automatic generation of pull request descriptions. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019, pp. 176–188.
- Nie, L.Y.; Gao, C.; Zhong, Z.; Lam, W.; Liu, Y.; Xu, Z. CoreGen: Contextualized code representation learning for commit message generation. Neurocomputing 2021, 459, 97–107.
- LeClair, A.; Haque, S.; Wu, L.; McMillan, C. Improved code summarization via a graph neural network. In Proceedings of the 28th International Conference on Program Comprehension, 2020, pp. 184–195.
- Peng, H.; Li, G.; Wang, W.; Zhao, Y.; Jin, Z. Integrating tree path in transformer for code representation. Advances in Neural Information Processing Systems 2021, 34, 9343–9354.
- Libovický, J.; Helcl, J.; Mareček, D. Input combination strategies for multi-source transformer decoder. arXiv preprint arXiv:1811.04716 2018.




| Dataset | TL-CodeSum | EMSE-Deepcom |
|---|---|---|
| # Train | 66,928 | 283,741 |
| # Validation | 8,366 | 12,226 |
| # Test | 8,366 | 12,226 |
| # / Avg. # tokens in code | 20,162 / 120.10 | 47,939 / 93.93 |
| # / Avg. # tokens in summary | 25,619 / 17.76 | 26,145 / 11.24 |
| Approaches | Input | Backbone | TL-CodeSum BLEU | TL-CodeSum ROUGE-L | TL-CodeSum METEOR | EMSE-Deepcom BLEU | EMSE-Deepcom ROUGE-L | EMSE-Deepcom METEOR |
|---|---|---|---|---|---|---|---|---|
| CodeNN [5] | Code | LSTM | 22.22 | 33.14 | 14.08 | 28.45 | 43.51 | 17.89 |
| HDeepcom [29] | AST | GRU | 23.32 | 33.94 | 13.76 | 32.19 | 49.03 | 21.53 |
| ASTattGRU [35] | AST | GRU | 30.78 | 39.94 | 17.35 | 33.40 | 49.76 | 22.20 |
| NeuralCodeSum [6] | Code | Transformer | 40.63 | 52.00 | 24.85 | 37.13 | 54.87 | 25.05 |
| CAST [26] | AST | Transformer | 43.76 | 54.09 | 27.15 | 37.19 | 54.87 | 25.07 |
| TPTrans [36] | AST | Transformer | 44.50 | 55.08 | 27.88 | 37.25 | 54.99 | 25.02 |
| SHT | AST | Transformer | 45.76 | 55.70 | 28.22 | 38.85 | 56.02 | 25.90 |
| Approaches | BLEU | ROUGE-L | METEOR |
|---|---|---|---|
| SHT | 45.76 | 55.70 | 28.22 |
| w/o token | 42.33 | 53.27 | 26.89 |
| w/o syntax | 45.38 | 55.03 | 27.63 |
| token-grained hierarchy | 45.08 | 55.17 | 27.90 |
| concatenation | 45.42 | 55.36 | 28.06 |
| Approaches | Similarity | Conciseness | Readability |
|---|---|---|---|
| SHT | 3.52 | 3.17 | 3.34 |
| NeuralCodeSum | 2.51 | 2.72 | 2.34 |
| CAST | 3.11 | 2.90 | 2.97 |
| TPTrans | 3.20 | 3.17 | 3.34 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
