Submitted: 07 February 2024
Posted: 07 February 2024
Abstract
Keywords:
1. Introduction
- Is there a significant difference in productivity among the sentiment groups (positive, neutral, and negative)?
- Which group is most conducive to community activation?
2. Theoretical Background and Review of Previous Studies
2.1. Collaboration Process in OSS

2.2. Productivity in Software
2.3. Collaborators’ Attitudes and Productivity
3. Research Methodology
- The number of PRs merged out of all the PRs created by an individual;
- The total LoCs created by an individual, counting only those from merged PRs;
- LoCs per merged PR (LoCs/PR);
- The number of comments an individual made on PRs. To gauge the community’s activity level, we collected comment counts for PRs of all statuses, not only merged ones.
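The four metrics above can be computed per contributor with pandas, which the reference list cites. The sketch below is illustrative only: the input layout (one row per PR with hypothetical columns `author`, `merged`, and `locs`, plus one row per comment with an `author` column) is an assumption rather than the authors' schema, and the sketch keeps only contributors with at least one merged PR.

```python
import pandas as pd


def contributor_metrics(prs: pd.DataFrame, comments: pd.DataFrame) -> pd.DataFrame:
    """Per-contributor productivity metrics (hypothetical schema).

    prs:      one row per PR with columns 'author', 'merged' (bool), 'locs' (int)
    comments: one row per PR comment with an 'author' column; comments on PRs
              of every status are included, not only merged ones
    """
    merged = prs[prs["merged"]]
    # Number of merged PRs and total LoCs from merged PRs, per author
    metrics = merged.groupby("author").agg(
        num_prs=("locs", "count"),
        total_locs=("locs", "sum"),
    )
    # LoCs per merged PR
    metrics["locs_per_pr"] = metrics["total_locs"] / metrics["num_prs"]
    # Comment counts per author, aligned on the author index; no comments -> 0
    metrics["comments"] = (
        comments.groupby("author").size().reindex(metrics.index, fill_value=0)
    )
    return metrics
```

Counting comments over all PRs while counting LoCs only over merged PRs mirrors the distinction drawn in the list; an author who only commented (and never had a PR merged) does not appear in this sketch's output.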
3.1. Selection of Target Repositories
- Exclude repositories that are not evidently software projects;
- Assign repositories that rank high in both star and fork counts to the list in which they rank higher. For instance, if “react” is ranked 10th by stars and 22nd by forks, it is categorized under stars;
- Exclude repositories that are marked as “archived”;
- Exclude repositories that did not adhere to the fork-and-pull model [12].
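The tie-breaking criterion can be read as assigning a repository that appears in both rankings to the list in which it ranks higher, as in the “react” example. A minimal sketch under that reading follows; the rank dictionaries, the `excluded` set (standing in for the other three criteria), and the choice to favor stars on equal ranks are all assumptions for illustration.

```python
def assign_category(star_rank: dict[str, int], fork_rank: dict[str, int],
                    excluded: set[str]) -> dict[str, str]:
    """Map each eligible repository to 'star' or 'fork' (illustrative rule).

    star_rank, fork_rank: repository name -> 1-based rank by stars / by forks
    excluded: names removed by the other criteria (not software, archived,
              or not following the fork-and-pull model)
    """
    category: dict[str, str] = {}
    for name in sorted(set(star_rank) | set(fork_rank)):
        if name in excluded:
            continue
        # A repository missing from one ranking is treated as unranked there
        s = star_rank.get(name, float("inf"))
        f = fork_rank.get(name, float("inf"))
        # Assumption: assign to the list where it ranks higher; ties go to 'star'
        category[name] = "star" if s <= f else "fork"
    return category


# Example from the text: "react" is 10th by stars and 22nd by forks
print(assign_category({"react": 10}, {"react": 22}, excluded=set()))
# -> {'react': 'star'}
```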
3.2. Collection of PR Number List

3.3. Collection of PR Detailed Information



3.4. Database Loading
3.5. Sentiment Analysis Execution
3.6. Data Processing
3.7. Conducting Statistical Analysis
4. Results
| Category | Kruskal–Wallis H | p-value | Mann–Whitney U (Positive-Neutral) | p-value | Mann–Whitney U (Neutral-Negative) | p-value | Mann–Whitney U (Positive-Negative) | p-value |
|---|---|---|---|---|---|---|---|---|
| No. of PRs | 1586.55 | <0.001 | 60981076.0 | <0.001 | 13562406.0 | <0.001 | 26904913.0 | <0.001 |
| Total LoCs | 1790.90 | <0.001 | 63403497.0 | <0.001 | 13324718.0 | <0.001 | 25763927.5 | <0.001 |
| LoCs/PR | 1172.00 | <0.001 | 60285774.5 | <0.001 | 14415289.5 | <0.001 | 26014063.0 | <0.001 |
| Comments | 3971.18 | <0.001 | 70109454.5 | <0.001 | 9059851.5 | <0.001 | 27658384.5 | <0.001 |
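The tests reported in these tables are available in SciPy, which the reference list cites. The sketch below assumes each sentiment group is given as a list of per-contributor values for one metric; it runs the Kruskal–Wallis H test across all groups and then a two-sided Mann–Whitney U test for each pair. It is not necessarily the authors' exact procedure: it applies no multiple-comparison correction, and in recent SciPy versions `mannwhitneyu` reports U for the first sample, so the statistic depends on pair order.

```python
from itertools import combinations

from scipy import stats


def compare_groups(groups: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Kruskal-Wallis H test across all groups, then pairwise Mann-Whitney U tests.

    groups: sentiment group name -> per-contributor values of one metric.
    Returns (statistic, p-value) keyed by 'kruskal' or by 'A-B' pair labels.
    """
    h, p = stats.kruskal(*groups.values())
    results = {"kruskal": (float(h), float(p))}
    # Every unordered pair, in the order the groups were given
    for a, b in combinations(groups, 2):
        # Two-sided test; U is reported for the first sample (a)
        u, p_pair = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        results[f"{a}-{b}"] = (float(u), float(p_pair))
    return results
```

Calling it once per metric (No. of PRs, Total LoCs, LoCs/PR, Comments) with the groups ordered Positive, Neutral, Negative yields one row of a table in this layout.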
| Category | Kruskal–Wallis H | p-value | Mann–Whitney U (Positive-Neutral) | p-value | Mann–Whitney U (Neutral-Negative) | p-value | Mann–Whitney U (Positive-Negative) | p-value |
|---|---|---|---|---|---|---|---|---|
| No. of PRs | 394.82 | <0.001 | 6973654.0 | <0.001 | 1548296.5 | <0.001 | 2894748.0 | <0.01 |
| Total LoCs | 460.53 | <0.001 | 7270662.5 | <0.001 | 1501109.0 | <0.001 | 2789273.5 | <0.001 |
| LoCs/PR | 304.90 | <0.001 | 6954815.0 | <0.001 | 1596388.0 | <0.001 | 2811140.5 | <0.001 |
| Comments | 1300.30 | <0.001 | 8260316.5 | <0.001 | 995769.0 | <0.001 | 2969978.5 | 0.035 * |
| Category | Kruskal–Wallis H | p-value | Mann–Whitney U (Positive-Neutral) | p-value | Mann–Whitney U (Neutral-Negative) | p-value | Mann–Whitney U (Positive-Negative) | p-value |
|---|---|---|---|---|---|---|---|---|
| No. of PRs | 1211.81 | <0.001 | 26721644.0 | <0.001 | 5952711.0 | <0.001 | 12032923.5 | <0.001 |
| Total LoCs | 1345.71 | <0.001 | 27723557.0 | <0.001 | 5891048.0 | <0.001 | 11501190.5 | <0.001 |
| LoCs/PR | 873.12 | <0.001 | 26267469.5 | <0.001 | 6420072.5 | <0.001 | 11654302.5 | <0.001 |
| Comments | 2713.92 | <0.001 | 30344676.5 | <0.001 | 4051938.5 | <0.001 | 12328809.5 | <0.001 |
| Category | Variable Pair | Spearman Coefficient | p-value | Kendall’s Tau Coefficient | p-value |
|---|---|---|---|---|---|
| Positive | No. of PRs vs. Total LoCs | 0.6944 | <0.001 | 0.5549 | <0.001 |
| | No. of PRs vs. Comments | 0.6793 | <0.001 | 0.5541 | <0.001 |
| | Total LoCs vs. Comments | 0.6025 | <0.001 | 0.4465 | <0.001 |
| Neutral | No. of PRs vs. Total LoCs | 0.4585 | <0.001 | 0.3722 | <0.001 |
| | No. of PRs vs. Comments | 0.3626 | <0.001 | 0.3156 | <0.001 |
| | Total LoCs vs. Comments | 0.3427 | <0.001 | 0.2614 | <0.001 |
| Negative | No. of PRs vs. Total LoCs | 0.6471 | <0.001 | 0.5157 | <0.001 |
| | No. of PRs vs. Comments | 0.5908 | <0.001 | 0.4779 | <0.001 |
| | Total LoCs vs. Comments | 0.5294 | <0.001 | 0.3875 | <0.001 |
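Per-group rank correlations like those tabulated here can be sketched with `scipy.stats.spearmanr` and `scipy.stats.kendalltau`. The input format (a dictionary mapping metric names to per-contributor values for one sentiment group, with contributors in the same order in every list) is an assumption; `kendalltau` computes the tau-b variant by default, and which variant the study used is not stated here.

```python
from itertools import combinations

from scipy import stats


def rank_correlations(metrics: dict[str, list[float]]) -> dict[str, dict[str, tuple[float, float]]]:
    """Spearman's rho and Kendall's tau for every pair of metrics in one group.

    metrics: metric name -> per-contributor values (same contributor order in each list).
    """
    out = {}
    for a, b in combinations(metrics, 2):
        rho, p_rho = stats.spearmanr(metrics[a], metrics[b])
        tau, p_tau = stats.kendalltau(metrics[a], metrics[b])  # tau-b by default
        out[f"{a} vs. {b}"] = {"spearman": (float(rho), float(p_rho)),
                               "kendall": (float(tau), float(p_tau))}
    return out
```

With the keys ordered "No. of PRs", "Total LoCs", "Comments", the three result labels match the three variable pairs listed for each sentiment group.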
| Category | Variable Pair | Spearman Coefficient | p-value | Kendall’s Tau Coefficient | p-value |
|---|---|---|---|---|---|
| Positive | No. of PRs vs. Total LoCs | 0.6610 | <0.001 | 0.5335 | <0.001 |
| | No. of PRs vs. Comments | 0.6076 | <0.001 | 0.4992 | <0.001 |
| | Total LoCs vs. Comments | 0.5687 | <0.001 | 0.4247 | <0.001 |
| Neutral | No. of PRs vs. Total LoCs | 0.4259 | <0.001 | 0.3458 | <0.001 |
| | No. of PRs vs. Comments | 0.3176 | <0.001 | 0.2808 | <0.001 |
| | Total LoCs vs. Comments | 0.3066 | <0.001 | 0.2360 | <0.001 |
| Negative | No. of PRs vs. Total LoCs | 0.6187 | <0.001 | 0.4999 | <0.001 |
| | No. of PRs vs. Comments | 0.5344 | <0.001 | 0.4388 | <0.001 |
| | Total LoCs vs. Comments | 0.5190 | <0.001 | 0.3841 | <0.001 |
| Category | Variable Pair | Spearman Coefficient | p-value | Kendall’s Tau Coefficient | p-value |
|---|---|---|---|---|---|
| Positive | No. of PRs vs. Total LoCs | 0.7077 | <0.001 | 0.5623 | <0.001 |
| | No. of PRs vs. Comments | 0.7113 | <0.001 | 0.5799 | <0.001 |
| | Total LoCs vs. Comments | 0.6159 | <0.001 | 0.4550 | <0.001 |
| Neutral | No. of PRs vs. Total LoCs | 0.4740 | <0.001 | 0.3847 | <0.001 |
| | No. of PRs vs. Comments | 0.3840 | <0.001 | 0.3319 | <0.001 |
| | Total LoCs vs. Comments | 0.3617 | <0.001 | 0.2752 | <0.001 |
| Negative | No. of PRs vs. Total LoCs | 0.6552 | <0.001 | 0.5192 | <0.001 |
| | No. of PRs vs. Comments | 0.6097 | <0.001 | 0.4912 | <0.001 |
| | Total LoCs vs. Comments | 0.5292 | <0.001 | 0.3864 | <0.001 |
5. Discussion
6. Conclusion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Fuggetta, A. Open Source Software––an Evaluation. Journal of Systems and Software 2003, 66, 77–90.
- Paulson, J.; Succi, G.; Eberlein, A. An Empirical Study of Open-Source and Closed-Source Software Products. IEEE Transactions on Software Engineering 2004, 30, 246–256.
- Dalle, J.M.; Jullien, N. Windows vs. Linux: Some Explorations into the Economics of Free Software. Advances in Complex Systems 2000, 3, 399–416.
- West, J.; Gallagher, S. Challenges of Open Innovation: The Paradox of Firm Investment in Open-Source Software. R&D Management 2006, 36, 319–331.
- Guterres, A. Roadmap for Digital Cooperation; United Nations, 2020.
- World Benchmarking Alliance. Digital Inclusion Benchmark 2023 Insights Report; World Benchmarking Alliance, 2023.
- World Benchmarking Alliance. Digital Inclusion Benchmark 2021 Scoring Guidelines; World Benchmarking Alliance, 2021.
- Jones, G.R.; George, J.M. The Experience and Evolution of Trust: Implications for Cooperation and Teamwork. The Academy of Management Review 1998, 23, 531–546.
- Ortu, M.; Adams, B.; Destefanis, G.; Tourani, P.; Marchesi, M.; Tonelli, R. Are Bullies More Productive? Empirical Study of Affectiveness vs. Issue Fixing Time. 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 2015, pp. 303–313.
- Carige Junior, R.; Carneiro, G. Impact of Developers Sentiments on Practices and Artifacts in Open Source Software Projects: A Systematic Literature Review. Proceedings of the 22nd International Conference on Enterprise Information Systems. SCITEPRESS - Science and Technology Publications, 2020, pp. 31–42.
- Ferreira, I.; Stewart, K.; German, D.; Adams, B. A Longitudinal Study on the Maintainers’ Sentiment of a Large Scale Open Source Ecosystem. 2019 IEEE/ACM 4th International Workshop on Emotion Awareness in Software Engineering (SEmotion). IEEE, 2019, pp. 17–22.
- Padhye, R.; Mani, S.; Sinha, V.S. A Study of External Community Contribution to Open-Source Projects on GitHub. Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 2014, pp. 332–335.
- Soares, D.M.; De Lima Júnior, M.L.; Murta, L.; Plastino, A. Acceptance Factors of Pull Requests in Open-Source Projects. Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, 2015, pp. 1541–1546.
- Gousios, G.; Pinzger, M.; Deursen, A.V. An Exploratory Study of the Pull-Based Software Development Model. Proceedings of the 36th International Conference on Software Engineering. ACM, 2014, pp. 345–355.
- Guo, Y.; Leitner, P. Studying the Impact of CI on Pull Request Delivery Time in Open Source Projects—a Conceptual Replication. PeerJ Computer Science 2019, 5, e245.
- LINE. Enable to receive compressed request from client by joonhaeng · Pull Request #3087 · line/armeria. https://github.com/line/armeria/pull/3087, accessed on 2024-01-21.
- Meyer, A.N.; Fritz, T.; Murphy, G.C.; Zimmermann, T. Software Developers’ Perceptions of Productivity. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2014, pp. 19–29.
- Zhou, M.; Mockus, A. Developer Fluency: Achieving True Mastery in Software Projects. Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2010, pp. 137–146.
- Kieburtz, R.; McKinney, L.; Bell, J.; Hook, J.; Kotov, A.; Lewis, J.; Oliva, D.; Sheard, T.; Smith, I.; Walton, L. A Software Engineering Experiment in Software Component Generation. Proceedings of IEEE 18th International Conference on Software Engineering, 1996, pp. 542–552.
- Devanbu, P.; Karstu, S.; Melo, W.; Thomas, W. Analytical and Empirical Evaluation of Software Reuse Metrics. Proceedings of IEEE 18th International Conference on Software Engineering. IEEE Comput. Soc. Press, 1996, pp. 189–199.
- Blackburn, J.; Scudder, G.; Van Wassenhove, L. Improving Speed and Productivity of Software Development: A Global Survey of Software Developers. IEEE Transactions on Software Engineering 1996, 22, 875–885.
- Delorey, D.P.; Knutson, C.D.; Chun, S. Do Programming Languages Affect Productivity? A Case Study Using Data from Open Source Projects. First International Workshop on Emerging Trends in FLOSS Research and Development (FLOSS’07: ICSE Workshops 2007). IEEE, 2007, p. 8.
- Symons, C. Function Point Analysis: Difficulties and Improvements. IEEE Transactions on Software Engineering 1988, 14, 2–11.
- Jiang, Q.; Lee, Y.C.; Davis, J.G.; Zomaya, A.Y. Diversity, Productivity, and Growth of Open Source Developer Communities. 2018, arXiv:1809.03725.
- Guzman, E.; Azócar, D.; Li, Y. Sentiment Analysis of Commit Comments in GitHub: An Empirical Study. Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 2014, pp. 352–355.
- Asri, I.E.; Kerzazi, N.; Uddin, G.; Khomh, F.; Janati Idrissi, M. An Empirical Study of Sentiments in Code Reviews. Information and Software Technology 2019, 114, 37–54.
- Huq, S.F.; Sadiq, A.Z.; Sakib, K. Is Developer Sentiment Related to Software Bugs: An Exploratory Study on GitHub Commits. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2020, pp. 527–531.
- Li, L.; Cao, J.; Lo, D. Sentiment Analysis over Collaborative Relationships in Open Source Software Projects. Proceedings of the International Conference on Software Engineering and Knowledge Engineering, 2020.
- Steinmacher, I.; Conte, T.; Gerosa, M.A.; Redmiles, D. Social Barriers Faced by Newcomers Placing Their First Contribution in Open Source Software Projects. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2015, pp. 1379–1392.
- Licorish, S.A.; MacDonell, S.G. Exploring the Links between Software Development Task Type, Team Attitudes and Task Completion Performance: Insights from the Jazz Repository. Information and Software Technology 2018, 97, 10–25.
- Wagner, S.; Ruhe, M. A Systematic Review of Productivity Factors in Software Development. 2nd International Workshop on Software Productivity Analysis and Cost Estimation (SPACE 2008), 2018, arXiv:1801.06475.
- Meyer, A.N.; Zimmermann, T.; Fritz, T. Characterizing Software Developers by Perceptions of Productivity. 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2017, pp. 105–110.
- Murphy-Hill, E.; Jaspan, C.; Sadowski, C.; Shepherd, D.; Phillips, M.; Winter, C.; Knight, A.; Smith, E.; Jorde, M. What Predicts Software Developers’ Productivity? IEEE Transactions on Software Engineering 2021, 47, 582–594.
- Satratzemi, M.; Xinogalos, S.; Tsompanoudi, D.; Karamitopoulos, L. Examining Student Performance and Attitudes on Distributed Pair Programming. Scientific Programming 2018, 1–8.
- Kalliamvakou, E.; Gousios, G.; Blincoe, K.; Singer, L.; German, D.M.; Damian, D. The Promises and Perils of Mining GitHub. Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 2014, pp. 92–101.
- Kalliamvakou, E.; Gousios, G.; Blincoe, K.; Singer, L.; German, D.M.; Damian, D. An In-Depth Study of the Promises and Perils of Mining GitHub. Empirical Software Engineering 2016, 21, 2035–2071.
- Du, K.; Yang, H.; Zhang, Y.; Duan, H.; Wang, H.; Hao, S.; Li, Z.; Yang, M. Understanding Promotion-as-a-Service on GitHub. Annual Computer Security Applications Conference. ACM, 2020, pp. 597–610.
- Borges, H.; Hora, A.; Valente, M.T. Understanding the Factors That Impact the Popularity of GitHub Repositories. 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016, pp. 334–344, arXiv:1606.04984.
- Jiang, J.; Lo, D.; He, J.; Xia, X.; Kochhar, P.S.; Zhang, L. Why and How Developers Fork What from Whom in GitHub. Empirical Software Engineering 2017, 22, 547–578.
- Jemerov, D.; Isakova, S. Kotlin in Action; Simon and Schuster, 2017.
- Islam, M.R.; Zibran, M.F. SentiStrength-SE: Exploiting Domain Specificity for Improved Sentiment Analysis in Software Engineering Text. Journal of Systems and Software 2018, 145, 125–146.
- Thelwall, M.; Buckley, K.; Paltoglou, G. Sentiment Strength Detection for the Social Web. Journal of the American Society for Information Science and Technology 2012, 63, 163–173.
- Obaidi, M.; Nagel, L.; Specht, A.; Klünder, J. Sentiment Analysis Tools in Software Engineering: A Systematic Mapping Study. Information and Software Technology 2022, 151, 107018.
- Kruskal, W.H.; Wallis, W.A. Use of Ranks in One-Criterion Variance Analysis. Journal of the American Statistical Association 1952, 47, 583–621.
- Mann, H.B.; Whitney, D.R. On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other. The Annals of Mathematical Statistics 1947, 18, 50–60.
- Myers, J.; Well, A.; Lorch, R. Research Design and Statistical Analysis: Third Edition; Taylor & Francis, 2013.
- Kendall, M.G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81–93.
- Python Software Foundation. Python 3.11.7 documentation. https://docs.python.org/3.11/, accessed on 2024-01-21.
- McKinney, W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference; van der Walt, S., Millman, J., Eds.; 2010, pp. 56–61.
- Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; van der Walt, S.J.; Brett, M.; Wilson, J.; Millman, K.J.; Mayorov, N.; Nelson, A.R.J.; Jones, E.; Kern, R.; Larson, E.; Carey, C.J.; Polat, İ.; Feng, Y.; Moore, E.W.; VanderPlas, J.; Laxalde, D.; Perktold, J.; Cimrman, R.; Henriksen, I.; Quintero, E.A.; Harris, C.R.; Archibald, A.M.; Ribeiro, A.H.; Pedregosa, F.; van Mulbregt, P.; SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 2020, 17, 261–272.
- Dunn, O.J. Multiple Comparisons Among Means. Journal of the American Statistical Association 1961, 56, 52–64.
- Dunnett, C.W. A Multiple Comparison Procedure for Comparing Several Treatments with a Control. Journal of the American Statistical Association 1955, 50, 1096–1121.
- Rovai, A.; Baker, J.; Ponton, M. Social Science Research Design and Statistics: A Practitioner’s Guide to Research Methods and IBM SPSS; Watertree Press, 2013; p. 375.
- Pearce, J.M. The Case for Open Source Appropriate Technology. Environment, Development and Sustainability 2012, 14, 425–431.
- Hoe, N.S. Breaking Barriers: The Potential of Free and Open Source Software for Sustainable Human Development; United Nations Development Programme, 2007.


| Star Top 10 Repository | Data Collection Date | Fork Top 10 Repository | Data Collection Date |
|---|---|---|---|
| react | 2023-11-05 | tensorflow | 2023-10-31 |
| ohmyzsh | 2023-11-05 | bootstrap | 2023-10-25 |
| flutter | 2023-11-04 | opencv | 2023-10-30 |
| vscode | 2023-11-08 | kubernetes | 2023-10-29 |
| AutoGPT | 2023-11-02 | bitcoin | 2023-10-24 |
| transformers | 2023-11-07 | three.js | 2023-11-01 |
| next.js | 2023-11-04 | qmk_firmware | 2023-10-30 |
| react-native | 2023-11-06 | material-ui | 2023-10-29 |
| electron | 2023-11-02 | django | 2023-10-27 |
| stable-diffusion-webui | 2023-11-07 | cpython | 2023-10-26 |
| Category | Sample Size | Metric | Median | Mean |
|---|---|---|---|---|
| Total | 24607 | - | - | - |
| Positive | 12262 | No. of PRs | 2.0 | 17.37 |
| | | Total LoCs | 86.0 | 9865.64 |
| | | LoCs/PR | 39.12 | 332.31 |
| | | Comments | 6.0 | 91.65 |
| Neutral | 7647 | No. of PRs | 1.0 | 3.22 |
| | | Total LoCs | 17.0 | 620.69 |
| | | LoCs/PR | 12.0 | 135.34 |
| | | Comments | 2.0 | 6.45 |
| Negative | 4698 | No. of PRs | 2.0 | 10.04 |
| | | Total LoCs | 51.0 | 7661.10 |
| | | LoCs/PR | 28.0 | 262.59 |
| | | Comments | 5.0 | 46.23 |
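A descriptive summary in the shape of these tables (sample size, median, and mean of each metric per sentiment group) can be produced with pandas. The sketch assumes a hypothetical per-contributor DataFrame with a `sentiment` column and one column per metric; the column names are illustrative, not the authors' own.

```python
import pandas as pd

# Hypothetical metric column names, one per productivity metric
METRICS = ["num_prs", "total_locs", "locs_per_pr", "comments"]


def describe_by_sentiment(df: pd.DataFrame) -> pd.DataFrame:
    """Sample size, median, and mean of each metric per sentiment group.

    df: one row per contributor with a 'sentiment' column
        ('Positive', 'Neutral', or 'Negative') and the METRICS columns.
    """
    # Reshape to long format: one row per (contributor, metric) pair
    long = df.melt(id_vars="sentiment", value_vars=METRICS,
                   var_name="metric", value_name="value")
    # 'count' equals the group's sample size when no metric value is missing
    return long.groupby(["sentiment", "metric"])["value"].agg(["count", "median", "mean"])
```

The large gaps between medians and means in the tables (for example, 86.0 versus 9865.64 total LoCs in the positive group) indicate heavily right-skewed distributions, which is consistent with the choice of rank-based tests.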
| Category | Sample Size | Metric | Median | Mean |
|---|---|---|---|---|
| Total | 8321 | - | - | - |
| Positive | 4232 | No. of PRs | 1.0 | 16.65 |
| | | Total LoCs | 62.5 | 5832.69 |
| | | LoCs/PR | 34.4 | 201.42 |
| | | Comments | 4.0 | 55.96 |
| Neutral | 2632 | No. of PRs | 1.0 | 5.14 |
| | | Total LoCs | 17.0 | 336.69 |
| | | LoCs/PR | 13.0 | 81.47 |
| | | Comments | 1.0 | 8.13 |
| Negative | 1457 | No. of PRs | 1.0 | 11.85 |
| | | Total LoCs | 42.0 | 3372.37 |
| | | LoCs/PR | 25.0 | 151.09 |
| | | Comments | 4.0 | 30.18 |
| Category | Sample Size | Metric | Median | Mean |
|---|---|---|---|---|
| Total | 16286 | - | - | - |
| Positive | 8030 | No. of PRs | 2.0 | 17.75 |
| | | Total LoCs | 104.0 | 11991.10 |
| | | LoCs/PR | 43.05 | 401.29 |
| | | Comments | 7.0 | 110.47 |
| Neutral | 5015 | No. of PRs | 1.0 | 2.21 |
| | | Total LoCs | 17.0 | 769.74 |
| | | LoCs/PR | 12.0 | 163.62 |
| | | Comments | 2.0 | 5.58 |
| Negative | 3241 | No. of PRs | 2.0 | 9.22 |
| | | Total LoCs | 58.0 | 9589.11 |
| | | LoCs/PR | 29.0 | 312.71 |
| | | Comments | 6.0 | 53.45 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).