Submitted: 08 September 2025
Posted: 11 September 2025
Abstract
Keywords:
1. Introduction
2. Methods
3. Statistical Analysis
3.1. Main Characteristics
3.2. Journal Publication and Author Group
3.3. Vignettes and Journal Publication
3.4. Reverse Imports/Suggests/Enhances and Journal Publication
3.5. Reverse Imports/Suggests/Enhances and Updates
3.6. Continent of Creator and Year of Creation
3.7. Year of Creation and Updates
3.8. Continent of Creator and Bayesian Analysis
3.9. Dataset Availability and Journal Publication
3.10. Growth of Packages
3.11. Text Analysis of Descriptions
4. Conclusions
References



| Variable | Category | Percentage |
|---|---|---|
| Year of creation | 1999–2008 | 26% |
| | 2009–2018 | 46% |
| | 2019–2024 | 28% |
| Continent of creator | Europe | 56% |
| | North America | 27% |
| | South America | 5% |
| | Asia | 7% |
| | Oceania | 5% |
| Number of authors | 1–2 | 56% |
| | 3–10 | 40% |
| | 11–26 | 4% |
| Number of updates | 1–10 | 54% |
| | 11–50 | 37% |
| | 51–216 | 9% |
| Contains data | Yes | 68% |
| | No | 32% |
| Contains vignette | Yes | 46% |
| | No | 54% |
| Journal publication | Yes | 45% |
| | No | 55% |
| Book publication | Yes | 19% |
| | No | 81% |
| Reverse imports/suggests/enhances | Yes | 93% |
| | No | 7% |
| Gender of the creator | Male | 94% |
| | Female | 6% |
| Datasets only | Yes | 7% |
| | No | 93% |
| Bayesian analysis | Yes | 7% |
| | No | 93% |
| Web scraping | Yes | 8% |
| | No | 92% |
| CTV Econometrics | Yes | 73% |
| | No | 27% |

| Author group | Journal publication: No | Journal publication: Yes |
|---|---|---|
| 1–2 | 73 | 44 |
| 3–10 | 36 | 46 |
| 11–26 | 4 | 4 |
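
The marginal totals of each cross-tabulation should agree with the main-characteristics table. A short arithmetic check for the author-group table, with the counts copied from above (the helper name `marginal_shares` is illustrative, and small differences from the reported whole percentages, such as 56.5% versus 56%, are assumed to come from rounding):

```python
# Author group (rows) by journal publication (columns: No, Yes), copied from the table above.
author_by_journal = {
    "1-2": (73, 44),
    "3-10": (36, 46),
    "11-26": (4, 4),
}

def marginal_shares(table):
    """Return the grand total, row shares and column shares (in percent) of a two-way table."""
    n = sum(sum(row) for row in table.values())
    row_shares = {group: 100 * sum(row) / n for group, row in table.items()}
    columns = list(zip(*table.values()))  # transpose rows into columns
    col_shares = [100 * sum(col) / n for col in columns]
    return n, row_shares, col_shares

n, rows, cols = marginal_shares(author_by_journal)
print(f"packages: {n}")
for group, share in rows.items():
    print(f"{group} authors: {share:.1f}%")
print(f"no journal publication: {cols[0]:.1f}%, journal publication: {cols[1]:.1f}%")
```
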

| Journal publication | Vignette: No | Vignette: Yes |
|---|---|---|
| No | 76 | 37 |
| Yes | 35 | 59 |
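
Each of these tables pairs two categorical variables, and a chi-square test of independence is one common way to assess such a cross-tabulation. This section does not state which test the authors applied, so the following is only a sketch: it computes the Pearson statistic, without continuity correction, for the journal-publication-by-vignette table. R's `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default, which would give a somewhat smaller statistic. The function names are illustrative.

```python
import math

def pearson_chi_square(observed):
    """Pearson chi-square test of independence for an r x c table (no continuity correction).

    Returns the statistic, the degrees of freedom and the expected counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count of each cell under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof, expected

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With df = 1 the statistic is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

# Rows: journal publication (No, Yes); columns: vignette (No, Yes).
journal_by_vignette = [[76, 37], [35, 59]]
stat, dof, expected = pearson_chi_square(journal_by_vignette)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value_df1(stat):.2g}")
```

The same `pearson_chi_square` function accepts the larger tables, but the closed-form p-value above only holds for one degree of freedom.
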

| Journal publication | Reverse imports/suggests/enhances: No | Reverse imports/suggests/enhances: Yes |
|---|---|---|
| No | 10 | 103 |
| Yes | 4 | 90 |

| Reverse imports/suggests/enhances | Updates: 1–10 | Updates: 11–50 | Updates: 51–216 |
|---|---|---|---|
| No | 12 | 2 | 0 |
| Yes | 100 | 75 | 18 |

| First year | Asia | Europe | North America | Oceania | South America |
|---|---|---|---|---|---|
| 1999–2008 | 2 | 29 | 19 | 4 | 0 |
| 2009–2018 | 7 | 55 | 22 | 4 | 8 |
| 2019–2024 | 5 | 33 | 14 | 3 | 2 |

| Year of creation | Updates: 1–10 | Updates: 11–50 | Updates: 51–216 |
|---|---|---|---|
| 1999–2008 | 4 | 36 | 14 |
| 2009–2018 | 64 | 28 | 4 |
| 2019–2024 | 44 | 13 | 0 |

| Creator continent | Bayesian analysis: No | Bayesian analysis: Yes |
|---|---|---|
| Europe | 107 | 10 |
| North America | 52 | 3 |
| South America | 10 | 0 |
| Asia | 13 | 1 |
| Oceania | 10 | 1 |
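
Some of these tables are sparse; for example, no South American creator in the sample wrote a Bayesian package. The chi-square approximation is commonly considered unreliable when many expected counts fall below 5, and Fisher's exact test (R's `fisher.test()`) or a simulated p-value is the usual alternative in that case. A sketch that lists the affected cells of the Bayesian-analysis table, assuming the conventional threshold of 5 (the function name is illustrative):

```python
def small_expected_cells(observed, threshold=5.0):
    """List cells whose expected count under independence is below threshold.

    Returns (row index, column index, expected count) triples and the total number of cells.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    small = [
        (i, j, r * c / n)
        for i, r in enumerate(row_totals)
        for j, c in enumerate(col_totals)
        if r * c / n < threshold
    ]
    return small, len(row_totals) * len(col_totals)

continents = ["Europe", "North America", "South America", "Asia", "Oceania"]
# Rows: creator continent; columns: Bayesian analysis (No, Yes), copied from the table above.
bayes_by_continent = [[107, 10], [52, 3], [10, 0], [13, 1], [10, 1]]
small, total = small_expected_cells(bayes_by_continent)
for i, j, e in small:
    label = "No" if j == 0 else "Yes"
    print(f"{continents[i]} / Bayesian {label}: expected {e:.2f}")
print(f"{len(small)} of {total} cells have an expected count below 5")
```
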

| Datasets only | Journal publication: No | Journal publication: Yes |
|---|---|---|
| No | 101 | 92 |
| Yes | 12 | 2 |

| Rank | Word | Frequency |
|---|---|---|
| 1 | data | 100 |
| 2 | estimation | 83 |
| 3 | regression | 78 |
| 4 | linear | 65 |
| 5 | effect | 64 |
| 6 | test | 57 |
| 7 | base | 50 |
| 8 | variable | 50 |
| 9 | time | 45 |
| 10 | fit | 43 |
| 11 | economic | 42 |
| 12 | estimate | 41 |
| 13 | generalize | 36 |
| 14 | panel | 36 |
| 15 | spatial | 30 |
| 16 | two | 30 |
| 17 | choice | 29 |
| 18 | design | 29 |
| 19 | estimator | 29 |
| 20 | series | 29 |
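
The word table reports term frequencies in package descriptions; forms such as "generalize" and "base" suggest the text was lemmatized before counting. A minimal sketch of the counting step, assuming simple regex tokenization and a small hypothetical stop-word list, and skipping lemmatization (so "effects" and "effect" would be counted separately). The sample descriptions are invented for illustration and are not taken from CRAN:

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real analysis would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "with", "on", "by", "is", "are", "using"}

def word_frequencies(descriptions, top=20):
    """Count lowercase alphabetic tokens across descriptions, ignoring stop words."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top)

# Hypothetical package descriptions, invented for illustration.
sample = [
    "Estimation of linear panel data models with fixed effects.",
    "Tests for time series data and panel data.",
    "Spatial regression estimation using maximum likelihood.",
]
for word, freq in word_frequencies(sample, top=5):
    print(word, freq)
```
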

| Rank | Phrase | Frequency |
|---|---|---|
| 1 | time series | 21 |
| 2 | maximum likelihood estimation | 16 |
| 3 | fixed effect | 10 |
| 4 | instrumental variable | 10 |
| 5 | confidence interval | 8 |
| 6 | generalized linear | 7 |
| 7 | linear regression | 7 |
| 8 | panel data | 7 |
| 9 | synthetic control | 7 |
| 10 | cross sectional | 6 |
| 11 | data set | 6 |
| 12 | beta regression | 5 |
| 13 | call sur | 5 |
| 14 | data drive | 5 |
| 15 | generalized additive | 5 |
| 16 | least square | 5 |
| 17 | sample selection | 5 |
| 18 | treatment effect | 5 |
| 19 | two step | 5 |
| 20 | average treatment effect | 4 |
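
The phrase table may have been produced with a dedicated phrase-mining method, and lemmatized entries such as "data drive" and "call sur" indicate preprocessing beyond plain tokenization. As a simpler stand-in, the sketch below counts two-word sequences within each description and discards any n-gram that begins or ends with a stop word. The stop-word list, function name and sample descriptions are all hypothetical:

```python
import re
from collections import Counter

# Hypothetical stop-word list; phrases may not start or end with these words.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "with", "on", "by", "using"}

def ngram_frequencies(descriptions, n=2, top=20):
    """Count n-word sequences within each description, skipping n-grams
    that begin or end with a stop word."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts.most_common(top)

# Hypothetical package descriptions, invented for illustration.
sample = [
    "Instrumental variable estimation for panel data.",
    "Synthetic control methods for panel data and time series.",
    "Bootstrap confidence intervals for time series models.",
]
for phrase, freq in ngram_frequencies(sample, n=2, top=5):
    print(phrase, freq)
```

Plain n-gram counting still keeps fragments such as "variable estimation", which is one reason dedicated phrase-mining methods exist.
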

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).