Version 1
Received: 21 August 2022 / Approved: 22 August 2022 / Online: 22 August 2022 (04:51:50 CEST)
How to cite:
Leogrande, A.; Costantiello, A.; Laureti, L. K-Means Clusterization and Machine Learning Prediction of European Most Cited Scientific Publications. Preprints 2022, 2022080374. https://doi.org/10.20944/preprints202208.0374.v1
APA Style
Leogrande, A., Costantiello, A., & Laureti, L. (2022). K-Means Clusterization and Machine Learning Prediction of European Most Cited Scientific Publications. Preprints. https://doi.org/10.20944/preprints202208.0374.v1
Chicago/Turabian Style
Leogrande, A., Alberto Costantiello, and Lucio Laureti. 2022. "K-Means Clusterization and Machine Learning Prediction of European Most Cited Scientific Publications." Preprints. https://doi.org/10.20944/preprints202208.0374.v1
Abstract
In this article we investigate the determinants of the European “Most Cited Publications”. We use data from the European Innovation Scoreboard (EIS) of the European Commission for the period 2010-2019. The data are analyzed with Panel Data with Fixed Effects, Panel Data with Random Effects, WLS, and Pooled OLS. Results show that the level of “Most Cited Publications” is positively associated with, among other variables, “Innovation Index” and “Enterprise Birth”, and negatively associated with, among other variables, “Government Procurement of Advanced Technology Products” and “Human Resources”. Furthermore, we perform a cluster analysis with the k-Means algorithm, using both the Silhouette Coefficient and the Elbow Method. We find that the Elbow Method gives better results than the Silhouette Coefficient, with a number of clusters equal to 3. In addition, we perform a network analysis with the Manhattan distance and find 4 complex and 2 simplified network structures. Finally, we compare 10 machine learning algorithms for predicting the level of “Most Cited Publications”, with both Original Data (OD) and Augmented Data (AD). Results show that the best machine learning algorithm for predicting the level of “Most Cited Publications” with Original Data (OD) is SGD, while Linear Regression is the best algorithm for the prediction with Augmented Data (AD).
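To illustrate how the two cluster-selection criteria named in the abstract are usually computed, the following is a minimal sketch in Python using only the standard library. It is not the authors' code and does not use the EIS data: the points are synthetic two-dimensional blobs, the function names (`make_blobs`, `kmeans`, `silhouette`) are hypothetical, and the deterministic farthest-point initialization is an assumption made here for reproducibility. For each candidate k it reports the within-cluster sum of squares (the quantity inspected by the Elbow Method) and the mean Silhouette Coefficient.

```python
import random
from math import dist


def make_blobs(centers, n_per_center=30, spread=0.3, seed=1):
    """Synthetic 2-D Gaussian blobs (illustrative data, not the EIS dataset)."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(n_per_center)]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (centroids, labels, inertia).

    Assumption: farthest-point initialization, chosen only to keep the
    sketch deterministic.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # New centroid is the coordinate-wise mean of the cluster.
                updated.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Inertia: within-cluster sum of squared distances (Elbow Method quantity).
    inertia = sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return centroids, labels, inertia


def silhouette(points, labels):
    """Mean Silhouette Coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = make_blobs([(0, 0), (5, 5), (0, 5)])
results = {}
for k in range(2, 7):
    _, labels, inertia = kmeans(points, k)
    results[k] = (inertia, silhouette(points, labels))
    print(f"k={k}  inertia={inertia:.2f}  silhouette={results[k][1]:.3f}")
```

With the Elbow Method one looks for the k after which the inertia stops falling sharply, whereas the Silhouette Coefficient is simply maximized over k. On real data the two criteria can point to different values of k, which is the situation the abstract describes.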
Keywords
Innovation and Invention; Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation
Subject
Business, Economics and Management, Economics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.