Héberger, K. Frequent Errors in Modeling by Machine Learning: A Prototype Case of Predicting the Timely Evolution of COVID-19 Pandemic. Algorithms 2024, 17, 43, doi:10.3390/a17010043.
Abstract
Background: Machine learning (ML) methods are being developed and applied so quickly that almost nobody can follow the field in every detail. It is no wonder that numerous errors and inconsistencies in their usage have spread at a similar speed, independently of the task (regression or classification). This work summarizes frequent errors committed by certain authors, with the aim of helping scientists to avoid them. Methods: The principle of parsimony governs the train of thought. Fair method comparison can be completed with multicriteria decision-making techniques, preferably the sum of ranking differences (SRD). Coupling SRD with analysis of variance (ANOVA) decomposes the effects of several factors. Earlier findings are summarized in a review-like manner: the abuse of the correlation coefficient and proper practices for model discrimination are also outlined. Results: Using an illustrative example, the correct practice and methodology are summarized as guidelines for model discrimination and for minimizing prediction errors. The following factors are all prerequisites for successful modeling: proper data preprocessing, statistical tests, suitable performance parameters, appropriate degrees of freedom, fair comparison of models, and outlier detection, to name just a few. A checklist is provided on how to present ML modeling properly. The advocated practices are reviewed in the discussion. Conclusions: Many of the errors can easily be filtered out with careful reviewing. It is the authors’ responsibility to adhere to the rules of modeling and validation.
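The abstract names the sum of ranking differences (SRD) without saying how it is computed. As a rough illustration, the basic idea can be sketched as follows: rank the rows within each column (model) and within a reference column, then sum the absolute rank differences per model. This is a minimal sketch, not the article's implementation. The function names `average_ranks` and `srd`, the choice of the row-wise mean as the reference, and the numbers in `errors` are all assumptions made for illustration, and no validation step is included.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(columns, reference=None):
    """Return {name: SRD value} for each column.

    columns   -- dict mapping a model name to its list of row values
    reference -- list of reference values; assumed here to default to
                 the row-wise mean over all columns
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    if reference is None:
        reference = [
            sum(columns[name][row] for name in names) / len(names)
            for row in range(n_rows)
        ]
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(a - b) for a, b in zip(average_ranks(columns[name]), ref_ranks))
        for name in names
    }


# Hypothetical data: each list holds one model's absolute errors on four cases.
errors = {
    "model_a": [0.10, 0.40, 0.20, 0.80],
    "model_b": [0.30, 0.20, 0.10, 0.90],
    "model_c": [0.90, 0.10, 0.50, 0.20],
}
scores = srd(errors)
```

With these made-up numbers, `model_b` receives the smallest SRD value (2), while `model_a` and `model_c` tie at 4; a smaller value means the model's ranking lies closer to the reference ranking.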
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.