Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Frequent Errors in Modeling by Machine Learning: A Prototype Case of Predicting the Timely Evolution of COVID-19 Pandemic

Version 1 : Received: 16 November 2023 / Approved: 17 November 2023 / Online: 17 November 2023 (12:35:21 CET)

A peer-reviewed article of this Preprint also exists.

Héberger, K. Frequent Errors in Modeling by Machine Learning: A Prototype Case of Predicting the Timely Evolution of COVID-19 Pandemic. Algorithms 2024, 17, 43, doi:10.3390/a17010043.

Abstract

Background: The development and application of machine learning (ML) methods have become so fast that almost nobody can follow them in every detail. It is no wonder that numerous errors and inconsistencies in their usage have spread at a similar speed, independently of the task (regression or classification). This work summarizes frequent errors committed by certain authors, with the aim of helping scientists to avoid them. Methods: The principle of parsimony governs the train of thought. Fair method comparison can be completed with multicriteria decision-making techniques, preferably the sum of ranking differences (SRD). Its coupling with analysis of variance (ANOVA) decomposes the effects of several factors. Earlier findings are summarized in a review-like manner: the abuse of the correlation coefficient and proper practices for model discrimination are also outlined. Results: Using an illustrative example, the correct practice and methodology are summarized as guidelines for model discrimination and for minimizing prediction errors. The following factors are all prerequisites for successful modeling: proper data preprocessing, statistical tests, suitable performance parameters, appropriate degrees of freedom, fair comparison of models, and outlier detection, to name just a few. A checklist is provided on how to present ML modeling properly. The advocated practices are reviewed in the discussion. Conclusions: Many of the errors can easily be filtered out with careful reviewing. The authors’ responsibility is to adhere to the rules of modeling and validation.
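
For readers unfamiliar with SRD, the core calculation named in the Methods can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names `average_ranks` and `srd`, the toy data, and the use of the row-wise mean as the reference ranking are all assumptions made here for the example. The validation steps that normally accompany SRD (comparison with random rankings, cross-validation) and the coupling with ANOVA are omitted.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def srd(columns: Dict[str, Sequence[float]], reference: Sequence[float]) -> Dict[str, float]:
    """Sum of absolute rank differences between each column and the reference."""
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(r - q) for r, q in zip(average_ranks(col), ref_ranks))
        for name, col in columns.items()
    }


# Hypothetical input: each list holds one model's values for the same four objects.
models = {
    "model_a": [1.0, 2.0, 3.0, 4.0],
    "model_b": [4.0, 3.0, 2.0, 1.0],
    "model_c": [1.0, 3.0, 2.0, 4.0],
}

# Assumption: the row-wise mean serves as the reference (a known gold standard
# could be used instead when one exists).
n_objects = len(next(iter(models.values())))
reference = [sum(col[i] for col in models.values()) / len(models) for i in range(n_objects)]

scores = srd(models, reference)
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(name, value)
```

A smaller SRD value means the model's ranking is closer to the reference ranking; in this toy data `model_c` reproduces the reference ranking exactly and `model_b` is farthest from it.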

Keywords

Machine learning; artificial neural networks; performance parameters; degree of freedom; fair method comparison; QSAR

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
