Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Creating Variant Features to Enhance Covid-19 Predictions with Machine Learning Ensemble

Version 1 : Received: 20 January 2022 / Approved: 21 January 2022 / Online: 21 January 2022 (15:17:58 CET)

How to cite: Wood, J.; Wang, W. Creating Variant Features to Enhance Covid-19 Predictions with Machine Learning Ensemble. Preprints 2022, 2022010333. https://doi.org/10.20944/preprints202201.0333.v1

Abstract

Covid-19 has caused infections and deaths worldwide. While Data Science research has contributed good predictions of positive Covid-19 case numbers, our literature review found little research on the use of virus variants in predictions. We set out to define and evaluate novel variant features. We find that features relating to variant trends, thresholds and amino acid substitutions are especially powerful in two tasks. In the first task, predicting Covid-19 case numbers, accuracy improved from 71.53% without variant features to 82.12% with them. In the second task, classifying the transmission severity of variants into two classes, we created a method to build variable ensembles by selecting appropriate models generated with variant features. Test results showed that our ensembles are more accurate and reliable. One particular ensemble of 14 models correctly classified 90.91% of variants, outperforming other models including the popular Random Forest ensemble. In addition, because the variant features capture more underlying information about Covid-19 pathophysiology, our ensemble methods need only a few data samples to achieve an accurate prediction. The ensemble of 14 models uses only 50 cases of each variant, an ability that could be exploited for early detection of highly infectious variants. These findings may benefit public health professionals, policy makers, and the research community in the collective effort to overcome this disease.

Keywords

Covid-19; Ensemble; Genome sequencing; Machine learning; Variant

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
