Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

DVGfinder: A Metasearch Engine for Identifying Defective Viral Genomes in RNA-Seq Data

Version 1 : Received: 3 March 2022 / Approved: 7 March 2022 / Online: 7 March 2022 (16:25:18 CET)

How to cite: Olmo-Uceda, M.J.; Muñoz-Sánchez, J.C.; Lasso-Giraldo, W.; Arnau, V.; Díaz-Villanueva, W.; Elena, S.F. DVGfinder: A Metasearch Engine for Identifying Defective Viral Genomes in RNA-Seq Data. Preprints 2022, 2022030110. https://doi.org/10.20944/preprints202203.0110.v1

Abstract

The generation of different types of defective viral genomes (DVGs) is an unavoidable consequence of the error-prone replication of RNA viruses. In recent years, a particular class of DVGs, those containing long deletions or genome rearrangements, has gained interest due to its potential therapeutic and biotechnological applications. Identifying such DVGs in high-throughput sequencing data has become an interesting computational problem. To date, several algorithms have been proposed, though all produce false positives, a problem of practical concern if such DVGs have to be synthesized and tested in the laboratory. Here we present DVGfinder, a novel software tool that wraps the two most commonly used algorithms into a pipeline that predicts DVGs. Using a gradient boosting classifier, a machine learning algorithm, we evaluated the performance of DVGfinder against the previous algorithms and found that it outperforms them in precision and sensitivity on simulated datasets. DVGfinder generates user-friendly output files in HTML format that help users identify DVGs based on their associated probability of being true positives.
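
The metasearch idea summarized above, pooling the predictions of two detection algorithms, can be illustrated with a minimal Python sketch. It is not DVGfinder's implementation: the tool names, the representation of a DVG as a (breakpoint, reinitiation) coordinate pair, and the function `merge_candidates` are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_candidates(calls_by_tool):
    """Pool junction calls from several detectors.

    Returns a dict mapping each junction to the sorted list of tools
    that reported it. This is a hypothetical helper, not DVGfinder code.
    """
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for breakpoint_pos, reinitiation_pos in calls:
            support[(breakpoint_pos, reinitiation_pos)].add(tool)
    return {junction: sorted(tools) for junction, tools in sorted(support.items())}


# Hypothetical calls: each tuple is (breakpoint, reinitiation) in genome coordinates.
calls = {
    "tool_a": [(1200, 25000), (3100, 18000)],
    "tool_b": [(1200, 25000), (500, 29000)],
}

merged = merge_candidates(calls)

# Junctions reported by both hypothetical tools.
consensus = [junction for junction, tools in merged.items() if len(tools) == 2]
```

In a pipeline of the kind the abstract describes, the pooled candidates would then be scored by the trained classifier so that each DVG carries a probability of being a true positive; that scoring step is omitted here because the abstract does not specify the features used.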

Keywords

benchmarking; bioinformatics; defective viral genomes; gradient boosting; machine learning; RNA-seq; SARS-CoV-2; virus replication

Subject

Biology and Life Sciences, Virology
