Version 1: Received: 3 March 2022 / Approved: 7 March 2022 / Online: 7 March 2022 (16:25:18 CET)
How to cite:
Olmo-Uceda, M.J.; Muñoz-Sánchez, J.C.; Lasso-Giraldo, W.; Arnau, V.; Díaz-Villanueva, W.; Elena, S.F. DVGfinder: A Metasearch Engine for Identifying Defective Viral Genomes in RNA-Seq Data. Preprints.org 2022, 2022030110. https://doi.org/10.20944/preprints202203.0110.v1.
Abstract
The generation of different types of defective viral genomes (DVGs) is an unavoidable consequence of the error-prone replication of RNA viruses. In recent years, a particular class of DVGs, those containing long deletions or genome rearrangements, has gained interest due to its potential therapeutic and biotechnological applications. Identifying such DVGs in high-throughput sequencing data has become an interesting computational problem. To date, several algorithms have been proposed, though all produce false positives, a problem of practical importance if such DVGs have to be synthesized and tested in the laboratory. Here we present DVGfinder, a novel software tool that wraps the two most commonly used algorithms into a pipeline that predicts DVGs. Using a gradient boosting classifier, a machine learning algorithm, we evaluated the performance of DVGfinder against previous algorithms and found that it outperforms them in precision and sensitivity on simulated datasets. DVGfinder generates user-friendly output files in HTML format that help users identify DVGs based on their associated probability of being true positives.
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.