Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

AI-Enabled Pipeline for Virus Detection, Validation, and SNP Discovery from Next-Generation Sequencing Data

Version 1 : Received: 14 May 2024 / Approved: 14 May 2024 / Online: 14 May 2024 (13:26:57 CEST)

How to cite: Ghorbani, A.; Guzzi, P. H.; Rostami, M. AI-Enabled Pipeline for Virus Detection, Validation, and SNP Discovery from Next-Generation Sequencing Data. Preprints 2024, 2024050971. https://doi.org/10.20944/preprints202405.0971.v1

Abstract

The rapid and accurate detection of viruses and the discovery of single nucleotide polymorphisms (SNPs) are crucial for disease management and understanding viral evolution. In this study, a pipeline for virus detection, validation, and SNP discovery from next-generation sequencing (NGS) data is presented. By integrating state-of-the-art bioinformatics tools with artificial intelligence, the pipeline processes raw sequencing data to identify viral sequences with high accuracy and sensitivity. Before aligning the reads to the reference genomes, quality control measures and adapter trimming are performed to ensure the integrity of the data. Unmapped reads are subjected to de novo assembly to reveal novel viral sequences and genetic elements. The effectiveness of the pipeline is demonstrated by the identification of virus sequences, illustrating its potential for the detection of known and emerging pathogens. SNP discovery is performed using a custom Python script that compares the entire population of sequenced viral reads to a reference genome. This approach provides a comprehensive overview of viral genetic diversity and identifies dominant variants and a spectrum of genetic variations. The robustness of the pipeline is confirmed by the recovery of complete viral sequences, which improves our understanding of viral genomics. This study highlights the synergy between traditional bioinformatics techniques and modern approaches, providing a robust tool for analyzing viral genomes and contributing to the broader field of viral genomics.
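The SNP discovery step described in the abstract, comparing the whole population of sequenced viral reads to a reference genome, can be illustrated with a minimal sketch. This is not the authors' script, which is not reproduced on this page. The function name `call_snps`, the `(start, sequence)` read representation, and the `min_depth` and `min_freq` thresholds are all assumptions made for illustration, and the sketch assumes reads are already aligned without insertions or deletions.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=2, min_freq=0.05):
    """Report positions where the read population differs from the reference.

    Hypothetical helper, not the pipeline's actual code.
    aligned_reads: iterable of (start, sequence) pairs, with a 0-based start
    position on the reference and no gaps (a simplifying assumption).
    Returns a list of (position_1_based, ref_base, alt_base, frequency, depth).
    """
    # One base counter per reference position (a simple pileup).
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Ignore bases that fall outside the reference or are ambiguous (e.g. N).
            if 0 <= pos < len(reference) and base in "ACGT":
                counts[pos][base] += 1

    snps = []
    for pos, (ref_base, column) in enumerate(zip(reference, counts)):
        depth = sum(column.values())
        if depth < min_depth:
            continue  # too few reads to say anything at this position
        for base, n in sorted(column.items()):
            freq = n / depth
            if base != ref_base and freq >= min_freq:
                snps.append((pos + 1, ref_base, base, freq, depth))
    return snps


# Toy data: three of the four reads covering position 5 carry T instead of A.
reference = "ACGTACGT"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (1, "CGTTCG"), (0, "ACGTTCGT")]
print(call_snps(reference, reads))
```

Because every non-reference base is reported with its frequency rather than only the consensus call, a sketch of this kind surfaces both the dominant variant and minor variants at a position, in line with the abstract's description of a spectrum of genetic variation. A real pipeline would read alignments from BAM or SAM files and account for base quality, indels, and strand bias, none of which is handled here.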

Keywords

virus detection; next-generation sequencing; bioinformatics analysis; SNP discovery; viral genomics; AI-assisted genomics

Subject

Biology and Life Sciences, Virology
