Preprint Review Version 1 Preserved in Portico This version is not peer-reviewed

Methods for De-novo Genome Assembly

Version 1 : Received: 26 June 2020 / Approved: 28 June 2020 / Online: 28 June 2020 (08:56:09 CEST)

How to cite: Bayat, A.; Gamaarachchi, H.; Deshpande, N.P.; Wilkins, M.R.; Parameswaran, S. Methods for De-novo Genome Assembly. Preprints 2020, 2020060324 (doi: 10.20944/preprints202006.0324.v1).


Despite advances in algorithms and computational platforms, de-novo genome assembly remains a challenging process. Because sequencing technologies keep changing (Sanger, SOLiD, Illumina, 454, PacBio and Oxford Nanopore), genome assembly methods have evolved to handle each new type of input data. This paper presents a broad, comparative review of the most recent short-read, long-read and hybrid assembly techniques. We provide (1) an algorithmic description of the important processes in the workflow, introducing fundamental concepts and improvements; (2) a review of existing software that explains the available options for genome assembly; and (3) a comparison of the accuracy and performance of existing methods, all executed on the same computer with the same processing capabilities and on the same set of real and synthetic datasets. This uniform evaluation allows a fair comparison and identifies both the strengths and weaknesses of each method. This comparative review is unique in providing a detailed comparison of a broad spectrum of cutting-edge algorithms and methods.

Subject Areas

De-novo Genome Assembly; Short Read Genome Assembly; Long Read Genome Assembly; Hybrid Genome Assembly

Comments (2)

Comment 1
Received: 3 July 2020
The commenter has declared there is no conflict of interests.
Comment: The article seems a bit out of date. It is becoming clearer in 2020 that 2nd generation (NGS) short reads will soon be superseded by long reads from PacBio and Oxford Nanopore (if ONT can deliver full phasing as PacBio now can do). It is likely that Illumina will not exist in 5 years.
Response 1 to Comment 1
Received: 7 July 2020
Commenter: Arash Bayat
Commenter's Conflict of Interests: I am an author
Comment: I agree with you that long reads play a critical role in the de-novo assembly of complex genomes. Yet short reads can be used for error correction of noisy long reads. A significant part of the paper is dedicated to this combination (hybrid assembly). Beyond the scope of this paper, short reads are still the main data source for the reference-guided assembly pipeline (i.e. variant calling). I also agree that we have considered well-established tools (which you describe as out of date) rather than the cutting-edge new technologies. The paper is novel in providing a uniform dataset and computational resources to evaluate existing de-novo contig assembly tools in a fair environment, considering both accuracy and runtime. We compare tools that are widely used by bioinformaticians so that the comparison results help them decide which pipeline to use based on their specific requirements. Many thanks for your comment on our paper.
