Submitted:
20 February 2024
Posted:
22 February 2024
Abstract
Keywords:
1. Introduction
2. Classification of Whole Genome Alignment Algorithms
2.1. Suffix tree-based alignment methods
2.1.1. Suffix tree
2.1.2. MUMmer technique
- (i) Perform a maximal unique match (MUM) decomposition of the two genomes. A MUM is a subsequence that occurs exactly once in genome A and exactly once in genome B, and that is not contained in any longer such subsequence. The decomposition identifies all MUMs between the two genomes. To detect them, the two genomes are represented in a suffix tree; the common substrings found on the tree that satisfy the uniqueness condition are the MUMs (Figure 2).
- (ii) Sort the MUMs and extract the longest possible set of matches that occur in the same order in both genomes.
- (iii) Close the gaps in the alignment by identifying large inserts, repeats, mutated regions, and single nucleotide variations (SNVs).
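Steps (i) and (ii) can be illustrated with a minimal Python sketch. It is not MUMmer's implementation: the MUMs are found by brute force rather than with a suffix tree, and the ordered subset is found with a simple quadratic longest-increasing-subsequence recurrence. The names `find_mums`, `longest_ordered_chain`, and the `min_len` threshold are assumptions introduced for this example.

```python
from typing import List, Tuple

Match = Tuple[int, int, int]  # (start in A, start in B, length)


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 3) -> List[Match]:
    """Brute-force MUM search (assumption: stands in for the suffix tree)."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # A match that can be extended to the left is not maximal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the two genomes agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            seq = a[i:i + length]
            # Keep the match only if it is unique in both genomes.
            if count_occurrences(a, seq) == 1 and count_occurrences(b, seq) == 1:
                mums.append((i, j, length))
    return mums


def longest_ordered_chain(mums: List[Match]) -> List[Match]:
    """Longest subset of MUMs whose start positions increase in both genomes."""
    ordered = sorted(mums)  # sort by position in genome A
    if not ordered:
        return []
    best = [1] * len(ordered)   # best[k]: longest chain ending at MUM k
    prev = [-1] * len(ordered)  # back-pointers for reconstruction
    for k in range(len(ordered)):
        for p in range(k):
            in_order = (ordered[p][0] < ordered[k][0]
                        and ordered[p][1] < ordered[k][1])
            if in_order and best[p] + 1 > best[k]:
                best[k] = best[p] + 1
                prev[k] = p
    k = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    genome_a = "TTGACCGTAAGGCTCATG"  # toy sequences, not real genomes
    genome_b = "GACCGTATGGCTCAAA"
    mums = find_mums(genome_a, genome_b, min_len=4)
    print(longest_ordered_chain(mums))
```

The regions left between consecutive MUMs in the returned chain are the gaps that step (iii) would then classify and close.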
2.1.3. MUMmer 2.1
2.1.4. MUMmer 3.0
2.1.5. MUMmer 4.0
2.1.6. Other sequence comparison approaches based on suffix tree method
2.2. Anchor-based methods
2.2.1. LAGAN
- (i) Generation of local alignments. LAGAN uses the CHAOS method [22] to detect local homologies between the two genomes and chains them into a rough global map. CHAOS first chains short exact matches (seeds) shared by the two genomes. Seeds that lie close to each other are grouped into the same anchor, and the gaps between the seeds are aligned using dynamic programming.
- (ii) Construction of a rough global map. LAGAN assembles the local alignments into a rough global map. Each local alignment carries a similarity score, and the optimal rough global map is the highest-scoring chain of local alignments, which can be computed using sparse dynamic programming [23].
- (iii) Computation of the global alignment. To compute the final global alignment, LAGAN applies the Needleman-Wunsch algorithm to the limited area between the anchors.
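Steps (ii) and (iii) can be sketched as follows. This is a simplified illustration under stated assumptions, not LAGAN's code: the chain is found with a quadratic dynamic program instead of sparse dynamic programming, the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are arbitrary, and the names `best_chain`, `needleman_wunsch`, and `global_align` are introduced here for the example. Anchors are assumed to be given as half-open coordinate ranges with a score.

```python
from typing import List, Tuple

# (a_start, a_end, b_start, b_end, score), half-open ranges (assumed layout)
Anchor = Tuple[int, int, int, int, float]


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Highest-scoring chain of non-overlapping, co-linear anchors (O(n^2))."""
    ordered = sorted(anchors)
    if not ordered:
        return []
    total = [anc[4] for anc in ordered]  # best chain score ending at each anchor
    prev = [-1] * len(ordered)
    for k, (a_start, _, b_start, _, score) in enumerate(ordered):
        for p in range(k):
            # Anchor p may precede k only if it ends before k starts in both genomes.
            if ordered[p][1] <= a_start and ordered[p][3] <= b_start:
                if total[p] + score > total[k]:
                    total[k] = total[p] + score
                    prev[k] = p
    k = max(range(len(ordered)), key=total.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str, int]:
    """Global alignment of x and y with a linear gap penalty."""
    n, m = len(x), len(y)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay)), score[n][m]


def global_align(a: str, b: str, chain: List[Anchor]) -> Tuple[str, str]:
    """Stitch a global alignment: align each inter-anchor gap and each anchor."""
    out_a, out_b = [], []
    pos_a = pos_b = 0
    for a_start, a_end, b_start, b_end, _ in chain:
        for xs, ys in ((a[pos_a:a_start], b[pos_b:b_start]),
                       (a[a_start:a_end], b[b_start:b_end])):
            ax, ay, _ = needleman_wunsch(xs, ys)
            out_a.append(ax); out_b.append(ay)
        pos_a, pos_b = a_end, b_end
    ax, ay, _ = needleman_wunsch(a[pos_a:], b[pos_b:])  # tail after last anchor
    out_a.append(ax); out_b.append(ay)
    return "".join(out_a), "".join(out_b)
```

Restricting Needleman-Wunsch to the segments between chained anchors is what keeps the quadratic cost of dynamic programming confined to small regions rather than the whole genome pair.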
2.2.2. Multi-LAGAN
2.2.3. Mauve
2.2.3.1. Steps
2.2.3.2. Alignment visualization tool
2.2.3.3. Progressive Mauve
2.2.4. BLASTZ
2.2.4.1. Main steps
2.2.4.2. Main use
2.2.5. STELLAR
2.2.6. LASTZ
2.2.7. DIALIGN
2.2.8. AnchorWave
2.2.9. Minimap2
2.3. Graph-based homology mapping methods
2.3.1. Mugsy
2.3.2. BubbZ
2.3.3. SibeliaZ
3. Algorithmic aspects of WGA algorithms
3.1. Performance characteristics
3.2. Methodological Underpinnings
4. Recent advancements in WGA algorithms
5. Comprehensive Analysis of Genomic Comparison Tools: Human and Diverse Genomes
5.1. Analyses of Human Genomes
5.2. Analysis of C. elegans and Baker's Yeast Genomes
5.3. Execution Times and Tool Complexity
5.4. Methodological insights
| Whole Genome Alignment Tools | Human vs. Human | C. elegans vs. Baker’s Yeast |
|---|---|---|
| SibeliaZ | 9% | 12% |
| MUMmer 4.0 | 99% | 0% |
| Minimap2 | 100% | 0% |

| Whole Genome Alignment Tools | Alignment Time |
|---|---|
| SibeliaZ | 6 hours and 19 minutes |
| MUMmer 4.0 | 15 hours and 34 minutes |
| DIALIGN 2 | Failed |
| Minimap2 | 44 minutes |
6. Challenges in whole genome alignment
6.1. Computational challenges
6.2. Biological relevance
6.3. Future directions
7. Discussion
8. Conclusion
Author Contributions
Funding
Data and Scripts Availability
Acknowledgments
References
- Guerfali, F.; Laouini, D.; Boudabous, A.; Tekaia, F. Designing and running an advanced Bioinformatics and genome analyses course in Tunisia. PLoS Computational Biology 2019, 15, e1006373. [Google Scholar] [CrossRef] [PubMed]
- Saada, B.; Zhang, J. DNA sequences compression algorithms based on the two bits codation method. In Proceedings of the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); IEEE, 2015.
- Venter, J.C.; Adams, M.D.; Myers, E.W.; Li, P.W.; Mural, R.J.; Sutton, G.G.; et al. The sequence of the human genome. Science 2001, 291, 1304–1351. [Google Scholar]
- Goldfeder, R.L.; et al. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis. American Journal of Epidemiology 2017, 186, 1000–1009. [Google Scholar] [CrossRef] [PubMed]
- Pinese, M.; et al. The Medical Genome Reference Bank contains whole genome and phenotype data of 2570 healthy elderly. Nature Communications 2020, 11, 435. [Google Scholar] [CrossRef] [PubMed]
- Anderson, W.; et al. International network of cancer genome projects. Nature 2010, 464. [Google Scholar]
- Blake, J.A.; et al. Mouse Genome Database (MGD): Knowledgebase for mouse–human comparative biology. Nucleic Acids Research 2021, 49, D981–D987. [Google Scholar] [CrossRef] [PubMed]
- Abascal, F.; et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 2020, 583, 699–710. [Google Scholar]
- Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 1970, 48, 443–453. [Google Scholar] [CrossRef]
- Smith, T.F.; Waterman, M.S. Identification of common molecular subsequences. Journal of molecular biology 1981, 147, 195–197. [Google Scholar] [CrossRef]
- Morgenstern, B. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 1999, 15, 211–218. [Google Scholar] [CrossRef]
- Delcher, A.L.; et al. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Research 2002, 30, 2478–2483. [Google Scholar] [CrossRef] [PubMed]
- Gusfield, D. Algorithms on strings, trees, and sequences: Computer science and computational biology. ACM SIGACT News 1997, 28, 41–60. [Google Scholar] [CrossRef]
- Tian, Y.; et al. Practical methods for constructing suffix trees. The VLDB Journal 2005, 14, 281–299. [Google Scholar] [CrossRef]
- Delcher, A.L.; et al. Alignment of whole genomes. Nucleic Acids Research 1999, 27, 2369–2376. [Google Scholar] [CrossRef] [PubMed]
- Kurtz, S.; et al. Versatile and open software for comparing large genomes. Genome Biology 2004, 5, R12. [Google Scholar] [CrossRef]
- Marçais, G.; et al. MUMmer4: A fast and versatile genome alignment system. PLoS computational biology 2018, 14, e1005944. [Google Scholar] [CrossRef]
- Soares, I.; Goios, A.; Amorim, A. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency. The Scientific World Journal 2012, 2012, 450124. [Google Scholar]
- Su, W.; et al. Multiple sequence alignment based on a suffix tree and center-star strategy: a linear method for multiple nucleotide sequence alignment on spark parallel framework. Journal of Computational Biology 2017, 24, 1230–1242. [Google Scholar] [CrossRef]
- Zou, Q.; et al. An Algorithm for DNA Multiple Sequence Alignment Based on Center Star Method and Keyword Tree. Acta Electronica Sinica 2009, 37, 1746–1750. [Google Scholar]
- Brudno, M.; et al. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome research 2003, 13, 721–731. [Google Scholar] [CrossRef]
- Wan, X.; Karniadakis, G.E. An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. Journal of Computational Physics 2005, 209, 617–642. [Google Scholar] [CrossRef]
- Eppstein, D.; et al. Sparse dynamic programming I: linear cost functions. Journal of the ACM (JACM) 1992, 39, 519–545. [Google Scholar] [CrossRef]
- Darling, A.C.; et al. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome research 2004, 14, 1394–1403. [Google Scholar] [CrossRef]
- Saitou, N.; Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 1987, 4, 406–425. [Google Scholar] [PubMed]
- Thompson, J.D.; Higgins, D.G.; Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 1994, 22, 4673–4680. [Google Scholar] [CrossRef] [PubMed]
- Darling, A.E.; Mau, B.; Perna, N.T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PloS one 2010, 5, e11147. [Google Scholar] [CrossRef] [PubMed]
- Tatusova, T.A.; Madden, T.L. BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiology Letters 1999, 174, 247–250. [Google Scholar] [CrossRef] [PubMed]
- Ma, B.; Tromp, J.; Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics 2002, 18, 440–445. [Google Scholar] [CrossRef] [PubMed]
- Schwartz, S.; et al. Human–mouse alignments with BLASTZ. Genome research 2003, 13, 103–107. [Google Scholar] [CrossRef]
- Kehr, B.; Weese, D.; Reinert, K. STELLAR: fast and exact local alignments. BMC Bioinformatics 2011, 12, S15. [Google Scholar] [CrossRef]
- Rasmussen, K.R.; Stoye, J.; Myers, E.W. Efficient q-Gram Filters for Finding All ε-Matches over a Given Length. In Research in Computational Molecular Biology; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
- Harris, R.S. Improved Pairwise Alignment of Genomic DNA. Ph.D. Thesis, The Pennsylvania State University, 2007.
- Al Ait, L.; Yamak, Z.; Morgenstern, B. DIALIGN at GOBICS—multiple sequence alignment using various sources of external information. Nucleic Acids Research 2013, 41, W3–W7. [Google Scholar] [CrossRef]
- Song, B.; et al. AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proceedings of the National Academy of Sciences 2022, 119, e2113075119. [Google Scholar] [CrossRef]
- Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 2021, 37, 4572–4574. [Google Scholar] [CrossRef]
- Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018, 34, 3094–3100. [Google Scholar] [CrossRef] [PubMed]
- Dewey, C.N. Aligning multiple whole genomes with Mercator and MAVID. Comparative genomics 2008, 221–235. [Google Scholar]
- Angiuoli, S.V.; Salzberg, S.L. Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 2010, 27, 334–342. [Google Scholar] [CrossRef] [PubMed]
- Minkin, I.; Medvedev, P. Scalable pairwise whole-genome homology mapping of long genomes with BubbZ. IScience 2020, 23. [Google Scholar] [CrossRef] [PubMed]
- Minkin, I.; Medvedev, P. Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nature Communications 2020, 11, 6327. [Google Scholar] [CrossRef] [PubMed]
- Saada, B.; Zhang, J. DNA sequences compression algorithm based on extended-ASCII representation. in Proceedings of the world congress on engineering and computer science. 2015.
- Silva, M.; Pratas, D.; Pinho, A.J. Efficient DNA sequence compression with neural networks. GigaScience 2020, 9. [Google Scholar] [CrossRef]
- Corbett, R.D.; et al. A distributed whole genome sequencing benchmark study. Frontiers in genetics 2020, 11, 612515. [Google Scholar] [CrossRef]
- Marco-Sola, S.; et al. Optimal gap-affine alignment in O(s) space. Bioinformatics 2023, 39. [Google Scholar] [CrossRef]
- Alser, M.; et al. Technology dictates algorithms: recent developments in read alignment. Genome Biology 2021, 22, 249. [Google Scholar] [CrossRef]
- Armstrong, J.; et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020, 587, 246–251. [Google Scholar] [CrossRef] [PubMed]
- Armstrong, J.; et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020, 587, 246–251. [Google Scholar] [CrossRef] [PubMed]
- Armstrong, J.; et al. Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era. BioRxiv, 2019: p. 730531. [CrossRef]
- Zhou, Y.; et al. A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes. BMC Genomics 2020, 21, 183. [Google Scholar] [CrossRef] [PubMed]
- Gardner, S.N.; et al. Multiplex primer prediction software for divergent targets. Nucleic Acids Research 2009, 37, 6291–6304. [Google Scholar] [CrossRef] [PubMed]
- Dewey, C.N. Whole-Genome Alignment. In Evolutionary Genomics: Statistical and Computational Methods, Volume 1; Anisimova, M., Ed.; Humana Press: Totowa, NJ, 2012; pp. 237–257. [Google Scholar]
- Löytynoja, A. Alignment methods: strategies, challenges, benchmarking, and comparative overview. Evolutionary Genomics: Statistical and Computational Methods, Volume 1, 2012: p. 203-235.
- Couronne, O.; et al. Strategies and tools for whole-genome alignments. Genome research 2003, 13, 73–80. [Google Scholar] [CrossRef] [PubMed]
- Govek, K.W.; Yamajala, V.S.; Camara, P.G. Clustering-independent analysis of genomic data using spectral simplicial theory. PLoS computational biology 2019, 15, e1007509. [Google Scholar] [CrossRef] [PubMed]
- Wu, Y.; et al. A multiple alignment workflow shows the effect of repeat masking and parameter tuning on alignment in plants. The Plant Genome 2022, 15, e20204. [Google Scholar] [CrossRef]
- Dewey, C.N. Aligning Multiple Whole Genomes with Mercator and MAVID. In Comparative Genomics; N.H. Bergman, Editor; Humana Press: Totowa, NJ, 2008; pp. 221–235. [Google Scholar]
- Dewey, C.N. Whole-genome alignment. Evolutionary Genomics: Statistical and Computational Methods, 2019: p. 121-147.
- Huang, C.; Li, R.; Li, A. Parallel Implementation of Key Algorithms for Intelligent Processing of Graphic Signal Data of Consumer Digital Equipment. Mobile Networks and Applications 2023. [Google Scholar] [CrossRef]
- Nolle, T.; et al. DeepAlign: alignment-based process anomaly correction using recurrent neural networks. in International conference on advanced information systems engineering. 2020. Springer.
- Peltzer, A.; et al. EAGER: efficient ancient genome reconstruction. Genome Biology 2016, 17, 60. [Google Scholar] [CrossRef] [PubMed]
- Song, B.; Buckler, E.S.; Stitzer, M.C. New whole-genome alignment tools are needed for tapping into plant diversity. Trends in Plant Science 2023. [Google Scholar] [CrossRef] [PubMed]
- Earl, D.; et al. Alignathon: a competitive assessment of whole-genome alignment methods. Genome research 2014, 24, 2077–2089. [Google Scholar] [CrossRef] [PubMed]
- Schadt, E.E.; et al. Computational solutions to large-scale data management and analysis. Nature Reviews Genetics 2010, 11, 647–657. [Google Scholar] [CrossRef] [PubMed]
- Dewey, C. Whole-Genome Alignment. 2019. pp. 121–147.
- Ye, C.; et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. Scientific Reports 2016, 6, 31900. [Google Scholar] [CrossRef] [PubMed]
- Kshemkalyani, A.D.; Singhal, M. Distributed computing: principles, algorithms, and systems. 2011: Cambridge University Press.
- Volozonoka, L.; Miskova, A.; Gailite, L. Whole genome amplification in preimplantation genetic testing in the era of massively parallel sequencing. International Journal of Molecular Sciences 2022, 23, 4819. [Google Scholar] [CrossRef] [PubMed]
- Uffelmann, E.; et al. Genome-wide association studies. Nature Reviews Methods Primers 2021, 1, 59. [Google Scholar] [CrossRef]
- Girisha, M.N.; Badiger, V.P.; Pattar, S. A comprehensive review of global alignment of multiple biological networks: background, applications and open issues. Network Modeling Analysis in Health Informatics and Bioinformatics 2022, 11, 9. [Google Scholar] [CrossRef]
- Hennig, A.; Nieselt, K. Efficient merging of genome profile alignments. Bioinformatics 2019, 35, i71–i80. [Google Scholar] [CrossRef]
- Armstrong, J.; et al. Whole-genome alignment and comparative annotation. Annual review of animal biosciences 2019, 7, 41–64. [Google Scholar] [CrossRef]
- Kille, B.; et al. Multiple genome alignment in the telomere-to-telomere assembly era. Genome Biology 2022, 23, 182. [Google Scholar] [CrossRef] [PubMed]
- Macaulay, I.C.; Voet, T. Single cell genomics: advances and future perspectives. PLoS genetics 2014, 10, e1004126. [Google Scholar] [CrossRef] [PubMed]
- Shi, L.; Wang, Z. Computational strategies for scalable genomics analysis. Genes 2019, 10, 1017. [Google Scholar] [CrossRef] [PubMed]
- Ryva, B.; et al. Wheat germ agglutinin as a potential therapeutic agent for leukemia. Frontiers in oncology 2019, 9, 100. [Google Scholar] [CrossRef]
- Taylor, J.; et al. Alignment for advanced machine learning systems. Ethics of Artificial Intelligence 2016, 342–382. [Google Scholar]





| Approach | Method | Type |
|---|---|---|
| Suffix tree based methods | MUMmer | Local alignment |
| | MUMmer 4.0 | Global multiple genome alignment |
| | Suffix tree & L-words | Global multiple genome alignment |
| | Multiple Sequence Alignment (MSA) | Local alignment |
| Anchor based methods | LAGAN / Multi-LAGAN | Global multiple genome alignment |
| | ProgressiveMauve | Hierarchical WGA mapping |
| | BLASTZ | Local alignment |
| | STELLAR | Local alignment |
| | LASTZ | Local alignment |
| | DIALIGN | Global multiple genome alignment |
| Graph based methods | AnchorWave | Global alignment |
| | MERCATOR | Homology mapping |
| | Mugsy | Hierarchical WGA mapping |
| | BubbZ | Homology mapping |
| | SibeliaZ | Hierarchical WGA mapping |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).