Preprint
Review

This version is not peer-reviewed.

Exploring Pan-Genomes of Crops: Genomic Resources and Tools for Unraveling Gene Evolution, Function, and Adaptation

A peer-reviewed article of this preprint also exists.

Submitted:

21 August 2023

Posted:

23 August 2023

You are already at the latest version

Abstract
The availability of multiple sequenced genomes from a single species made it possible to explore intra- and inter-specific genomic comparisons at higher resolution and build clade-specific pangenomes of several crops. The pan-genomes of crops constructed from various cultivars/accessions, landraces, and wild ancestral species represent a compendium of genes and structural variations and allow researchers to search for the novel genes and alleles that were inadvertently lost in domesticated crops during the historical process of crop domestication or in the process of extensive plant breeding. Fortunately, many valuable genes and alleles associated with desirable traits like disease resistance, abiotic stress tolerance, plant architecture, and nutrition qualities exist in landraces, ancestral species, and crop wild relatives. The novels genes from the wild ancestors and landraces can be introduced back to high-yielding varieties of modern crops by implementing classical plant breeding, genomic selection, and transgenic/gene editing approaches. Thus, pan-genomic represents a great leap in plant research and offers new avenues for targeted breeding for mitigating the impact of global climate change. Here we summarize the tools used for pangenome assembly and annotations, web-portals hosting plant pangenomes. Furthermore, we highlight a few discoveries made in crops using the pan-genomic approach and its future potential.
Keywords: 
;  ;  ;  ;  ;  

1. Introduction

In recent years, advancements in affordable sequencing platforms and computational resources have helped to generate reference-quality genome assemblies for multiple crop accessions/varieties belonging to a single species. Thus, crop genomics has transitioned to the pan-genomic era. This shift has been made possible by advances in both low-cost sequencing (i.e., short-read data Illumina sequencing, long-read PacBio sequencing with low errors, Nanopore sequencing) and bioinformatics tools employed for genomic data processing and genome assemblies (for excellent reviews, see [1,2,3,4,5,6,7]. Whole-genome assembly-based pangenomes have been reported for a few plant species, including rice [8,9,10], barley [11], wheat [12], maize [13], soybean [14], a wild relative of soybean Glycine soja [15], and brassica [16]. In particular, PacBio HiFi reads proved very useful for high-quality polyploid genome assemblies for peanut [17], wheat [18], oilseed [16], strawberry [19] and potato [20].
The intraspecies genome comparisons of crops suggest extensive structural variation across diverse genotypes that affect both the genomic contents and plant function [14,21,22,23]. The structural variations (SVs) among genotypes of the same species include insertion, deletions (indels), and translocation of the small or large genomic regions that further cause presence-absence variation (PAV), copy number variation (CNV), chromosomal rearrangements, repetitive sequences (i.e., tandem gene duplications, repetitive sequences in non-coding regions of the genome, transposable elements, centromere repeats, etc.). The conserved genes present in all cultivars/genotypes/subspecies/strains within a species constitute the “core” genome, and the variable genes represent as the “dispensable” or “accessory” genome [2,24]. The accessory genome consists of the “shell genes” (found in most cultivars within a species) and the “cloud genes “ (present in only a small fraction of cultivars of a same species) as shown in Figure 1.
Thus, a single reference genome represents only a fraction of the species-wide genomic space, and a pan-genome represents species-wide genomic space [2,13,15,25,26]. Often, the pangenomes encompass genes found in crop wild relatives and ancestral species [27,28,29,30,31]. We like to note here that many useful genes which were lost during the crop domestication and extensive plant breeding [32,33,34] may be found in the “dispensable”/accessory genome of any crop species which are needed for adaptation to a range of biotic and abiotic stress conditions, and improving physiology, plant structure, yield and nutrient quality of crops [15,33,35,36,37,38,39]. Thus, the availability of plant pangenomes allows plant researchers and breeders to explore important candidate genes for improving yield and nutritional quality, and resilience of crops to a range of climatic conditions and pathogens. For instance, a few comparative genomic studies have revealed that gene amplification plays a vital role in disease resistance, abiotic stress tolerance, and other agronomic traits associated with plant development, architecture, yield, etc. [40,41,42,43,44,45,46,47,48,49]. The high-quality pangenomes also make it possible to study previously inaccessible regions of the eukaryotic genomes, including centromeres, long heterochromatic blocks, and rDNA regions, etc. that exhibit low recombination, and provide new insights into crop genome evolution [50].
Recently, many excellent reviews have been published on plant pan-genomes (see [1,2,51,52,53,54]. Here, we review the current strategies for constructing and visualizing crop pan-genome data, public genomic portals hosting pan-genes, pan-genome data, pan-genome browsers, and associated analytical tools. Furthermore, we highlight a few studies that have exploited a pan-genomic approach to discovering candidate genes associated with important agronomic traits. We also discuss the potential of pan-genome-driven translational research.

2. Pangenome Construction, Visualization, and Data Analysis Tools

The first step in setting up a pangenome infrastructure is the selection of a diverse set of representative genotypes for sequence assembly that capture as many genetic variants as possible with a limited panel of genotypes [14,21,55,56]. The second step is sequencing of individual genomes. The third step is assembly and construction of pan-genome. Several popular approaches are implemented for pan-genome assembly (for detailed reviews, see [1,57,58,59]. Here we briefly describe the basic tenets of three popular methods (see Figure 2).
The first approach uses a high-quality reference genome for mapping sequence reads generated from all other genotypes. Iterative refinement allows for a progressive improvement of the assembly with additional data. This strategy can minimize errors by exploiting the information from a high-quality reference genome and limiting the coordinate consolidation issue. However, this method requires the availability of a high-quality reference genome, which may not be available for all species or strains. Secondly, it is sensitive to misalignment errors or inaccuracies in the reference genome, which can potentially propagate errors throughout the assembly. Likewise, bias towards the reference genome may limit the detection of novel or divergent sequences.
In species where a reference is unavailable, de novo assembly of individual genotypes is generated, and then assembled contigs are mapped to each other [27]. The de-novo genome assemblies have become a method of choice due to the advances in long-read sequencing and the availability of fast algorithms for aligning long-reads to call structural variants [7].
Conceptually, de novo assembly of multiple high-quality reference genome sequences and their comparison by pair-wise sequence alignment is arguably the most powerful and accurate approach to detect all types of sequence variants at base-level resolution, including novel genomic elements and rearrangements. However, generating assemblies of polyploid plant genomes is still a challenge, and current methods are limited in detecting and phasing heterozygous structural variants, erroneously produce chimeric contigs joining different haplotypes or ignore alternative haplotypes [60,61]. This approach is time-intensive and requires significant computational and bioinformatic resources, especially for large genomes and complex variations. Repeat resolution can be challenging, leading to fragmented assemblies, as it relies heavily on sequencing depth to overcome repetitive regions and complex variations.
The third graph-based approach allows the addition of any variant to the reference as a node at the genomic location where it is discovered, and then haplotypes are associated with one of the reference genomes used to build the graph. Reads are then realigned to this genome leading to more accurate mapping. This method can accommodate new genomic data through iterative refinement, allowing for continuous improvement of the pangenome assembly. However, graph construction and traversal can be computationally intensive, especially for large and diverse pangenomes, and requires substantial computational resources. Typically, graph complexity increases with the addition of more genomes, potentially impacting scalability, and computational efficiency. Nonetheless, the graph-based pan-genomes can represent complex variations, including structural variants and large-scale rearrangements, facilitating identification of shared and unique genomic regions among individuals or strains, and aiding in excellent visualization of pan-genomes. A conceptual visualization of the pan-genome is shown in Figure 3.
Recently, Shang et al., 2022 [21] have constructed ‘Super Pan-genome of rice’ containing high-quality assemblies of 251 rice genomes including 202 accessions of domesticated Asian rice Oryza sativa, 28 accession of a Oryza rufipogon (the wild ancestor of O. sativa), 11 accessions of domesticated African rice Oryza glaberrima, and ten accessions of Oryza barthii (the wild ancestor of O. glaberrima) using de novo long-read assembly and a graph-based approach. A reference-free whole-genome multiple sequence alignment for the 251 rice accessions that can be visualized using any assembly as the reference with the Rice Super Pan-genome Information Resource Database (http://www.ricesuperpir.com). This fully annotated pan-genome graph, visualized using the JBrowse genome browser, also integrated the SVs, gene annotations, transposable elements annotations, pan-genome graph, and BLAST tools [21].
A few excellent reviews have previously been published on the development of computational tools for pan-genome visualizations [57,58,62,63]. We note here that genome sequencing technologies and assembly algorithms, as well as the pan-genome visualization tools, are rapidly evolving to achieve high accuracy and are being complemented by additional independent mapping approaches, such as optical maps and Hi-C to validate structural variant calls (i.e., inversions and translocations). It is important to acknowledge that the details and outcomes of each method may vary based on the specific pangenome assembly tools, parameters, and characteristics of the genomic data employed in the process [64,65]. Here we compiled a summary of the latest representative tools in Table 1. It is crucial to recognize that the development of pan-genome tools is an active field, and the list could not be exhaustive.

3. A Survey of Crop Pan-genomes and Genomic Resources

With technical advances and affordability of the sequencing and assembly of the genomes, we are experiencing a deluge of big data in biology. The plant research community now faces a bigger challenge of making the genomic data findable, accessible, interoperable, and reusable (FAIR) [111,112]. Public databases and genomic resources play a crucial role in making genomic data FAIR and provide tools for visualization of genomic, transcriptomic, proteomic and metabolomic data [113,114,115,116,117,118,119,120]. Furthermore, the secondary knowledgebases synthesize and curate knowledge graphs providing information for gene-gene interactions, metabolic networks, and pathways, and provide the tools for analyses of user’s data in the context of plant genome browser or pathways [90,97,114,115,119,121,122,123,124,125,126,127,128,129,130]. Currently, a substantial number of plant genome browsers, amounting to a few hundred, can be accessed through platforms like Plant Ensembl [120], Phytozome [131], and various clade-specific community databases (for a recent review see [132]). However, the availability of plant pangenome portals is limited and experiencing slow growth. To provide an overview of the current state of crop pan-genomic research, resources, and portals, we compiled Table 2.
It is clear from the Table 2 that at present crop pan-genome research is at its early stage. The pan-genome browsers are available for a few crops, and thus, most of the data is not supported for user-friendly query, visualization and supporting the analysis of the user’s data. However, the very few platforms and genomic databases that support pan-gene analysis in phylogenomic context and support the user’s query show the potential of pan-genome data for answering both basic research and translational applications of this knowledge for crop improvement. Here, we highlight an example of visualizing the pan-gene data for TILLER ANGLE CONTROL 1 (TAC1) transcription factor from various accessions of cultivated rice O. sativa, other members of Oryza genus at Gramene oryza pansite (https://oryza.gramene.org). OsTAC1 is induced by gravistimulation and promotes horizontal shoot growth by negatively regulating shoot gravitropism [160]. Thus, it is involved in regulation of tiller angle and modulating plant architectural traits of agronomic tremendous importance. A comparison of TAC1 protein sequences shows a significant variability at the carboxy terminal between the domesticated rice cultivars of japonica and indica accessions (Figure 4A). Indeed, a previously published study has shown that a point mutation in OsTAC1 gene at the 3’-splicing site of the 1.5-kb intron (‘GGGA’) in japonica rice accessions caused reduction in the expression of this gene leading to a smaller tiller angle. This trait was selected in the japonica rice accessions. In contrast, wild rice accessions and indica rice accessions, which have large tiller angle, contain ‘AGGA’ sequence at 3’-splicing site of the 1.5-kb intron [161]. OsTAC 1 gene is on chromosome 9 in rice genome and shows high conservation across all rice accession (Figure 4B). However, we also see that genomic neighborhood of the TAC1 on left-hand side is not much conserved and it remains to be explored if any of these genes are involved in functional adaptation of rice accessions.
Beside Gramene database, a few more pan-genome portals provide a similar view of pan-genes visualization. The other portal host different views and analysis tools. For example, the Banana Genome Hub uses Panache platform for visualization of pan-genomic data on Musaceae. However, in general the well-established crop genome portals supports users in exploring genes and gene families, chromosome structures, synteny, SVs, gene expression patterns, SNP markers, etc.

4. Plant pan-genomics-driven insights for understanding the basis of agronomic traits

Cereal crops have been the prime subject of agriculture research. Thus, we have matured genomic resources, enriched genome annotations, and genotype and sequence data facilitating comparative genomics and pan-genomics studies (See Table 2). The focus of all categories of genomic research on cereal crops aims to increase the grain yield or plant developmental, physiological, and architectural traits that can support the higher yield. For example, Wang et al. (2022) [109] used rice pan-genome to identify GW5 genes associated with the trait ‘thousand-grain weight’ (TGW), and a novel locus qPH8-1 involved in the regulation of plant height. In another study by Shang et al., 2022 [21] a super pan-genome of rice was constructed helped to identify genetic variants associated with submergence tolerance, seed shattering, and plant architecture [21]. Many important studies have been published in maize, rice, wheat and are being implemented for their improvement.
Notably, more investments are being made in the genomic and pan-genomic research of minor cereal crops, including sorghum and millets suitable for growing in diverse and marginal lands (see Table 2). These crops have a high degree of in-built tolerance for mitigating the impact of harsh environments and resistance against many pests and pathogens. For example, foxtail millet (Setaria italica), is a model plant for studying C4 photosynthesis and developing climate-resilient crops. A pan-genomic study of foxtail millet identified an important genetic variation in the promoter region of SiGW3 that is associated with yield improvements [22]. Another study identified 13 marker-trait associations using proso millet (Panicum miliaceum L.) pan-genome [162]. Similarly, Yan et al. (2023) recently constructed a pearl millet pan-genome that identified over 400,000 genomic structural variants (SVs) and provided insights into heat tolerance. This study identified the RWP-RK gene conferring enhanced heat tolerance [55]. Another group of previously understudied crops is legumes that has gained from genomic and pan-genomic research and breeding efforts (see Table 2). For example, pigeon pea is an important orphan crop mainly grown by smallholder farmers in the tropics and subtropical regions of the world. It has an in-built tolerance for drought stress and is very productive in marginal land with small inputs. A pan-genome study of pigeon pea identified 225 SNPs associated with nine agronomically important traits. These associations will aid pigeon pea germplasm improvement for breeding legumes [154]. Mung bean is an excellent plant-based source of protein and is grown in temperate, subtropical, and tropical regions. In a study, Liu et al. (2022) analyzed 217 mung bean accessions and discovered many novel genes associated with agronomic traits, for example, an SNP in the candidate genes SWEET10 homolog (jg24043) associated with crude starch content; NRT1/PTR FAMILY 2.13 gene for pod length; a homolog of WUSCHEL-family homeobox gene associated with yield; and a PAV event in a multi-gene locus associated with color-related traits [163].
Pan-genome studies have become increasingly important for understanding the genetic diversity of major crops from tropical regions, including cassava (Manihot esculenta) and banana (Musa spp.). These crops are a vital component of diverse ecosystems and play essential roles in the livelihoods of local communities. Insights gained from genomic and pan-genomic research are being deemed vital for improving banana for resistance against Fusarium wilt to safeguard the global banana industry [99,164]. Similarly, the pan-genome of cassava (Manihot esculenta), a staple crop for millions of people in Africa, is helping to score genomic variations, particularly in genes associated with disease resistance and starch biosynthesis [151]. More studies are being performed on important fruit and vegetables (i.e., banana, apple, tomato, melon, citrus, and grape) and oilseed crops (i.e., Brassica, soybean, sunflower, etc.) listed in Table 2. A few studies have uncovered new genes and rare alleles that regulate secondary metabolites associated with color and flavor [46,49,165]; pathogen resistance [43,47,56,79,145,165,166,167]; and abiotic stress tolerance [41,42,55]. These findings foster understanding of the genetic basis of diverse traits in the domesticated crops and offer promising prospects for the introduction of candidate genes (for disease resistance breeding and quality traits) through precise genome editing into elite cultivars.
In conclusion, the acquisition of pangenome holds immense potential for pursuing fundamental questions, especially for climate-change resilient breeding and overall crop improvement. Utilizing pangenomes enables researchers to (1) Identify gains and losses of genetic regions and structural variations strongly associated with desirable fitness phenotypes such as abiotic stress and disease tolerance, growth and development, yield, biomass, and performance. (2) Identify gains and losses in protein-coding regions and/or epigenetic features between crops and closely related species. (3) Identify pathways, gene networks, expression profiles, and transcript isoforms that are enriched for given major and minor quantitative trait loci (QTLs) with desired phenotypes and adaptation traits. (4) Project functional and phenotype homologs from a well-studied species onto a new/less-studied species through whole genome comparisons and synteny. (5) Advance plant breeding efforts by mapping/querying/visualizing public or personal project data to build or test hypotheses, discover markers, gain knowledge, and support state-of-the-art breeding applications. These goals and the knowledge required to achieve them represent important pieces of an overall puzzle and therefore require exploration. In summary, harnessing the full potential of pan-genomes for targeted crop breeding is an important goal of the plant research and breeding community. The integration of diverse genomic tools, including pan-genome, marker-assisted selection, genomic selection, and gene editing techniques, will play a pivotal role in developing climate-resilient crops that can adapt to and thrive in the face of climate changes, securing global food security and sustainability (see Figure 5).

5. Outlook, opportunities, and innovations in plant pan-genome research

Plant genomes are often very large and complex, making it difficult to produce high-quality genome assemblies with accurate gene annotations. For over last three decades, genotypic variations have been captured using various genetic markers, including RFLP, RAPD, SNPs, SSRs, microsatellite markers, etc. to establish connections between genotype and phenotype to aid plant breeding and cultivar improvement [168,169,170,171,172,173]. However, the advances in the short-read and long-read-based next-generation sequencing platforms and computational methods required for processing large-scale genomic and transcriptomic data have facilitated the rapid and cost-effective whole genome sequences, including de-novo genome assemblies [7,174,175]. These whole genome sequence comparisons are driving pan-genomic research and, first time in history, allowing researchers to explore intra- and inter-species structural variations at the resolution of nucleotide sequence level. We can expect a significant expansion in the availability of pangenomes across a broader range of plant species. The pangenomes facilitate the identification of genes that are conserved across species and genes that are unique to specific species or a subset of accessions. The availability of species-specific or genus-level pangenomes of crops is crucial for understanding the dynamics of their genome evolution, including the impact of artificial selection and domestication, crop diversification, and adaptation under varied environments [50,174,176,177]. For instance, a super-pangenome of the Citrullus genus comprising of 346 cultivated watermelon accessions and 201 wild accessions suggested that a duplication of the sugar transporter gene ClTST2 was likely selected during domestication for higher fruit sweetness; the wild accessions harbor many genes related to disease resistance compared to cultivated watermelon accessions; and identified domestication sweeps in cultivated watermelon [178].
In addition, plant pan-genome can help to advance the fundamental understanding of the plant kingdom at various scales. It can aid in improving functional annotations of genes and genomes [179,180], and provide insights into specific roles and interactions of different genetic elements. Furthermore, pangenomes could help to understand the evolution of metabolic diversity across diverse taxonomic clades [97]. This knowledge has implications for improving crop traits and developing more resilient and sustainable agricultural practices. Moreover, by analyzing the genomic diversity represented in pangenomes, scientists can understand the distribution and composition of vegetation across different environments. This information is critical for conservation efforts, ecological studies, and developing strategies to protect and sustainably manage plant resources.
Finally, we see a great application of pan-genome data in improving gene annotations and identifying evolutionary conserved sets of genes associated with important agronomic traits. Likewise, comparative genomic studies can help to identify and annotate clade-specific unique genes that determine metabolite compositions of important fruit and vegetable crops or other categories. These specialized metabolic pathways and associated entities can be easily curated in the pathway databases [97,121,123,124,128,181,182]. The workflows for interspecies gene family comparisons, GO annotations, and standard protocols for gene biocuration are very efficient and established; they can easily accommodate the insights gained from intra-species comparisons. For instance, the availability of whole genome sequences of plants has contributed tremendously to the knowledge of gene duplications, gene family evolution, and functional diversification of homologous genes [179,180,183]. Gene Ontology (GO) and Plant Ontology (PO) annotations have played a central role in accessing the potential gene functions [184,185,186]. Furthermore, the comprehensive analysis of plant transcriptomes has helped us to link genes with potential biological processes, pathways, and response to biotic and abiotic stress or stimulants [179,180,187,188,189]. Integrating pangenomes with other omics, such as transcriptomics, epigenomics, proteomics, and phenomics data, will enable a comprehensive understanding of gene regulation and functional mechanisms underlying important agronomic traits. In conclusion, as shown in Figure 5, pan-genomic research is driving functional genomics, evolutionary studies, and biodiversity exploration and consequently, holds great potential for crop improvement, environmental conservation, and sustainable agriculture.
It is important to emphasize that pan-genome construction and whole genome-level comparative analysis require substantial computational infrastructure, expertise in various facets including sequence generation, genome assembly and annotation; and subsequent bioinformatic analyses and data visualization. In general, all these tasks are beyond the capacity of individual research laboratories, and thus, they require extensive infrastructure and bioinformatic support from their institutions and public databases. Public data repositories, databases, genomic resources, and secondary knowledgebases play an essential role in aiding the community of researchers in providing ontologies [184], archiving and annotating genomic data, and supporting analysis and visualization of omics data to support data-driven hypotheses and experimental plans [90,97,115,128]. Here we have reviewed the tools and genomic resources available to the plant research community for conducting pan-genomic research.
We note here that the resources and tools for pan-genome visualization and to support researchers for exploring pan-genomic data are limited and at an early developmental stage. We have witnessed the growing number of publications on crop pan-genomes in the last five years (see Table 2); however, the pan-genomic and genomic diversity data for the majority of the crops are stored and archived with no associated tools/features required for visualization and effective use by other researchers. Thus, advancement and innovations in pan-genomic data visualization, analysis tools, and additional biocuration of genomic data are needed to facilitate meaningful intraspecies and interspecies genomic comparisons. Community biocuration can play an essential role in making sense of the big data and ensuring quality controls at various steps [190]. This will allow researchers to study pan-genomes in more detail and identify genes and genomic regions associated with important traits. Equally important is integrating crop pan-genomes in comprehensive knowledgebases that host analyzed and annotated genomic and pathway data for ensuring the FAIR data policy implementations [111,132].
From a technical aspect, integrating machine learning and artificial intelligence techniques will expedite the analysis of complex pan-genomic datasets, aid in identifying patterns, and accelerate the construction of predictive models and gene biocuration.
Furthermore, the genomic resources are more mature for the model plants and major crop species [132]. But the necessary financial and infrastructure support for minor crops, fruit and vegetable crops, and orphan crops is still insufficient [191,192,193]. Although the cost of sequencing a single genome has decreased significantly in recent years, sequencing multiple genomes can still be prohibitive, a significant barrier to conducting pan-genome research in poorly studied crops or orphan crops.
We expect that this review will help researchers to understand and evaluate the status and strategies being employed for pan-genomic studies in crops, and encourage them to seek necessary collaborations within the community or help to advocate for increased funding for developing the infrastructure, tools, and biocuration of genomic data that is vital for making needed progress. Furthermore, we hope this review can help students and young researchers learn about the status of crop genomic research and the future potential of pan-genomics.

Author Contributions

S.N. conceptualized the initial study plan and coordinated the collaboration with all the authors. S.N., C.D., S.K.S., and P.J. conducted a review of the literature and contributed to writing the manuscript. S.N., C.D. and P.J. created all the figures for the manuscript. All authors have read and agreed to this version of the manuscript.

Funding

SN acknowledges funding from National Aeronautics and Space Administration (NASA), USA #80NSSC22K0891. PJ acknowledges funding from NASA #80NSSC22K0855 and National Science Foundation, USA #2029854.

Data Availability Statement

We have provided all the data within this manuscript.

Acknowledgments

SN and PJ thank the members of the Gramene project team at Cold Spring Harbor Laboratory, the Ensembl Plants team at European Bioinformatics Institute, and the human Reactome project and the Ontario Institute for Cancer Research for their support and interest in strong cross-platform data integration. SKS acknowledges support from the 10KP project (https://db.cngb.org/10kp) and China National GeneBank (CNGB; https://www.cngb.org/). CD thanks the bioinformatics teams at The New Zealand Institute for Plant and Food Research Limited (PFR), breeders at Kiwifruit Breeding Centre (KBC), and researchers at China Agricultural University (CAU) for their support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Computational Pan-Genomics, Consortium. "Computational Pan-Genomics: Status, Promises and Challenges." Brief Bioinform 19, no. 1 (2018): 118-35.
  2. Della Coletta, R.; Qiu, Y.; Ou, S.; Hufford, M.B.; Hirsch, C.N. How the pan-genome is changing crop genomics and improvement. Genome Biol. 2021, 22, 1–19. [Google Scholar] [CrossRef]
  3. Ho, S.S.; Urban, A.E.; Mills, R.E. Structural variation in the sequencing era. Nat. Rev. Genet. 2019, 21, 171–189. [Google Scholar] [CrossRef]
  4. Kyriakidou, M.; Tai, H.H.; Anglin, N.L.; Ellis, D.; Strömvik, M.V. Current Strategies of Polyploid Plant Genome Sequence Assembly. Front. Plant Sci. 2018, 9, 1660. [Google Scholar] [CrossRef] [PubMed]
  5. Sedlazeck, F.J.; Rescheneder, P.; Smolka, M.; Fang, H.; Nattestad, M.; von Haeseler, A.; Schatz, M.C. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 2018, 15, 461–468. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, Y., J. Yu, M. Jiang, W. Lei, X. Zhang, and H. Tang. "Sequencing and Assembly of Polyploid Genomes." Methods in Molecular Biology 2545 (2023): 429-58.
  7. Sahu, S.K.; Liu, H. Long-read sequencing (method of the year 2022): The way forward for plant omics research. Mol. Plant 2023, 16, 791–793. [Google Scholar] [CrossRef] [PubMed]
  8. Zhou, Y.; Chebotarov, D.; Kudrna, D.; Llaca, V.; Lee, S.; Rajasekar, S.; Mohammed, N.; Al-Bader, N.; Sobel-Sorenson, C.; Parakkal, P.; et al. A platinum standard pan-genome resource that represents the population structure of Asian rice. Sci. Data 2020, 7, 1–11. [Google Scholar] [CrossRef]
  9. Wang, W.; Mauleon, R.; Hu, Z.; Chebotarov, D.; Tai, S.; Wu, Z.; Li, M.; Zheng, T.; Fuentes, R.R.; Zhang, F.; et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 2018, 557, 43–49. [Google Scholar] [CrossRef]
  10. Schatz, M. C. G. Maron, J. C. Stein, A. Hernandez Wences, J. Gurtowski, E. Biggers, H. Lee, M. Kramer, E. Antoniou, E. Ghiban, M. H. Wright, J. M. Chia, D. Ware, S. R. McCouch, and W. R. McCombie. "Whole Genome De Novo Assemblies of Three Divergent Strains of Rice, Oryza Sativa, Document Novel Gene Space of Aus and Indica." Genome Biology 15, no. 11 (2014): 506.
  11. Jayakodi, M.; Padmarasu, S.; Haberer, G.; Bonthala, V.S.; Gundlach, H.; Monat, C.; Lux, T.; Kamal, N.; Lang, D.; Himmelbach, A.; et al. The barley pan-genome reveals the hidden legacy of mutation breeding. Nature 2020, 588, 284–289. [Google Scholar] [CrossRef]
  12. Walkowiak, S.; Gao, L.; Monat, C.; Haberer, G.; Kassa, M.T.; Brinton, J.; Ramirez-Gonzalez, R.H.; Kolodziej, M.C.; Delorean, E.; Thambugala, D.; et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 2020, 588, 277–283. [Google Scholar] [CrossRef]
  13. Hirsch, C.N.; Foerster, J.M.; Johnson, J.M.; Sekhon, R.S.; Muttoni, G.; Vaillancourt, B.; Peñagaricano, F.; Lindquist, E.; Pedraza, M.A.; Barry, K.; et al. Insights into the Maize Pan-Genome and Pan-Transcriptome. Plant Cell 2014, 26, 121–135. [Google Scholar] [CrossRef]
  14. Liu, Y.; Du, H.; Li, P.; Shen, Y.; Peng, H.; Liu, S.; Zhou, G.-A.; Zhang, H.; Liu, Z.; Shi, M.; et al. Pan-Genome of Wild and Cultivated Soybeans. Cell 2020, 182, 162–176. [Google Scholar] [CrossRef] [PubMed]
  15. Li, Y.-H.; Zhou, G.; Ma, J.; Jiang, W.; Jin, L.-G.; Zhang, Z.; Guo, Y.; Zhang, J.; Sui, Y.; Zheng, L.; et al. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 2014, 32, 1045–1052. [Google Scholar] [CrossRef] [PubMed]
  16. Song, J.-M.; Guan, Z.; Hu, J.; Guo, C.; Yang, Z.; Wang, S.; Liu, D.; Wang, B.; Lu, S.; Zhou, R.; et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat. Plants 2020, 6, 34–45. [Google Scholar] [CrossRef]
  17. Zhuang, W. Chen, M. Yang, J. Wang, M. K. Pandey, C. Zhang, W. C. Chang, L. Zhang, X. Zhang, R. Tang, V. Garg, X. Wang, H. Tang, C. N. Chow, J. Wang, Y. Deng, D. Wang, A. W. Khan, Q. Yang, T. Cai, P. Bajaj, K. Wu, B. Guo, X. Zhang, J. Li, F. Liang, J. Hu, B. Liao, S. Liu, A. Chitikineni, H. Yan, Y. Zheng, S. Shan, Q. Liu, D. Xie, Z. Wang, S. A. Khan, N. Ali, C. Zhao, X. Li, Z. Luo, S. Zhang, R. Zhuang, Z. Peng, S. Wang, G. Mamadou, Y. Zhuang, Z. Zhao, W. Yu, F. Xiong, W. Quan, M. Yuan, Y. Li, H. Zou, H. Xia, L. Zha, J. Fan, J. Yu, W. Xie, J. Yuan, K. Chen, S. Zhao, W. Chu, Y. Chen, P. Sun, F. Meng, T. Zhuo, Y. Zhao, C. Li, G. He, Y. Zhao, C. Wang, P. B. Kavikishor, R. L. Pan, A. H. Paterson, X. Wang, R. Ming, and R. K. Varshney. "The Genome of Cultivated Peanut Provides Insight into Legume Karyotypes, Polyploid Evolution and Crop Domestication." Nature Genetics 51, no. 5 (2019): 865-76.
  18. International Wheat Genome Sequencing, Consortium. "Shifting the Limits in Wheat Research and Breeding Using a Fully Annotated Reference Genome." Science 361, no. 6403 (2018).
  19. Edger, P.P.; Poorten, T.J.; VanBuren, R.; Hardigan, M.A.; Colle, M.; McKain, M.R.; Smith, R.D.; Teresi, S.J.; Nelson, A.D.L.; Wai, C.M.; et al. Origin and evolution of the octoploid strawberry genome. Nat. Genet. 2019, 51, 541–547. [Google Scholar] [CrossRef] [PubMed]
  20. Kyriakidou, M.; Anglin, N.L.; Ellis, D.; Tai, H.H.; Strömvik, M.V. Genome assembly of six polyploid potato genomes. Sci. Data 2020, 7, 1–6. [Google Scholar] [CrossRef] [PubMed]
  21. Shang, L.; Li, X.; He, H.; Yuan, Q.; Song, Y.; Wei, Z.; Lin, H.; Hu, M.; Zhao, F.; Zhang, C.; et al. A super pan-genomic landscape of rice. Cell Res. 2022, 32, 878–896. [Google Scholar] [CrossRef]
  22. He, Q.; Tang, S.; Zhi, H.; Chen, J.; Zhang, J.; Liang, H.; Alam, O.; Li, H.; Zhang, H.; Xing, L.; et al. A graph-based genome and pan-genome variation of the model plant Setaria. Nat. Genet. 2023, 55, 1232–1242. [Google Scholar] [CrossRef]
  23. Yap, I. V., D. Schneider, J. Kleinberg, D. Matthews, S. Cartinhour, and S. R. McCouch. "A Graph-Theoretic Approach to Comparing and Integrating Genetic, Physical and Sequence-Based Maps." Genetics 165, no. 4 (2003): 2235-47.
  24. Tettelin, H.; Masignani, V.; Cieslewicz, M.J.; Donati, C.; Medini, D.; Ward, N.L.; Angiuoli, S.V.; Crabtree, J.; Jones, A.L.; Durkin, A.S.; et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome". Proc. Natl. Acad. Sci. USA 2005, 102, 13950–13955. [Google Scholar] [CrossRef]
  25. Springer, N.M.; Ying, K.; Fu, Y.; Ji, T.; Yeh, C.-T.; Jia, Y.; Wu, W.; Richmond, T.; Kitzman, J.; Rosenbaum, H.; et al. Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content. PLOS Genet. 2009, 5, e1000734. [Google Scholar] [CrossRef]
  26. E Anderson, J.; Kantar, M.B.; Kono, T.Y.; Fu, F.; O Stec, A.; Song, Q.; Cregan, P.B.; E Specht, J.; Diers, B.W.; Cannon, S.B.; et al. A Roadmap for Functional Structural Variants in the Soybean Genome. G3 Genes|Genomes|Genetics 2014, 4, 1307–1318. [Google Scholar] [CrossRef]
  27. Golicz, A.A.; Bayer, P.E.; Barker, G.C.; Edger, P.P.; Kim, H.; Martinez, P.A.; Chan, C.K.K.; Severn-Ellis, A.; McCombie, W.R.; Parkin, I.A.P.; et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 2016, 7, 13390. [Google Scholar] [CrossRef]
  28. Tao, Y.; Luo, H.; Xu, J.; Cruickshank, A.; Zhao, X.; Teng, F.; Hathorn, A.; Wu, X.; Liu, Y.; Shatte, T.; et al. Extensive variation within the pan-genome of cultivated and wild sorghum. Nat. Plants 2021, 7, 766–773. [Google Scholar] [CrossRef]
  29. Xu, X.; Liu, X.; Ge, S.; Jensen, J.D.; Hu, F.; Dong, Y.; Gutenkunst, R.N.; Fang, L.; Huang, L.; Li, J.; et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 2011, 30, 105–111. [Google Scholar] [CrossRef] [PubMed]
  30. Lam, H.-M.; Xu, X.; Liu, X.; Chen, W.; Yang, G.; Wong, F.-L.; Li, M.-W.; He, W.; Qin, N.; Wang, B.; et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat. Genet. 2010, 42, 1053–1059. [Google Scholar] [CrossRef] [PubMed]
  31. Gui, S.; Wei, W.; Jiang, C.; Luo, J.; Chen, L.; Wu, S.; Li, W.; Wang, Y.; Li, S.; Yang, N.; et al. A pan-Zea genome map for enhancing maize improvement. Genome Biol. 2022, 23, 1–22. [Google Scholar] [CrossRef] [PubMed]
  32. Allaby, R.G.; Ware, R.L.; Kistler, L. A re-evaluation of the domestication bottleneck from archaeogenomic evidence. Evol. Appl. 2018, 12, 29–37. [Google Scholar] [CrossRef]
  33. Tirnaz, S. Zandberg, W. J. W. Thomas, J. Marsh, D. Edwards, and J. Batley. "Application of Crop Wild Relatives in Modern Breeding: An Overview of Resources, Experimental and Computational Methodologies." Frontiers of Plant Science 13 (2022): 1008904.
  34. Papa, R.; Gepts, P. Asymmetry of gene flow and differential geographical structure of molecular diversity in wild and domesticated common bean (Phaseolus vulgaris L.) from Mesoamerica. Theor. Appl. Genet. 2003, 106, 239–250. [Google Scholar] [CrossRef]
  35. McNally, K.L.; Childs, K.L.; Bohnert, R.; Davidson, R.M.; Zhao, K.; Ulat, V.J.; Zeller, G.; Clark, R.M.; Hoen, D.R.; Bureau, T.E.; et al. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc. Natl. Acad. Sci. 2009, 106, 12273–12278. [Google Scholar] [CrossRef]
  36. Brozynska, M.; Furtado, A.; Henry, R.J. Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol. J. 2015, 14, 1070–1085. [Google Scholar] [CrossRef]
  37. Bohra, A.; Kilian, B.; Sivasankar, S.; Caccamo, M.; Mba, C.; McCouch, S.R.; Varshney, R.K. Reap the crop wild relatives for breeding future crops. Trends Biotechnol. 2022, 40, 412–431. [Google Scholar] [CrossRef]
  38. McCouch, S. R. , and L. H. Rieseberg. "Harnessing Crop Diversity." Proceedings of the National Academy of Sciences of the United States of America 120, no. 14 (2023): e2221410120.
  39. McCouch, S. Toward a plant genomics initiative: Thoughts on the value of cross-species and cross-genera comparisons in the grasses. Proc. Natl. Acad. Sci. 1998, 95, 1983–1985. [Google Scholar] [CrossRef] [PubMed]
  40. Würschum, T.; Rapp, M.; Miedaner, T.; Longin, C.F.H.; Leiser, W.L. Copy number variation of Ppd-B1 is the major determinant of heading time in durum wheat. BMC Genet. 2019, 20, 64–8. [Google Scholar] [CrossRef] [PubMed]
  41. Knox, A.K.; Dhillon, T.; Cheng, H.; Tondelli, A.; Pecchioni, N.; Stockinger, E.J. CBF gene copy number variation at Frost Resistance-2 is associated with levels of freezing tolerance in temperate-climate cereals. Theor. Appl. Genet. 2010, 121, 21–35. [Google Scholar] [CrossRef] [PubMed]
  42. Maron, L. G. T. Guimaraes, M. Kirst, P. S. Albert, J. A. Birchler, P. J. Bradbury, E. S. Buckler, A. E. Coluccio, T. V. Danilova, D. Kudrna, J. V. Magalhaes, M. A. Pineros, M. C. Schatz, R. A. Wing, and L. V. Kochian. "Aluminum Tolerance in Maize Is Associated with Higher Mate1 Gene Copy Number." Proceedings of the National Academy of Sciences of the United States of America 110, no. 13 (2013): 5241-6.
  43. Cook, D.E.; Lee, T.G.; Guo, X.; Melito, S.; Wang, K.; Bayless, A.M.; Wang, J.; Hughes, T.J.; Willis, D.K.; Clemente, T.E.; et al. Copy Number Variation of Multiple Genes at Rhg1 Mediates Nematode Resistance in Soybean. Science 2012, 338, 1206–1209. [Google Scholar] [CrossRef] [PubMed]
  44. Liu, Q.; Xu, J.; Zhu, Y.; Mo, Y.; Yao, X.-F.; Wang, R.; Ku, W.; Huang, Z.; Xia, S.; Tong, J.; et al. The Copy Number Variation of OsMTD1 Regulates Rice Plant Architecture. Front. Plant Sci. 2021, 11. [Google Scholar] [CrossRef] [PubMed]
  45. Wang, Y.; Xiong, G.; Hu, J.; Jiang, L.; Yu, H.; Xu, J.; Fang, Y.; Zeng, L.; Xu, E.; Xu, J.; et al. Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat. Genet. 2015, 47, 944–948. [Google Scholar] [CrossRef]
  46. Bosman, R.N.; Vervalle, J.A.-M.; November, D.L.; Burger, P.; Lashbrooke, J.G. Grapevine genome analysis demonstrates the role of gene copy number variation in the formation of monoterpenes. Front. Plant Sci. 2023, 14, 1112214. [Google Scholar] [CrossRef]
  47. Falginella, L., S. D. Castellarin, R. Testolin, G. A. Gambetta, M. Morgante, and G. Di Gaspero. "Expansion and Subfunctionalisation of Flavonoid 3',5'-Hydroxylases in the Grapevine Lineage." BMC Genomics 11 (2010): 562.
  48. Nilsen, K. T., S. Walkowiak, D. Xiang, P. Gao, T. D. Quilichini, I. R. Willick, B. Byrns, A. N'Diaye, J. Ens, K. Wiebe, Y. Ruan, R. D. Cuthbert, M. Craze, E. J. Wallington, J. Simmonds, C. Uauy, R. Datla, and C. J. Pozniak. "Copy Number Variation of Tddof Controls Solid-Stemmed Architecture in Wheat." Proceedings of the National Academy of Sciences of the United States of America 117, no. 46 (2020): 28708-18.
  49. Gao, L.; Gonda, I.; Sun, H.; Ma, Q.; Bao, K.; Tieman, D.M.; Burzynski-Chang, E.A.; Fish, T.L.; Stromberg, K.A.; Sacks, G.L.; et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 2019, 51, 1044–1051. [Google Scholar] [CrossRef]
  50. Liu, J.; Dawe, R.K. Large haplotypes highlight a complex age structure within the maize pan-genome. Genome Res. 2023, 33, 359–370. [Google Scholar] [CrossRef]
  51. Tao, Y.; Zhao, X.; Mace, E.; Henry, R.; Jordan, D. Exploring and Exploiting Pan-genomics for Crop Improvement. Mol. Plant 2019, 12, 156–169. [Google Scholar] [CrossRef]
  52. Bayer, P.E.; Golicz, A.A.; Scheben, A.; Batley, J.; Edwards, D. Plant pan-genomes are the new reference. Nat. Plants 2020, 6, 914–920. [Google Scholar] [CrossRef]
  53. Jayakodi, M.; Schreiber, M.; Stein, N.; Mascher, M. Building pan-genome infrastructures for crop plants and their use in association genetics. DNA Res. 2021, 28. [Google Scholar] [CrossRef] [PubMed]
  54. Li, W., J. Liu, H. Zhang, Z. Liu, Y. Wang, L. Xing, Q. He, and H. Du. "Plant Pan-Genomics: Recent Advances, New Challenges, and Roads Ahead." Journal of Genetics and Genomics 49, no. 9 (2022): 833-46.
  55. Yan, H.; Sun, M.; Zhang, Z.; Jin, Y.; Zhang, A.; Lin, C.; Wu, B.; He, M.; Xu, B.; Wang, J.; et al. Pangenomic analysis identifies structural variation associated with heat tolerance in pearl millet. Nat. Genet. 2023, 55, 507–518. [Google Scholar] [CrossRef] [PubMed]
  56. Zhou, H.; Yan, F.; Hao, F.; Ye, H.; Yue, M.; Woeste, K.; Zhao, P.; Zhang, S. Pan-genome and transcriptome analyses provide insights into genomic variation and differential gene expression profiles related to disease resistance and fatty acid biosynthesis in eastern black walnut (Juglans nigra). Hortic. Res. 2023, 10, uhad015. [Google Scholar] [CrossRef] [PubMed]
  57. Golicz, A. A., J. Batley, and D. Edwards. "Towards Plant Pangenomics." Plant Biotechnology Journal 14, no. 4 (2016): 1099-105.
  58. Garrison, E.; Sirén, J.; Novak, A.M.; Hickey, G.; Eizenga, J.M.; Dawson, E.T.; Jones, W.; Garg, S.; Markello, C.; Lin, M.F.; et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 2018, 36, 875–879. [Google Scholar] [CrossRef]
  59. Rakocevic, G.; Semenyuk, V.; Lee, W.-P.; Spencer, J.; Browning, J.; Johnson, I.J.; Arsenijevic, V.; Nadj, J.; Ghose, K.; Suciu, M.C.; et al. Fast and accurate genomic analyses using genome graphs. Nat. Genet. 2019, 51, 354–362. [Google Scholar] [CrossRef]
  60. Cheng, H.; Concepcion, G.T.; Feng, X.; Zhang, H.; Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 2021, 18, 170–175. [Google Scholar] [CrossRef]
  61. Padgitt-Cobb, L.K.; Kingan, S.B.; Wells, J.; Elser, J.; Kronmiller, B.; Moore, D.; Concepcion, G.; Peluso, P.; Rank, D.; Jaiswal, P.; et al. A draft phased assembly of the diploid Cascade hop (Humulus lupulus) genome. Plant Genome 2021, 14, e20072. [Google Scholar] [CrossRef]
  62. Eizenga, J. M. M. Novak, J. A. Sibbesen, S. Heumos, A. Ghaffaari, G. Hickey, X. Chang, J. D. Seaman, R. Rounthwaite, J. Ebler, M. Rautiainen, S. Garg, B. Paten, T. Marschall, J. Siren, and E. Garrison. "Pangenome Graphs." Annu Rev Genomics Hum Genet 21 (2020): 139-62.
  63. Hickey, G.; Heller, D.; Monlong, J.; Sibbesen, J.A.; Sirén, J.; Eizenga, J.; Dawson, E.T.; Garrison, E.; Novak, A.M.; Paten, B. Genotyping structural variants in pangenome graphs using the vg toolkit. Genome Biol. 2020, 21, 1–17. [Google Scholar] [CrossRef]
  64. Vernikos, G. S. "A Review of Pangenome Tools and Recent Studies." In The Pangenome: Diversity, Dynamics and Evolution of Genomes, edited by H. Tettelin and D. Medini, 89-112. Cham (CH), 2020.
  65. Glick, L.; Mayrose, I. The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes. Genome Biol. Evol. 2023, 15. [Google Scholar] [CrossRef]
  66. Koren, S.; Walenz, B.P.; Berlin, K.; Miller, J.R.; Bergman, N.H.; Phillippy, A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017, 27, 722–736. [Google Scholar] [CrossRef]
  67. Kolmogorov, M.; Yuan, J.; Lin, Y.; Pevzner, P.A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019, 37, 540–546. [Google Scholar] [CrossRef] [PubMed]
  68. Swain, M.T.; Tsai, I.J.; A Assefa, S.; Newbold, C.; Berriman, M.; Otto, T.D. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat. Protoc. 2012, 7, 1260–1284. [Google Scholar] [CrossRef] [PubMed]
  69. Li, D.; Liu, C.-M.; Luo, R.; Sadakane, K.; Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 2015, 31, 1674–1676. [Google Scholar] [CrossRef] [PubMed]
  70. Tolstoganov, I.; Bankevich, A.; Chen, Z.; A Pevzner, P. cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs. Bioinformatics 2019, 35, i61–i70. [Google Scholar] [CrossRef]
  71. Meleshko, D.; Mohimani, H.; Tracanna, V.; Hajirasouliha, I.; Medema, M.H.; Korobeynikov, A.; Pevzner, P.A. BiosyntheticSPAdes: reconstructing biosynthetic gene clusters from assembly graphs. Genome Res. 2019, 29, 1352–1362. [Google Scholar] [CrossRef]
  72. Li, H.; Feng, X.; Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 2020, 21, 1–19. [Google Scholar] [CrossRef]
  73. Garrison, E., A. Guarracino, S. Heumos, F. Villani, Z. Bao, L. Tattini, J. Hagmann, S. Vorbrugg, S. Marco-Sola, C. Kubica, D. G. Ashbrook, K. Thorell, R. L. Rusholme-Pilcher, G. Liti, E. Rudbeck, S. Nahnsen, Z. Yang, M. N. Moses, F. L. Nobrega, Y. Wu, H. Chen, J. de Ligt, P. H. Sudmant, N. Soranzo, V. Colonna, R. W. Williams, and P. Prins. "Building Pangenome Graphs." bioRxiv (2023).
  74. Hickey, G.; Monlong, J.; Ebler, J.; Novak, A.M.; Eizenga, J.M.; Gao, Y.; Abel, H.J.; Antonacci-Fulton, L.L.; Asri, M.; Baid, G.; et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 2023, 1–11. [Google Scholar] [CrossRef]
  75. Armstrong, J.; Hickey, G.; Diekhans, M.; Fiddes, I.T.; Novak, A.M.; Deran, A.; Fang, Q.; Xie, D.; Feng, S.; Stiller, J.; et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020, 587, 246–251. [Google Scholar] [CrossRef]
  76. Jonkheer, E. M. M. van Workum, S. Sheikhizadeh Anari, B. Brankovics, J. R. de Haan, L. Berke, T. A. J. van der Lee, D. de Ridder, and S. Smit. "Pantools V3: Functional Annotation, Classification and Phylogenomics." Bioinformatics 38, no. 18 (2022): 4403-05.
  77. Guarracino, A.; Heumos, S.; Nahnsen, S.; Prins, P.; Garrison, E. ODGI: understanding pangenome graphs. Bioinformatics 2022, 38, 3319–3326. [Google Scholar] [CrossRef]
  78. Ewels, P.A.; Peltzer, A.; Fillinger, S.; Patel, H.; Alneberg, J.; Wilm, A.; Garcia, M.U.; Di Tommaso, P.; Nahnsen, S. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 2020, 38, 276–278. [Google Scholar] [CrossRef] [PubMed]
  79. Vaughn, J.N.; Branham, S.E.; Abernathy, B.; Hulse-Kemp, A.M.; Rivers, A.R.; Levi, A.; Wechter, W.P. Graph-based pangenomics maximizes genotyping density and reveals structural impacts on fungal resistance in melon. Nat. Commun. 2022, 13, 1–14. [Google Scholar] [CrossRef] [PubMed]
  80. Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 2018, 34, 3094–3100. [Google Scholar] [CrossRef] [PubMed]
  81. Marçais, G.; Delcher, A.L.; Phillippy, A.M.; Coston, R.; Salzberg, S.L.; Zimin, A. MUMmer4: A fast and versatile genome alignment system. PLOS Comput. Biol. 2018, 14, e1005944. [Google Scholar] [CrossRef]
  82. Rautiainen, M.; Marschall, T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol. 2020, 21, 1–28. [Google Scholar] [CrossRef]
  83. Kavya, V.N.S.; Tayal, K.; Srinivasan, R.; Sivadasan, N. Sequence Alignment on Directed Graphs. J. Comput. Biol. 2019, 26, 53–67. [Google Scholar] [CrossRef]
  84. Büchler, T.; Olbrich, J.; Ohlebusch, E. Efficient short read mapping to a pangenome that is represented by a graph of ED strings. Bioinformatics 2023, 39. [Google Scholar] [CrossRef]
  85. Poplin, R.; Chang, P.-C.; Alexander, D.; Schwartz, S.; Colthurst, T.; Ku, A.; Newburger, D.; Dijamco, J.; Nguyen, N.; Afshar, P.T.; et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 2018, 36, 983–987. [Google Scholar] [CrossRef]
  86. Yun, T.; Li, H.; Chang, P.-C.; Lin, M.F.; Carroll, A.; McLean, C.Y. Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Bioinformatics 2020, 36, 5582–5589. [Google Scholar] [CrossRef]
  87. Chiang, C.; Layer, R.M.; Faust, G.G.; Lindberg, M.R.; Rose, D.B.; Garrison, E.P.; Marth, G.T.; Quinlan, A.R.; Hall, I.M. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat. Methods 2015, 12, 966–968. [Google Scholar] [CrossRef]
  88. Eggertsson, H.P.; Jonsson, H.; Kristmundsdottir, S.; Hjartarson, E.; Kehr, B.; Masson, G.; Zink, F.; E Hjorleifsson, K.; Jonasdottir, A.; Jonasdottir, A.; et al. Graphtyper enables population-scale genotyping using pangenome graphs. Nat. Genet. 2017, 49, 1654–1660. [Google Scholar] [CrossRef]
  89. Ebler, J.; Ebert, P.; Clarke, W.E.; Rausch, T.; Audano, P.A.; Houwaart, T.; Mao, Y.; Korbel, J.O.; Eichler, E.E.; Zody, M.C.; et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 2022, 54, 518–525. [Google Scholar] [CrossRef]
  90. Naithani, S.; Geniza, M.; Jaiswal, P. Variant Effect Prediction Analysis Using Resources Available at Gramene Database. 1533. [CrossRef]
  91. Emms, D.M.; Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019, 20, 1–14. [Google Scholar] [CrossRef] [PubMed]
  92. Li, L. J. Stoeckert, Jr., and D. S. Roos. "Orthomcl: Identification of Ortholog Groups for Eukaryotic Genomes." Genome Research 13, no. 9 (2003): 2178-89.
  93. Miller, J.B.; Pickett, B.D.; Ridge, P.G. JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm. Bioinformatics 2018, 35, 546–552. [Google Scholar] [CrossRef] [PubMed]
  94. Zhou, S.; Chen, Y.; Guo, C.; Qi, J. PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events. Methods Ecol. Evol. 2020, 11, 943–954. [Google Scholar] [CrossRef]
  95. Altenhoff, A. M., C. M. Train, K. J. Gilbert, I. Mediratta, T. Mendes de Farias, D. Moi, Y. Nevers, H. S. Radoykova, V. Rossier, A. Warwick Vesztrocy, N. M. Glover, and C. Dessimoz. "Oma Orthology in 2021: Website Overhaul, Conserved Isoforms, Ancestral Gene Order and More." Nucleic Acids Res 49, no. D1 (2021): D373-D79.
  96. Persson, E.; Sonnhammer, E.L.L. InParanoid-DIAMOND: faster orthology analysis with the InParanoid algorithm. Bioinformatics 2022, 38, 2918–2919. [Google Scholar] [CrossRef]
  97. Naithani, S.; Gupta, P.; Preece, J.; D’eustachio, P.; Elser, J.L.; Garg, P.; A Dikeman, D.; Kiff, J.; Cook, J.; Olson, A.; et al. Plant Reactome: a knowledgebase and resource for comparative pathway analysis. Nucleic Acids Res. 2019, 48, D1093–D1103. [Google Scholar] [CrossRef]
  98. Durant. ; Sabot, F.; Conte, M.; Rouard, M. Panache: a web browser-based viewer for linearized pangenomes. Bioinformatics 2021, 37, 4556–4558. [Google Scholar] [CrossRef]
  99. Droc, G.; Martin, G.; Guignon, V.; Summo, M.; Sempéré, G.; Durant, E.; Soriano, A.; Baurens, F.-C.; Cenci, A.; Breton, C.; et al. The banana genome hub: a community database for genomics in the Musaceae. Hortic. Res. 2022, 9, uhac221. [Google Scholar] [CrossRef]
  100. Yokoyama, T. T., Y. Sakamoto, M. Seki, Y. Suzuki, and M. Kasahara. "Momi-G: Modular Multi-Scale Integrated Genome Graph Browser." BMC Bioinformatics 20, no. 1 (2019): 548.
  101. Wick, R.R.; Schultz, M.B.; Zobel, J.; Holt, K.E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 2015, 31, 3350–3352. [Google Scholar] [CrossRef]
  102. Beyer, W.; Novak, A.M.; Hickey, G.; Chan, J.; Tan, V.; Paten, B.; Zerbino, D.R. Sequence tube maps: making graph genomes intuitive to commuters. Bioinformatics 2019, 35, 5318–5320. [Google Scholar] [CrossRef]
  103. Gonnella, G.; Niehus, N.; Kurtz, S. GfaViz: flexible and interactive visualization of GFA sequence graphs. Bioinformatics 2018, 35, 2853–2855. [Google Scholar] [CrossRef]
  104. Mikheenko, A.; Kolmogorov, M. Assembly Graph Browser: interactive visualization of assembly graphs. Bioinformatics 2019, 35, 3476–3478. [Google Scholar] [CrossRef]
  105. Kunyavskaya, O.; Prjibelski, A.D. SGTK: a toolkit for visualization and assessment of scaffold graphs. Bioinformatics 2018, 35, 2303–2305. [Google Scholar] [CrossRef] [PubMed]
  106. Durbin, R. Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT). Bioinformatics 2014, 30, 1266–1272. [Google Scholar] [CrossRef] [PubMed]
  107. Novak, A.M.; Garrison, E.; Paten, B. A graph extension of the positional Burrows–Wheeler transform and its applications. Algorithms Mol. Biol. 2017, 12, 1–12. [Google Scholar] [CrossRef]
  108. Grytten, I., K. D. Rand, A. J. Nederbragt, G. O. Storvik, I. K. Glad, and G. K. Sandve. "Graph Peak Caller: Calling Chip-Seq Peaks on Graph-Based Reference Genomes." PLoS Comput Biol 15, no. 2 (2019): e1006731.
  109. Wang, J.; Yang, W.; Zhang, S.; Hu, H.; Yuan, Y.; Dong, J.; Chen, L.; Ma, Y.; Yang, T.; Zhou, L.; et al. A pangenome analysis pipeline provides insights into functional gene identification in rice. Genome Biol. 2023, 24, 1–22. [Google Scholar] [CrossRef] [PubMed]
  110. Qamar, M.T.U.; Zhu, X.; Xing, F.; Chen, L.-L. ppsPCP: a plant presence/absence variants scanner and pan-genome construction pipeline. Bioinformatics 2019, 35, 4156–4158. [Google Scholar] [CrossRef]
  111. Harper, L.; Campbell, J.; Cannon, E.K.S.; Jung, S.; Poelchau, M.; Walls, R.; Andorf, C.; Arnaud, E.; Berardini, T.Z.; Birkett, C.; et al. AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database 2018, 2018. [Google Scholar] [CrossRef]
  112. Adam-Blondon, A.-F.; Alaux, M.; Pommier, C.; Cantu, D.; Cheng, Z.-M.; Cramer, G.; Davies, C.; Delrot, S.; Deluc, L.; Di Gaspero, G.; et al. Towards an open grapevine information system. Hortic. Res. 2016, 3, 16056. [Google Scholar] [CrossRef]
  113. Bolser, D., D. M. Staines, E. Pritchard, and P. Kersey. "Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data." Methods in Molecular Biology 1374 (2016): 115-40.
  114. Gupta, P.; Naithani, S.; Preece, J.; Kim, S.; Cheng, T.; D’eustachio, P.; Elser, J.; Bolton, E.E.; Jaiswal, P. Plant Reactome and PubChem: The Plant Pathway and (Bio)Chemical Entity Knowledgebases. 2443. [CrossRef]
  115. Tello-Ruiz, M.K.; Naithani, S.; Gupta, P.; Olson, A.; Wei, S.; Preece, J.; Jiao, Y.; Wang, B.; Chougule, K.; Garg, P.; et al. Gramene 2021: harnessing the power of comparative genomics and pathways for plant research. Nucleic Acids Res. 2020, 49, D1452–D1463. [Google Scholar] [CrossRef] [PubMed]
  116. Pasha, A.; Subramaniam, S.; Cleary, A.; Chen, X.; Berardini, T.; Farmer, A.; Town, C.; Provart, N. Araport Lives: An Updated Framework for Arabidopsis Bioinformatics. Plant Cell 2020, 32, 2683–2686. [Google Scholar] [CrossRef] [PubMed]
  117. Shamimuzzaman, M., J. M. Gardiner, A. T. Walsh, D. A. Triant, J. J. Le Tourneau, A. Tayal, D. R. Unni, H. N. Nguyen, J. L. Portwood, 2nd, E. K. S. Cannon, C. M. Andorf, and C. G. Elsik. "Maizemine: A Data Mining Warehouse for the Maize Genetics and Genomics Database." Frontiers of Plant Science 11 (2020): 592730.
  118. Gladman, N.; Olson, A.; Wei, S.; Chougule, K.; Lu, Z.; Tello-Ruiz, M.; Meijs, I.; Van Buren, P.; Jiao, Y.; Wang, B.; et al. SorghumBase: a web-based portal for sorghum genetic information and community advancement. Planta 2022, 255, 1–8. [Google Scholar] [CrossRef] [PubMed]
  119. Arkin, A.P.; Cottingham, R.W.; Henry, C.S.; Harris, N.L.; Stevens, R.L.; Maslov, S.; Dehal, P.; Ware, D.; Perez, F.; Canon, S.; et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat. Biotechnol. 2018, 36, 566–569. [Google Scholar] [CrossRef] [PubMed]
  120. Yates, A.D.; Allen, J.; Amode, R.M.; Azov, A.G.; Barba, M.; Becerra, A.; Bhai, J.; I Campbell, L.; Martinez, M.C.; Chakiachvili, M.; et al. Ensembl Genomes 2022: an expanding genome resource for non-vertebrates. Nucleic Acids Res. 2021, 50, D996–D1003. [Google Scholar] [CrossRef] [PubMed]
  121. Naithani, S.; Preece, J.; D'Eustachio, P.; Gupta, P.; Amarasinghe, V.; Dharmawardhana, P.D.; Wu, G.; Fabregat, A.; Elser, J.L.; Weiser, J.; et al. Plant Reactome: a resource for plant pathways and comparative analysis. Nucleic Acids Res. 2016, 45, D1029–D1039. [Google Scholar] [CrossRef]
  122. Tello-Ruiz, M.K.; Naithani, S.; Stein, J.C.; Gupta, P.; Campbell, M.; Olson, A.; Wei, S.; Preece, J.; Geniza, M.J.; Jiao, Y.; et al. Gramene 2018: unifying comparative genomics and pathway resources for plant research. Nucleic Acids Res. 2017, 46, D1181–D1189. [Google Scholar] [CrossRef]
  123. Naithani, S.; Raja, R.; Waddell, E.N.; Elser, J.; Gouthu, S.; Deluc, L.G.; Jaiswal, P. VitisCyc: a metabolic pathway knowledgebase for grapevine (Vitis vinifera). Front. Plant Sci. 2014, 5. [Google Scholar] [CrossRef]
  124. Naithani, S.; Partipilo, C.M.; Raja, R.; Elser, J.L.; Jaiswal, P. FragariaCyc: A Metabolic Pathway Database for Woodland Strawberry Fragaria vesca. Front. Plant Sci. 2016, 7, 242. [Google Scholar] [CrossRef]
  125. Woodhouse, M. R. K. Cannon, J. L. Portwood, 2nd, L. C. Harper, J. M. Gardiner, M. L. Schaeffer, and C. M. Andorf. "A Pan-Genomic Approach to Genome Databases Using Maize as a Model System." BMC Plant Biology 21, no. 1 (2021): 385.
  126. Kanehisa, M.; Furumichi, M.; Sato, Y.; Kawashima, M.; Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2022, 51, D587–D592. [Google Scholar] [CrossRef]
  127. Paley, S.; Karp, P.D. The BioCyc Metabolic Network Explorer. BMC Bioinform. 2021, 22, 1–6. [Google Scholar] [CrossRef] [PubMed]
  128. Naithani, S.; Jaiswal, P. Pathway Analysis and Omics Data Visualization Using Pathway Genome Databases: FragariaCyc, a Case Study. 1533. [CrossRef]
  129. Hawkins, C.; Ginzburg, D.; Zhao, K.; Dwyer, W.; Xue, B.; Xu, A.; Rice, S.; Cole, B.; Paley, S.; Karp, P.; et al. Plant Metabolic Network 15: A resource of genome-wide metabolism databases for 126 plants and algae. J. Integr. Plant Biol. 2021, 63, 1888–1905. [Google Scholar] [CrossRef] [PubMed]
  130. Foerster, H., A. Bombarely, J. N. D. Battey, N. Sierro, N. V. Ivanov, and L. A. Mueller. "Solcyc: A Database Hub at the Sol Genomics Network (Sgn) for the Manual Curation of Metabolic Networks in Solanum and Nicotiana Specific Databases." Database (Oxford) 2018 (2018).
  131. Goodstein, D.M.; Shu, S.; Howson, R.; Neupane, R.; Hayes, R.D.; Fazo, J.; Mitros, T.; Dirks, W.; Hellsten, U.; Putnam, N.; et al. Phytozome: A comparative platform for green plant genomics. Nucleic Acids Res. 2012, 40, D1178–D1186. [Google Scholar] [CrossRef] [PubMed]
  132. Deng, C.H., S. Naithani, S. Kumari, I. Cobo-Simon, E.H. Quezada-Rodriguez, M. Skrabisova, N. Gladman, M.J. Correll, A.B. Sikiru, O.O. Afuwape, A. Marrano, I. Rebollo, W. Zhang, and S. Jung. "Agricultural Sciences in the Big Data Era: Genotype and Phenotype Data Standardization, Utilization and Integration.." Preprints - American Chemical Society, Division of Petroleum Chemistry (2023).
  133. Sun, C., Z. Hu, T. Zheng, K. Lu, Y. Zhao, W. Wang, J. Shi, C. Wang, J. Lu, D. Zhang, Z. Li, and C. Wei. "Rpan: Rice Pan-Genome Browser for Approximately 3000 Rice Genomes." Nucleic Acids Res 45, no. 2 (2017): 597-605.
  134. Zhao, Q.; Feng, Q.; Lu, H.; Li, Y.; Wang, A.; Tian, Q.; Zhan, Q.; Lu, Y.; Zhang, L.; Huang, T.; et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 2018, 50, 278–284. [Google Scholar] [CrossRef] [PubMed]
  135. Gui, S.; Yang, L.; Li, J.; Luo, J.; Xu, X.; Yuan, J.; Chen, L.; Li, W.; Yang, X.; Wu, S.; et al. ZEAMAP, a Comprehensive Database Adapted to the Maize Multi-Omics Era. iScience 2020, 23, 101241. [Google Scholar] [CrossRef]
  136. Valentin, G. Abdel, D. Gaetan, D. Jean-Francois, C. Matthieu, and R. Mathieu. "Greenphyldb V5: A Comparative Pangenomic Database for Plant Genomes." Nucleic Acids Res 49, no. D1 (2021): D1464-D71.
  137. Bayer, P.E.; Petereit, J.; Durant. ; Monat, C.; Rouard, M.; Hu, H.; Chapman, B.; Li, C.; Cheng, S.; Batley, J.; et al. Wheat Panache: A pangenome graph database representing presence–absence variation across sixteen bread wheat genomes. Plant Genome 2022, 15, e20221. [Google Scholar] [CrossRef]
  138. Blake, V.C.; Woodhouse, M.R.; Lazo, G.R.; Odell, S.G.; Wight, C.P.; A Tinker, N.; Wang, Y.; Gu, Y.Q.; Birkett, C.L.; Jannink, J.-L.; et al. GrainGenes: centralized small grain resources and digital platform for geneticists and breeders. Database 2019, 2019. [Google Scholar] [CrossRef]
  139. Montenegro, J.D.; Golicz, A.A.; Bayer, P.E.; Hurgobin, B.; Lee, H.; Chan, C.-K.K.; Visendi, P.; Lai, K.; Doležel, J.; Batley, J.; et al. The pangenome of hexaploid bread wheat. Plant J. 2017, 90, 1007–1013. [Google Scholar] [CrossRef]
  140. Li, N.; He, Q.; Wang, J.; Wang, B.; Zhao, J.; Huang, S.; Yang, T.; Tang, Y.; Yang, S.; Aisimutuola, P.; et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat. Genet. 2023, 55, 852–860. [Google Scholar] [CrossRef]
  141. Barchi, L.; Rabanus-Wallace, M.T.; Prohens, J.; Toppino, L.; Padmarasu, S.; Portis, E.; Rotino, G.L.; Stein, N.; Lanteri, S.; Giuliano, G. Improved genome assembly and pan-genome provide key insights into eggplant domestication and breeding. Plant J. 2021, 107, 579–596. [Google Scholar] [CrossRef]
  142. Ou, L.; Li, D.; Lv, J.; Chen, W.; Zhang, Z.; Li, X.; Yang, B.; Zhou, S.; Yang, S.; Li, W.; et al. Pan-genome of cultivated pepper (Capsicum) and its use in gene presence-absence variation analyses. New Phytol. 2018, 220, 360–363. [Google Scholar] [CrossRef] [PubMed]
  143. Zhang, B.; Huang, H.; Tibbs-Cortes, L.E.; Vanous, A.; Zhang, Z.; Sanguinet, K.; Garland-Campbell, K.A.; Yu, J.; Li, X. Streamline unsupervised machine learning to survey and graph indel-based haplotypes from pan-genomes. Mol. Plant 2023, 16, 975–978. [Google Scholar] [CrossRef] [PubMed]
  144. Torkamaneh, D.; Lemay, M.; Belzile, F. The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content. Plant Biotechnol. J. 2021, 19, 1852–1862. [Google Scholar] [CrossRef]
  145. Hübner, S.; Bercovich, N.; Todesco, M.; Mandel, J.R.; Odenheimer, J.; Ziegler, E.; Lee, J.S.; Baute, G.J.; Owens, G.L.; Grassa, C.J.; et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants 2018, 5, 54–62. [Google Scholar] [CrossRef] [PubMed]
  146. Jin, S.; Han, Z.; Hu, Y.; Si, Z.; Dai, F.; He, L.; Cheng, Y.; Li, Y.; Zhao, T.; Fang, L.; et al. Structural variation (SV)-based pan-genome and GWAS reveal the impacts of SVs on the speciation and diversification of allotetraploid cottons. Mol. Plant 2023, 16, 678–693. [Google Scholar] [CrossRef]
  147. Liu, H.; Wang, X.; Liu, S.; Huang, Y.; Guo, Y.-X.; Xie, W.-Z.; Liu, H.; Qamar, M.T.U.; Xu, Q.; Chen, L.-L. Citrus Pan-Genome to Breeding Database (CPBD): A comprehensive genome database for citrus breeding. Mol. Plant 2022, 15, 1503–1505. [Google Scholar] [CrossRef]
  148. Li, Q.; Qi, J.; Qin, X.; Dou, W.; Lei, T.; Hu, A.; Jia, R.; Jiang, G.; Zou, X.; Long, Q.; et al. CitGVD: a comprehensive database of citrus genomic variations. Hortic. Res. 2020, 7, 1–8. [Google Scholar] [CrossRef]
  149. Sun, X.; Jiao, C.; Schwaninger, H.; Chao, C.T.; Ma, Y.; Duan, N.; Khan, A.; Ban, S.; Xu, K.; Cheng, L.; et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat. Genet. 2020, 52, 1423–1432. [Google Scholar] [CrossRef]
  150. Song, J.M.; Liu, D.X.; Xie, W.; Yang, Z.; Guo, L.; Liu, K.; Yang, Q.; Chen, L. BnPIR: Brassica napus pan-genome information resource for 1689 accessions. Plant Biotechnol. J. 2021, 19, 412–414. [Google Scholar] [CrossRef]
  151. Qi, W.; Lim, Y.-W.; Patrignani, A.; Schläpfer, P.; Bratus-Neuenschwander, A.; Grüter, S.; Chanez, C.; Rodde, N.; Prat, E.; Vautrin, S.; et al. The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features. GigaScience 2022, 11. [Google Scholar] [CrossRef]
  152. Ruperao, P.; Thirunavukkarasu, N.; Gandham, P.; Selvanayagam, S.; Govindaraj, M.; Nebie, B.; Manyasa, E.; Gupta, R.; Das, R.R.; Odeny, D.A.; et al. Sorghum Pan-Genome Explores the Functional Utility for Genomic-Assisted Breeding to Accelerate the Genetic Gain. Front. Plant Sci. 2021, 12. [Google Scholar] [CrossRef]
  153. Varshney, R.K.; Roorkiwal, M.; Sun, S.; Bajaj, P.; Chitikineni, A.; Thudi, M.; Singh, N.P.; Du, X.; Upadhyaya, H.D.; Khan, A.W.; et al. A chickpea genetic variation map based on the sequencing of 3,366 genomes. Nature 2021, 599, 622–627. [Google Scholar] [CrossRef]
  154. Zhao, J.; Bayer, P.E.; Ruperao, P.; Saxena, R.K.; Khan, A.W.; Golicz, A.A.; Nguyen, H.T.; Batley, J.; Edwards, D.; Varshney, R.K. Trait associations in the pangenome of pigeon pea ( Cajanus cajan ). Plant Biotechnol. J. 2020, 18, 1946–1954. [Google Scholar] [CrossRef] [PubMed]
  155. Yu, J.; Golicz, A.A.; Lu, K.; Dossa, K.; Zhang, Y.; Chen, J.; Wang, L.; You, J.; Fan, D.; Edwards, D.; et al. Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars. Plant Biotechnol. J. 2018, 17, 881–892. [Google Scholar] [CrossRef] [PubMed]
  156. Li, J.; Yuan, D.; Wang, P.; Wang, Q.; Sun, M.; Liu, Z.; Si, H.; Xu, Z.; Ma, Y.; Zhang, B.; et al. Cotton pan-genome retrieves the lost sequences and genes during domestication and selection. Genome Biol. 2021, 22, 1–26. [Google Scholar] [CrossRef]
  157. Sun, Y.; Wang, J.; Li, Y.; Jiang, B.; Wang, X.; Xu, W.-H.; Wang, Y.-Q.; Zhang, P.-T.; Zhang, Y.-J.; Kong, X.-D. Pan-Genome Analysis Reveals the Abundant Gene Presence/Absence Variations Among Different Varieties of Melon and Their Influence on Traits. Front. Plant Sci. 2022, 13, 835496. [Google Scholar] [CrossRef]
  158. Li, H.; Wang, S.; Chai, S.; Yang, Z.; Zhang, Q.; Xin, H.; Xu, Y.; Lin, S.; Chen, X.; Yao, Z.; et al. Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber. Nat. Commun. 2022, 13, 1–14. [Google Scholar] [CrossRef]
  159. Qiao, Q. P. Edger, L. Xue, L. Qiong, J. Lu, Y. Zhang, Q. Cao, A. E. Yocca, A. E. Platts, S. J. Knapp, M. Van Montagu, Y. Van de Peer, J. Lei, and T. Zhang. "Evolutionary History and Pan-Genome Dynamics of Strawberry (Fragaria Spp.)." Proceedings of the National Academy of Sciences of the United States of America 118, no. 45 (2021).
  160. Wang, H.; Tu, R.; Ruan, Z.; Chen, C.; Peng, Z.; Zhou, X.; Sun, L.; Hong, Y.; Chen, D.; Liu, Q.; et al. Photoperiod and gravistimulation-associated Tiller Angle Control 1 modulates dynamic changes in rice plant architecture. Theor. Appl. Genet. 2023, 136, 1–19. [Google Scholar] [CrossRef] [PubMed]
  161. Yu, B.; Lin, Z.; Li, H.; Li, X.; Li, J.; Wang, Y.; Zhang, X.; Zhu, Z.; Zhai, W.; Wang, X.; et al. TAC1, a major quantitative trait locus controlling tiller angle in rice. Plant J. 2007, 52, 891–898. [Google Scholar] [CrossRef]
  162. Boukail, S.; Macharia, M.; Miculan, M.; Masoni, A.; Calamai, A.; Palchetti, E.; Dell’acqua, M. Genome wide association study of agronomic and seed traits in a world collection of proso millet (Panicum miliaceum L.). BMC Plant Biol. 2021, 21, 1–12. [Google Scholar] [CrossRef]
  163. Liu, C.; Wang, Y.; Peng, J.; Fan, B.; Xu, D.; Wu, J.; Cao, Z.; Gao, Y.; Wang, X.; Li, S.; et al. High-quality genome assembly and pan-genome studies facilitate genetic discovery in mung bean and its improvement. Plant Commun. 2022, 3, 100352. [Google Scholar] [CrossRef]
  164. D’Hont, A.; Denoeud, F.; Aury, J.M.; Baurens, F.-C.; Carreel, F.; Garsmeur, O.; Noel, B.; Bocs, S.; Droc, G.; Rouard, M.; et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012, 488, 213–217. [Google Scholar] [CrossRef] [PubMed]
  165. Fernie, A.R.; Aharoni, A. Pan-Genomic Illumination of Tomato Identifies Novel Gene–Trait Interactions. Trends Plant Sci. 2019, 24, 882–884. [Google Scholar] [CrossRef] [PubMed]
  166. Huff, M.; Hulse-Kemp, A.M.; E Scheffler, B.; Youngblood, R.C.; A Simpson, S.; Babiker, E.; Staton, M. Long-read, chromosome-scale assembly of Vitis rotundifolia cv. Carlos and its unique resistance to Xylella fastidiosa subsp. fastidiosa. BMC Genom. 2023, 24, 1–17. [Google Scholar] [CrossRef] [PubMed]
  167. Oren, E. Dafna, G. Tzuri, I. Halperin, T. Isaacson, M. Elkabetz, A. Meir, U. Saar, S. Ohali, T. La, C. Romay, Y. Tadmor, A. A. Schaffer, E. S. Buckler, R. Cohen, J. Burger, and A. Gur. "Pan-Genome and Multi-Parental Framework for High-Resolution Trait Dissection in Melon (Cucumis Melo)." Plant Journal 112, no. 6 (2022): 1525-42.
  168. Hasan, N.; Choudhary, S.; Naaz, N.; Sharma, N.; Laskar, R.A. Recent advancements in molecular marker-assisted selection and applications in plant breeding programmes. J. Genet. Eng. Biotechnol. 2021, 19, 1–26. [Google Scholar] [CrossRef]
  169. Garrido-Cardenas, J.A.; Mesa-Valle, C.; Manzano-Agugliaro, F. Trends in plant research using molecular markers. Planta 2017, 247, 543–557. [Google Scholar] [CrossRef]
  170. Moncada, P.; McCouch, S. Simple sequence repeat diversity in diploid and tetraploid Coffea species. Genome 2004, 47, 501–509. [Google Scholar] [CrossRef]
  171. McCouch, S.R.; Chen, X.; Panaud, O.; Temnykh, S.; Xu, Y.; Cho, Y.G.; Huang, N.; Ishii, T.; Blair, M. Microsatellite marker development, mapping and applications in rice genetics and breeding. Plant Mol. Biol. 1997, 35, 89–99. [Google Scholar] [CrossRef]
  172. Tanksley, S.D.; McCouch, S.R. Seed Banks and Molecular Maps: Unlocking Genetic Potential from the Wild. Science 1997, 277, 1063–1066. [Google Scholar] [CrossRef]
  173. Morales, K.Y.; Singh, N.; Perez, F.A.; Ignacio, J.C.; Thapa, R.; Arbelaez, J.D.; Tabien, R.E.; Famoso, A.; Wang, D.R.; Septiningsih, E.M.; et al. An improved 7K SNP array, the C7AIR, provides a wealth of validated SNP markers for rice breeding and genetics studies. PLOS ONE 2020, 15, e0232479. [Google Scholar] [CrossRef]
  174. Miller, J.R.; Zhou, P.; Mudge, J.; Gurtowski, J.; Lee, H.; Ramaraj, T.; Walenz, B.P.; Liu, J.; Stupar, R.M.; Denny, R.; et al. Hybrid assembly with long and short reads improves discovery of gene family expansions. BMC Genom. 2017, 18, 1–12. [Google Scholar] [CrossRef] [PubMed]
  175. Cheng, C.; Fei, Z.; Xiao, P. Methods to improve the accuracy of next-generation sequencing. Front. Bioeng. Biotechnol. 2023, 11, 982111. [Google Scholar] [CrossRef] [PubMed]
  176. Myburg, A.A.; Grattapaglia, D.; Tuskan, G.A.; Hellsten, U.; Hayes, R.D.; Grimwood, J.; Jenkins, J.; Lindquist, E.; Tice, H.; Bauer, D.; et al. The genome of Eucalyptus grandis. Nature 2014, 510, 356–362. [Google Scholar] [CrossRef] [PubMed]
  177. Shulaev, V.; Sargent, D.J.; Crowhurst, R.N.; Mockler, T.C.; Folkerts, O.; Delcher, A.L.; Jaiswal, P.; Mockaitis, K.; Liston, A.; Mane, S.P.; et al. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 2010, 43, 109–116. [Google Scholar] [CrossRef]
  178. Wu, S.; Sun, H.; Gao, L.; Branham, S.; McGregor, C.; Renner, S.S.; Xu, Y.; Kousik, C.; Wechter, W.P.; Levi, A.; et al. A Citrullus genus super-pangenome reveals extensive variations in wild and cultivated watermelons and sheds light on watermelon evolution and domestication. Plant Biotechnol. J. 2023, 21, 1926–1928. [Google Scholar] [CrossRef]
  179. Naithani, S.; Dikeman, D.; Garg, P.; Al-Bader, N.; Jaiswal, P. Beyond gene ontology (GO): using biocuration approach to improve the gene nomenclature and functional annotation of rice S-domain kinase subfamily. PeerJ 2021, 9, e11052. [Google Scholar] [CrossRef]
  180. Naithani, S.; Komath, S.S.; Nonomura, A.; Govindjee, G. Plant lectins and their many roles: Carbohydrate-binding and beyond. J. Plant Physiol. 2021, 266, 153531. [Google Scholar] [CrossRef]
  181. Monaco, M.K.; Sen, T.Z.; Dharmawardhana, P.D.; Ren, L.; Schaeffer, M.; Naithani, S.; Amarasinghe, V.; Thomason, J.; Harper, L.; Gardiner, J.; et al. Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 2013, 6. [Google Scholar] [CrossRef]
  182. Jaiswal, P. , and B. Usadel. "Plant Pathway Databases." Methods in Molecular Biology 1374 (2016): 71-87.
  183. Naithani, S.; Nonogaki, H.; Jaiswal, P. Exploring Crossroads Between Seed Development and Stress Response. 2017. [Google Scholar] [CrossRef]
  184. The Gene Ontology Consortium; A Aleksander, S. ; Balhoff, J.; Carbon, S.; Cherry, J.M.; Drabkin, H.J.; Ebert, D.; Feuermann, M.; Gaudet, P.; Harris, N.L.; et al. The Gene Ontology knowledgebase in 2023. Genetics 2023, 224. [Google Scholar] [CrossRef]
  185. Cooper, L. , and P. Jaiswal. "The Plant Ontology: A Tool for Plant Genomics." Methods in Molecular Biology 1374 (2016): 89-114.
  186. Walls, R.L.; Cooper, L.; Elser, J.; Gandolfo, M.A.; Mungall, C.J.; Smith, B.; Stevenson, D.W.; Jaiswal, P. The Plant Ontology Facilitates Comparisons of Plant Development Stages Across Species. Front. Plant Sci. 2019, 10, 631. [Google Scholar] [CrossRef]
  187. Naithani, S.; Mohanty, B.; Elser, J.; D’eustachio, P.; Jaiswal, P. Biocuration of a Transcription Factors Network Involved in Submergence Tolerance during Seed Germination and Coleoptile Elongation in Rice (Oryza sativa). Plants 2023, 12, 2146. [Google Scholar] [CrossRef] [PubMed]
  188. Naithani, S. Dharmawardhana, and J. B. Nasrallah. "Scr." In Handbook of Biologically Active Peptides, edited by Abba J. Kastin, 58-66: Academic Press, 2013.
  189. Bolger, M.; Schwacke, R.; Usadel, B. MapMan Visualization of RNA-Seq Data Using Mercator4 Functional Annotations. 2021, 2354, 195–212. [CrossRef]
  190. Naithani, S.; Gupta, P.; Preece, J.; Garg, P.; Fraser, V.; Padgitt-Cobb, L.K.; Martin, M.; Vining, K.; Jaiswal, P. Involving community in genes and pathway curation. Database 2019, 2019. [Google Scholar] [CrossRef] [PubMed]
  191. Gupta, P.; Geniza, M.; Naithani, S.; Phillips, J.L.; Haq, E.; Jaiswal, P. Chia (Salvia hispanica) Gene Expression Atlas Elucidates Dynamic Spatio-Temporal Changes Associated With Plant Growth and Development. Front. Plant Sci. 2021, 12. [Google Scholar] [CrossRef] [PubMed]
  192. Hendre, P.S.; Muthemba, S.; Kariba, R.; Muchugi, A.; Fu, Y.; Chang, Y.; Song, B.; Liu, H.; Liu, M.; Liao, X.; et al. African Orphan Crops Consortium (AOCC): status of developing genomic resources for African orphan crops. Planta 2019, 250, 989–1003. [Google Scholar] [CrossRef]
  193. Chang, Y.; Liu, H.; Liu, M.; Liao, X.; Sahu, S.K.; Fu, Y.; Song, B.; Cheng, S.; Kariba, R.; Muthemba, S.; et al. The draft genomes of five agriculturally important African orphan crops. GigaScience 2019, 8, 152. [Google Scholar] [CrossRef]
Figure 1. A conceptual depiction of the pan-genes across genomes of eight accessions belonging to the same clade. Each ring represents one accession. The top-left “core genes” represent conserved genes across eight accessions. The white section in a ring indicates the absence of ortholog(s). The “soft cores” represents genes found in >=95% of accessions. The “cloud genes “ found only in one or two taxa. The rest between the “cloud genes” and “soft core genes” are “shell genes”.
Figure 1. A conceptual depiction of the pan-genes across genomes of eight accessions belonging to the same clade. Each ring represents one accession. The top-left “core genes” represent conserved genes across eight accessions. The white section in a ring indicates the absence of ortholog(s). The “soft cores” represents genes found in >=95% of accessions. The “cloud genes “ found only in one or two taxa. The rest between the “cloud genes” and “soft core genes” are “shell genes”.
Preprints 82974 g001
Figure 2. An illustration of three popular approaches currently used for pan-genome construction are the reference-based iterative method, de novo genome assembly, and graph-based pangenome assembly.
Figure 2. An illustration of three popular approaches currently used for pan-genome construction are the reference-based iterative method, de novo genome assembly, and graph-based pangenome assembly.
Preprints 82974 g002
Figure 3. A conceptual view of pan-genome data and its applications in creating a common reference set of genome graph carrying all the possible chromosomal rearrangements and mapped features along with whole-genome alignments, gene orthology, expression, pathways, function, and aligned synteny views to help accelerate the knowledge discovery and hypothesis-driven research.
Figure 3. A conceptual view of pan-genome data and its applications in creating a common reference set of genome graph carrying all the possible chromosomal rearrangements and mapped features along with whole-genome alignments, gene orthology, expression, pathways, function, and aligned synteny views to help accelerate the knowledge discovery and hypothesis-driven research.
Preprints 82974 g003
Figure 4. A pan-gene overview for TAC1 transcription factor (reference gene OsTAC1; Os09g0529300) and its orthologs from various accessions of cultivated rice O. sativa, other members of Oryza genus, and two other monocots maize and sorghum at Gramene oryza pansite. Users can explore (A) an overall pan-gene alignment view that can be expanded to a multiple sequence alignment (using amino acid sequences of the encoded proteins) and gene neighborhood conservation.
Figure 4. A pan-gene overview for TAC1 transcription factor (reference gene OsTAC1; Os09g0529300) and its orthologs from various accessions of cultivated rice O. sativa, other members of Oryza genus, and two other monocots maize and sorghum at Gramene oryza pansite. Users can explore (A) an overall pan-gene alignment view that can be expanded to a multiple sequence alignment (using amino acid sequences of the encoded proteins) and gene neighborhood conservation.
Preprints 82974 g004
Figure 5. Plant pan-genomes can help to integrate heterogeneous omics data to understand gene function, genome evolution, and speciation and enable genomic selection, genome editing, and phenotype prediction to support and sustain agriculture production.
Figure 5. Plant pan-genomes can help to integrate heterogeneous omics data to understand gene function, genome evolution, and speciation and enable genomic selection, genome editing, and phenotype prediction to support and sustain agriculture production.
Preprints 82974 g005
Table 1. A list of popular open-source tools for pan-genome assembly and visualization.
Table 1. A list of popular open-source tools for pan-genome assembly and visualization.
Tool name and URL Remarks and Citation
Genome assembly
Hifiasm
https://github.com/chhylp123/hifiasm
Construct haplotype-resolved assemblies from accurate HiFi Reads [60]
Canu
https://github.com/marbl/canu
Assemble genomes of any size from single molecule sequences and provides graphical fragment assembly that can be integrated with complementary phasing and scaffolding methods [66].
Flye
https://github.com/fenderglass/Flye
Assemble single molecule, long-read sequencing data into genomes using repeat graphs [67].
PAGIT
https://www.sanger.ac.uk/tool/pagit
PAGIT is a package of tools for generating high-quality draft genome sequences by ordering contigs, closing gaps, correcting sequence errors, and transferring annotation. PAGIT is compiled for Linux/UNIX systems and is available as a virtual machine [68].
MEGAHIT
https://github.com/voutcn/megahit
Ultra-fast NGS assembler for metagenomes [69].
SPADes
https://cab.spbu.ru/software/spades/
A set of genome assembly and analysis tools for long- and short-reads [70,71].
Pangenome graph construction, normalization, identification of structural variants, and visualization
vgtools
(vg construct, 'vg call', 'vg giraffe', 'vg map', or 'vg mpmap')
https://github.com/vgteam/vg
Toolset for eukaryotic pangenome graph construction, read mapping, variant calling, and graph visualization [58].
Minigraph
https://github.com/lh3/minigraph
Tool for graph construction, mapping, and variant calling [72].
PGGB
https://github.com/pangenome/pggb
Pan-genome graph construction, normalization, and visualization, using ODGI as the backbone [73].
MGRgraph
https://github.com/LeilyR/Multi-genome-Reference
An Algorithm to Build a Multi-genome.
Cactus
https://github.com/ComparativeGenomicsToolkit/cactus
A reference-free multiple genome alignment program that can use progressive mode to build pan-genome across different species [74,75]
PanTools
https://pantools.readthedocs.io/en/latest/user_guide/install.html
A platform for pan-genome graph construction, read mapping, phylogeny analysis, pan-graph query, and pan-gene annotation [76].
Smoothxg
https://github.com/pangenome/smoothxg
Local re-construction of variation graphs
ODGI
https://github.com/pangenome/odgi
Optimized Dynamic Genome/Graph Implementation (ODGI) is a tool suite that represents graphs, including structurally complex regions with minimal memory overhead [77]. It is a pangenome toolbox with more than 30 tools to transform, analyze, simplify, validate, annotate, and visualize pangenome graphs.
nf-core/pangenome
https://github.com/nf-core/pangenome
Nextflow pipeline for all-vs-all alignment, pangenome graph construction, normalization, remove redundancy, and visualization (through ODGI) [78].
SeqWish
https://github.com/ekg/seqwish
Build a variation graph from pairwise alignments [73].
PanPipe
https://github.com/USDA-ARS-GBRU/PanPipes
End-to-end pan-genomic graph construction and genetic analysis pipeline [79].
PanGene
https://github.com/lh3/pangene
It is used for ortholog and paralog analysis and building pan-gene graphs.
Minimap2
https://github.com/lh3/minimap2
This alignment program can be used for mapping DNA or long mRNA sequences to a reference [80].
NGMLR
https://github.com/philres/ngmlr
This program align PacBio long reads to genomes and for detecting complex structural variations [5].
MUMmer4
https://mummer.sourceforge.net/
https://github.com/mummer4/mummer
This is a genome to genome aligner [81].
GraphAligner
https://github.com/maickrau/GraphAligner
A tool for aligning long reads to genome graphs [82].
V-ALIGN
https://github.com/tcsatc/V-ALIGN
V-ALIGN allows gapped sequence alignment directly on the input graph and supports affine and linear gaps [83].
PaSGAL
https://github.com/ParBLiSS/PaSGAL
Parallel Sequence to Graph Aligner (PaSGAL) facilitates local sequence alignment of sequences to variation graphs, splicing graphs, etc.
GED-MAP
https://github.com/thomas-buechler-ulm/gedmap
Prototype of efficient short read mapping to pangenome graph [84].
DeepVariant
https://github.com/google/deepvariant
A universal SNP and small-indel variant caller [85,86].
SpeedSeq
https://github.com/hall-lab/speedseq
A platform for alignment, variant calling, and functional annotation [87].
graphTyper
https://github.com/DecodeGenetics/graphtyper
This is a graph-based variant caller. It realigns short-read sequence data to a pangenome and is used for discovering and genotyping sequence variants [88].
PanGenie
https://github.com/eblerjana/pangenie
Kmer-based genotyper for structural variation detection on pangenome graphs from short-read sequencing data to genotype a broad spectrum of genetic variation [89].
VEP
https://ensembl.gramene.org/tools.html
Variant Effect Prediction (VEP) helps in analyzing the consequences of sequence variations [90].
OrthoFinder
https://github.com/davidemms/OrthoFinder
This an ortholog inference method [91].
OrthoMCL
https://orthomcl.org/orthomcl/app
A method for identification of ortholog groups for eukaryotic genomes [92].
JustOrthologs
https://github.com/ridgelab/JustOrthologs/
JustOrthologs is a fast ortholog identification algorithm that uses conservation of gene structure [93].
PhyloMCL
https://sourceforge.net/projects/phylomcl/files/Materials/
PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events [94].
OMA
https://github.com/DessimozLab/OmaStandalone/tree/v2.4.0
https://omabrowser.org/oma/home/
Orthologous Matrix (OMA) is a method and database for orthologs identification from public domains and customized genomes [95].
InParanoid-Diamond
https://bitbucket.org/sonnhammergroup/inparanoid/src
InParanoid algorithm for orthology analysis and gene family identification [96]. This is used for orthology projection of plant gene families in the Plant Reactome (https://plantreactome.gramene.org) [97].
Panache
https://github.com/SouthGreenPlatform/panache
This is a web browser-based viewer for linearized pangenomes [98]. It is being used in visualizing pan-genome data in several plant portals, e.g., the banana genome hub [99].
MoMI-G
https://github.com/MoMI-G/MoMI-G/
Genome graph browser for SVs visualization. Users can filter and visualize annotations and inspect SVs with read alignments over the genome graph [100].
panGraphViewer
https://github.com/TF-Chan-Lab/panGraphViewer
PanGraphViewer was developed using Python3 for pangenome graph visualization and runs on all major operating systems.
Bandage
https://rrwick.github.io/Bandage/
This is an interactive tool for visualizing de novo assembled genomes [101].
Bandage-NG
https://github.com/asl/BandageNG
GUI program to interact with assembly graphs based on OGDF (The Open Graph Drawing Framework and Open Graph algorithms and Data structures Framework)
sequenceTubeMaps
https://github.com/vgteam/sequenceTubeMap
Interactive visualization of genomes [102].
GfaViz
https://github.com/ggonnella/gfaviz
Interactive visualization of graphical fragment assembly (GFA) genome graphs [103].
AGB
https://github.com/almiheenko/AGB
Assembly Graph Browser (AGB) is an Interactive tool for constructing and visualization of large assembly graphs, repeat sequence analysis [104].
IGGE
https://github.com/immersivegraphgenomeexplorer/IGGE
An interactive graph genomes browser.
GFAViewer
https://lh3.github.io/gfatools/
Part of the GFA server for online visualization of GFA files
SGTK
https://github.com/olga24912/SGTK
Scaffold graph toolkit for construction and interactive visualization of scaffold graphs using sequencing data [105].
Maffer
https://github.com/pangenome/maffer
Convert sorted graphs to multiple alignment format (MAF).
Gfatools
https://github.com/lh3/gfatools
A set of tools to parse, subgraph, and convert GFA or rGFA format to FASTA/BED format.
Pgge
https://github.com/pangenome/pgge
This is a pan-genome graph evaluator
WGT
https://github.com/Kuanhao-Chao/Wheeler_Graph_Toolkit
Tools and algorithms for recognizing, visualizing, and generating Wheeler graphs.
GBWT
https://github.com/jltsiren/gbwt
A tool used for haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT) approach [106,107].
Spodgi
https://github.com/pangenome/spodgi
Convert ODGI genome graph file to SPARQL database.
GraphPeakCaller
https://github.com/uio-bmi/graph_peak_caller
A tool for calling transcription factor peaks on graph-based reference genomes using ChIP-seq data [108].
PSVCP
https://github.com/wjian8/psvcp_v1.01
A pangenome analysis pipeline (PSVCP) to construct pan-genome, call structural variants, and run population genotyping used for rice pan-genome [109].
ppsPCP
http://cbi.hzau.edu.cn/ppsPCP/
It is designed specifically for constructing fully annotated plant pan-genomes. It scans presence/absence variants [110].
Table 2. A list of pan-genomic portals and data resources for various crops.
Table 2. A list of pan-genomic portals and data resources for various crops.
Pan-genome resource Remarks
Gramene
Link: https://www.gramene.org/pansites
Species: maize, rice, grapevine, and sorghum
The Gramene hosts 128 reference plant genomes [115] and a pan-genome site for four crops: maize, rice, grapevine, and sorghum.
SorghumBase
Sorghum: https://www.sorghumbase.org
SorghumBase portal hosts a sorghum pan-genome browser consisting of five sorghum reference genome assemblies and genetic variant information for natural diversity panels and ethyl methanesulfonate (EMS)-induced mutant populations [118].
RPAN
Link: https://cgm.sjtu.edu.cn/3kricedb
In addition to RPAN, the data and analyzed outputs from 3K RGP are available at the following websites:
http://snp-seek.irri.org/
http://www.rmbreeding.cn/index.php
http://www.ricecloud.org
https://aws.amazon.com/public-data-sets/3000-rice-genome
Species: rice (O. sativa) and its wild relatives.
The Rice Pan-genome Browser (RPAN) hosts the Rice Pan-genome browser and genomic variation data generated from 3,010 diverse rice accessions generated by 3000 Rice Genome Project (3K RGP) [8,9,133,134]. It contains ~370Mbp IRGSP genome and ~260Mbp novel sequences comprising 50,995 genes (23,914 core genes). RPAN provides a phylogenetic tree browser to view the phylogeny of 3K rice accessions and a genome browser to view gene annotation and presence-absence variations (PAVs). Users can access pan-gene views and associated genetic variations.
RiceSuperPIRdb
Link: http://www.ricesuperpir.com
Species: 251 genomes representing domesticated rice accessions and wild relatives (202 O. sativa, 28 O. rufipogan, 11 O. glaberrima, and 10 O. barthii accessions).
The Rice Super Pan-genome Information Resource Database (RiceSuperPIRdb) provides an interactive web-based browser for the rice super pan-genome. It was built using reference-free high-quality whole genome alignment of 251 independent genome assemblies. Genome annotations (including annotations for transposable elements) and node-specific K-mer spectrum pangenome graphs are available for each assembly. In addition, genetic variation graphs support linking query data. This super pan-genome also facilitates the identification of lineage-specific haplotypes for trait-associated genes [21].
PanOryza
Link: https://panoryza.org
Species: It hosts the data for Magic-16 rice accessions; see
https://panoryza.cgrb.oregonstate.edu/node/5
PanOryza provides consistency in the rice gene annotation across all rice varieties and the rice pan-genome browser supported by JBrowse Genome Browser.
MaizeGDB
Link: https://nam-genomes.org
Species: maize
MaizeGDB hosts 48 maize genomes, including 26 high-quality PacBio genome assemblies of the Nested Associated Mapping (NAM) population founder lines and their associated datasets. It allows users to connect genomes, gene models, expression, methylome, sequence variation, structural variation, transposable elements, and diversity data across the maize pan-genome framework. That is supported by the JBrowse genome browser [125].
ZEAMAP
Link: www.zeamap.com
Species: maize
ZEAMAP is a comprehensive database incorporating multiple reference genomes, annotations, comparative genomics, transcriptomes, open chromatin regions, chromatin interactions, high-quality genetic variants, phenotypes, metabolomics, genetic maps, genetic mapping loci, population structures, and populational DNA methylation signals within maize inbred lines" [135]
GreenPhylDB
Link: https://www.greenphyl.org/cgi-bin/index.cgi
Species: 46 plant species. 19 pangenomes, including rice, maize, banana, grape, and cacao. In addition, it hosts 27 reference genomes.
GreenPhylDB is part of the South Green Bioinformatics platform (https://www.southgreen.fr) [136]. It aids exploration of gene families and homologous relationships among plant genomes.
The Wheat Panache web portal
Link: http://www.appliedbioinformatics.com.au/wheat_panache
Species: wheat
This wheat pangenome graph visualization is supported by Panache tool. It allows users to explore SVs across the chosen wheat accessions [137].
GrainGenes
Link: https://wheat.pw.usda.gov/GG3/pangenome
Species: wheat, barley, rye, oat
GrainGenes hosts molecular and phenotypic information for wheat, barley, rye, oat, etc. including several genome assemblies, genome browsers, and a T. aestivum (bread wheat) pangenome [138].
Wheat-Pangenome
Link: http://appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/
Species: bread wheat (Triticum aestivum)
Wheat-Pangenome facilitates comparison of an improved reference for the Chinese Spring wheat genome with 18 wheat cultivars [139].
SGN
Links: https://solgenomics.net
Subsites:
http://solomics.agis.org.cn/tomato
https://solgenomics.net/projects/tgg
https://solgenomics.net/organism/Solanum_lycopersicum/genome/
https://solgenomics.net/organism/Solanum_melongena/genome
Species: tomato, potato, pepper, petunia, and eggplant.
The Solanaceae Genomics Network (SGN) database hosts genomic information about the nightshade family. Currently, it has pan-genome data for tomato and eggplant. International Tomato Genome Sequencing Project produced the tomato pan-genome data, that provides a collection of tomato reference genome assemblies from 46 accessions (22 Solanum lycopersicum, 13 Solanum lycopersicum var. cerasiforme; and 11 Solanum pimpinellifolium) that can be viewed at http://solomics.agis.org.cn/tomato/tool/jbrowse_nav and downloaded from http://solomics.agis.org.cn/tomato/ftp/ [140]. For details about the eggplant pan-genome and pan-plastome data see Barchi et al., 2021 [141].
PepperPan
Link: http://www.pepperpan.org:8012/
Species: Capsicum annuum (pepper) and its wild relatives
The pepper pan-genome was constructed by mapping the sequences of 383 cultivars to the Zunla-1 genome as the reference [142]. The novel contig sequences (accession number GWHAAAT00000000) is available at http://bigd.big.ac.cn/gwh.
BRIDGEcereal
Link: https://bridgecereal.scinet.usda.gov
Species: wheat, maize, barley, sorghum, and rice.
The Blastn Recovered Insertion and Deletion near Gene Explorer (BRIDGEcereal) is a web application for mining pangenomes of cereals. It facilitates the identification of potential indels (insertion or deletions) for genes of interest with publicly accessible pangenomes of five major cereal crops, including wheat, maize, barley, sorghum, and rice [143].
PanSoy
Link: https://www.soybase.org/projects/SoyBase.C2021.01.php
Species: Glycine soja (wild soybean) and Glycine max (soybean)
From GmHapMap collection 204 phylogenetically and geographically distinct soybean accessions were chosen for construction of soybean pan-genome using de novo genome assembly method [144].
Sunflower Genome Database
Link: https://www.sunflowergenome.org
Species: Helianthus annuus (sunflower)
Sunflower pan-genome was generated using sequence from 287 cultivated lines, 17 Native American landraces, and 189 wild accessions representing 11 compatible wild species. Raw data used for pan-genome construction is available at NCBI and SNPs data is available at Sunflower Genome Database [145].
COTTONOMICS
Link: http://cotton.zju.edu.cn
Species: Cotton
It provides genome-wide, gene-scale structural variations detected from 11 assembled allopolyploid cotton genomes and are linked to important agronomic traits [146].
BGH
Link: https://banana-genome-hub.southgreen.fr
Species: Musa Ensete, and genomics data of 15 Musaceae species
The Banana Genome Hub (BGH), a web-based platform, supports users in exploring genes and gene families, gene expression patterns, associated SNP markers, etc. Users can also view chromosome structures, synteny, presence, absence variation, and genome ancestry mosaics [99].
CPBD
Links: http://citrus.hzau.edu.cn/
Species: sweet orange (Citrus sinensis), mandarin (Citrus reticulata), pummelo (Citrus grandis), grapefruit (Citrus paradisi), and lemon (Citrus limon).
The Citrus Pan-genome to Breeding Database (CPBD) was built using 23 genomes of 17 citrus species and has genetic variation data from 167 citrus accessions mapped to two reference genomes [147].
CitGVD
Links: http://citgvd.cric.cn/home/index
Species: citrus accessions
The Citrus Genome Database (CitGVD) hosts citrus genomic data, genetic variation data, and built-in analysis tools. It contains 1493258964 non-redundant SNPs, INDELs, and 84 phenotypes from 346 citrus individuals. Users can browse/search annotated genetic variations and visualize results graphically in a genome browser or tabular outputs. This portal supports comparative genomics between two or among several citrus accessions [148].
Apple pan-genome
Link: http://bioinfo.bti.cornell.edu/apple_genome
Species: Apple (Malus domestica) and its wild progenitors M. sieversii, and M. sylvestris
Apple pan-genome was constructed using phased diploid genome assemblies of Malus domestica cv. Gala, M. sieversii and M. sylvestris, and 91 sequenced genomes of additional accessions [149].
BnPIR
Link: http://cbi.hzau.edu.cn/bnapus
Species: Brassica oleracea, Brassica macrocarpa (cultivated and wild cabbage), Brassica napus
The Brassica napus pan-genome information resource (BnPIR) hosts eight high-quality B. napus reference genomes generated using PacBio sequencing and a collection of 1688 rapeseed re-sequencing data. This pangenome resource provides a pan-gene module, pan-genome Browser, and synteny data. It contains multi-omics data and common bioinformatics tools [150].
Cassava Pan-genome
Link: https://cassavabase.org/
Species: cassava (Manihot esculenta)
Two high-quality, chromosome-scale haploid genomes assemblies for African cassava cultivar TME204 (resistant to cassava mosaic diseases caused by African cassava mosaic viruses) were generated using both short-read and long-read sequencing methods (Illumina PE reads, PacBio CLRs, and HiFi reads [151].
Other public pan-genome data available (not yet included in crop databases or supported by Genome Browser and associated tools)
Pearl millet
Link: http://117.78.45.2:91/download/
Species: Pearl millet
Pearl millet pan-genome was constructed using whole genome assemblies of 11 accessions generated using a combination of PacBio long-read sequences, Bionano optical mapping data, Hi-C data, and Illumina short-read sequence data [55].
Sorghum pan-genome
Link: The bulk data is available at http://dataverse.icrisat.org/dataset.xhtml?persistentId=doi:10.21421/D2/RIO2QM.
Species: sorghum
This pan-genome was assembled using iterative mapping of whole-genome sequence data from 176 sorghum accessions to a sorghum reference assembly v3.0.1 from Phytozome [152]. The sorghum pan-genome consists of 210,805 sequences, including sequences from the sorghum reference genome assembly v3.1.0. In addition, it has 209935 assembled contig sequences from 176 sorghum accessions. The total genes represented in this data are 35719 genes (34211 genes from reference).
Barley pan-genome
Link: https://bitbucket.org/ipk_dg_public/barley_pangenome/src/master/
https://galaxy-web.ipk-gatersleben.de/libraries
Species: Barley cultivars, landraces, and a wild relative.
This first-generation barley pan-genome is based on chromosome-scale sequence assemblies for the 20 barley varieties, including landraces, cultivars, and a wild barley from global barley diversity, and by mapping whole-genome shotgun sequencing data from additional 300 barley accessions [11].
Soybean pan-genome
Links: The genetic diversities from the 29 genomes is available at https://figshare.com/s/689ae685ad2c368f2568
SNPs and small indels) data from the 2,898 accessions is available at (http://bigd.big.ac.cn/gvm/getProjectDetail?project=GVM000063.
Species: Glycine soja (wild soybean) and 26 Glycine max (soybean) accession
This graph-based pan-genome assembly was generated using de novo genome assemblies for 26 representative soybeans selected from 2,898 deeply sequenced accessions [14]. The sequencing data, assembled chromosomes, unplaced scaffolds, and annotations are available at the Genome Sequence Archive and Genome Warehouse database in BIG Data Center (https://bigd.big.ac.cn/gsa/index.jsp) under Accession Number PRJCA002030.
Chickpea pan-genome
Links:
Pan-genome assembly and annotations: https://doi.org/10.6084/m9.figshare.16592819
The variant calls: https://cegresources.icrisat.org/cicerseq
Manhattan and QQ-plots for Genome-Wide Association Study (GWAS) analysis: https://doi.org/10.6084/m9.figshare.15015309
The chickpea pan-genome was constructed using sequenced 3,366 germplasm lines, including 3,171 cultivated and 195 wild accessions, using iterative mapping and assembly method [153].
Pigeon pea
Link: https://research-repository.uwa.edu.au/en/datasets/pigeon-pea-pangenome-contig-assembly-annotation-snps-pav
Species: pigeon pea (Cajanus cajan)
The pigeon pea pan-genome based on 89 accessions, including 70 from South Asia, 8 from sub-Saharan Africa, 7 from South-East Asia, 2 from Mesoamerica and 1 from Europe) was constructed. The pangenome was generated using the reference genome assembly (C. cajan_V1.0) and iterative mapping and assembly method [154].
Sesame pan-genome
Species: sesame (Sesamum indicum L.)
The sesame pan-genome was constructed by mapping genome sequencing data from two landraces S. indicum cv. Baizhima and Mishuozhima and two cultivars, Yuzhi11, and Swetha to the S. indicum var. Zhongzhi13 reference genome [155].
Cotton Variome
Links: Genetic variation is available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA576032 and https://figshare.com/s/cb3c104782a1dcd90ab0
Species: Gossypium hirsutum and Gossypium barbadense
This is a comprehensive resource providing genetic variation data from 1961 cottons accessions from G. hirsutum and G. barbadense [156].
Melon pan-genome
Link: https://figshare.com/articles/dataset/melon_pangenome/17195072
Pan-genome of Cucumismelo L. consists of geneome sequence data from 297 accessions [157]
Cucumber pan-genome
Data availability: Genome assemblies of the 11 cucumber accessions have been deposited in NCBI GenBank under the accession number PRJNA657438.
Pan-genome graph constructed for 11 cucumber accessions. These 11 representative accessions from the 115-line core collection were sequenced using the PacBio platform, and the genome assemblies were generated using long-read and short-read data [158].
Strawberry pan-genome
The genome assembly and annotation files are available in the Genome Database for Rosaceae. The pan-genome browser or query support is not available.
Link: https://www.rosaceae.org/species/fragaria/all
Species: cultivated and wild strawberry accessions
This strawberry pan-genome was generated using chromosome-scale reference genomes assemblies of five diploid strawberry species (Fragaria mandschurica, Fragaria daltoniana, Fragaria pentaphylla, F. nilgerrensis, and F. viridis) strawberry species and genome resequencing data of 128 accessions [159].
Walnut pan-genome
Link:https://db.cngb.org/search/project/CNP0001209.
Species: walnut (Juglans nigra)
A high-quality genome assembly of black walnut (Juglans nigra) genotype NWAFU168 using both short-read and long-read platforms (Illumina, Pacbio, and Hi-C) was constructed. A Walnut pan-genome was built using this reference genome and mapping sequence data from 74 walnut accessions [56].
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2025 MDPI (Basel, Switzerland) unless otherwise stated