GWAS analysis of wheat pre-breeding germplasm for terminal drought stress using next generation sequencing technology

Bread wheat (Triticum aestivum L.) is one of the most important cereal crops for food security. Of all the stresses that curtail wheat productivity, drought has the most detrimental effects. Especially terminal drought stress i.e. at the time of flowering imposes a big challenge to sustain grain production. In the current study, 339 pre-breeding lines derived from three-way crosses of exotics x elite lines were evaluated in the irrigated and drought stress environments at Obregon, Mexico for the year 2016 and 2018. Drought significantly reduced yield (Y), spike length (SL), number of grains per spikes (NGS) and thousand kernel weight (TKW) by 46.4, 19.2, 23.5 and 25.9%, respectively in comparison to irrigated conditions. Kernel abortion (KA), highly correlated with Y, increased significantly (11.6%) under drought stress environment. Population structure analysis in this panel revealed three sub-populations and a genome wide linkage disequilibrium (LD) decay was at 2.5 cM. Single marker and haplotypes-based genome wide association study (GWAS) revealed significant associations on three chromosomes; 4A (HB10.7), 2D (HB6.10) and 3B (HB8.12) with Y, SL and TKW, respectively. Likewise, associations on chromosomes 6B (HB17.1) and 3A (HB7.11) were identified for NGS and on 3A (HB7.12) for KA. Five traits i.e. normalized difference vegetation index (NDVI), canopy temperature depression (CTD) days to heading (DTH), NGS, KA were associated at chromosome 3A both under irrigated and drought conditions however, different haplotypes were estimated. Twenty-six SNPs were part of 10 haplotype blocks associated with Y, SL, TKW, NGS and KA. In silico analysis of the associated SNPs/haplotypes showed hits with candidate genes known to confer abiotic stress resistance in model species and crops. Potential candidate genes include those coding for sulfite exporter TauE/SafE family in Arabidopsis thaliana, TBC domain containing protein in Oryza sativa subsp. Japonica and heat shock proteins in Aegilops tauschii subsp. tauschii were revealed. The SNPs linked to the promising genes identified in the study can be used for marker-assisted selection.


Introduction
Uneven pattern of rainfalls and more frequent spells of drought are among fatal threats for crop production. Equally, global freshwater resources are shrinking and further creating harsh conditions for global agriculture. Therefore, there is a pressing need to develop crops that produce higher yield with less water. Wheat (Triticum aestivum L.) is grown in more than 85 countries of the globe with about 2.1 million km 2 total harvested area and contributes to about the 20% of the total dietary calories and proteins worldwide [1]. Due to its cultivation in wide range of environments ranging from very favorable ones in Western Europe to severely stressed ones in parts of Asia, Africa and Australia, it is highly prone to various biotic and abiotic stresses. Of all the stresses curtailing wheat productivity, drought has the most detrimental effects [2]. Especially terminal drought stress i.e. drought stress at the time of flowering reduces grain yield through ceased growth of spike and reduction in grain number and grain weight [3] [4][5] [5]. Improvement of drought stress tolerance has therefore become the prime target for wheat breeders.
Traditional breeding methods have achieved tremendous success in the last century [6] in pushing the annual genetic gain in wheat by using desirable genetic variations within primary gene pools. However, to feed the burgeoning world population and to develop climate smart varieties more efforts are required. Genetic resources such as landraces and wild relative collections, maintained in gene banks, are a reservoir of unexplored beneficial alleles. Worldwide wheat breeders have access to up to 800,000 accessions many of which show adaptation to different abiotic stresses [7], for example, Creole wheat landraces (the landraces introduced to Mexico from Europe). However, breeders are reluctant to introgress these rare unexplored alleles into wheat cultivars because of the challenges involved in identifying useful, novel diversity and introgressing it into well-adapted elite cultivars with minimum linkage drag. 5 Pre-breeding plays a key role in minimizing the barrier in mobilizing novel genetic variation into breeding programs introducing less undesired, linked genes. It refers to the development of bridging, semi-finished or intermediate germplasm having introgressions from exotic or unimproved germplasm [8]. The Seeds of Discovery (SeeD) project was launched at CIMMYT to create new pre-breeding germplasm in order to enhance the use of genetic resources in breeding programs (SeeD 2011). Within the framework of SeeD, a panel of pre-breeding lines (PBLs) has been generated by three-way (exotic/elite1//elite2) crosses of hundreds of exotics (landraces and synthetics) with 25 elites [9]. These PBLs have been distributed to different international partners for multi environment trials for the release of varieties and for novel gene discoveries for complex traits. We investigated a subset of this germplasm under terminal drought and irrigated environments to identify genomic regions associated with drought tolerance traits using genome wide association (GWA) approach. GWA studies have been reported for various traits in wheat, including grain yield and yield components and physiological traits under multiple environments [10][11] [12][13] [14][15] [16]. GWA aided by haplotype based mapping has proven even better for complex traits [17][18] [19]). We conducted both single marker and haplotypes-based GWA in the present study.

Plant material and crossings
Three hundred and thirty-nine pre-breeding lines derived from the three-ways top-crosses and their performance was evaluated under drought conditions at Obregon, Mexico. In the first crossing scheme, the landraces/accessions from the gene bank collected from different areas of the 6 world were crossed with best elite line of wheat and then F1 hybrids were top crossed with second best elite line (landraces/elite1//elite2). Likewise, in the second scheme synthetics wheat derived from the crosses of Triticum turgidum subsp durum, Triticum dicoccum and Aegilops tauschii were crossed with best elite line and then F1 top crossed with second best elite (synthetic/elite1//elite2).
In total 25 elite parents developed at CIMMYT and other places, displaying higher yield potential and rust (leaf and stripe) resistance were used in different cross combinations. These crossing schemes resulted into 1,200 TC1F1 crosses. However, 244 of these TC1F2 populations were advanced, based on their performance in the field for different yield contributing traits. A selected bulk method was used to advance these generations up to TC1F5 and 339 best lines were selected for further evaluation.

2.2.Experimental setup
The plant material was sown at the experimental station "campo experimental Norman E Borlaug (CENEB)" of CIMMYT Obregon, Sanora during the years 2015-16 and 2017-18. This diverse germplasm was planted in two replicates following the alpha lattice design under wellwatered and drought conditions. Each genotype was planted in 2 rows each with 2-meter length and 40 cm distance between each row. Five irrigations, each containing approximately 125 mm were given to normal irrigated treatment. In contrast, only two irrigations (approximately (55+125=180 mm)) were applied until tillering stage in drought treatment.

2.3.Phenotypic characterization
Different agro-physiological parameters were recorded during and after maturity and are as follow: (NGS for drought plants/NGS for well-watered) x 100 j) Grain yield (Y) of each genotype was recorded after harvesting the both the rows and yield was expressed in kilograms per hectare. 8

2.4.Molecular characterization
Leaf samples were taken at booting stage from TC1F5 plants, snap frozen in liquid nitrogen and stored at -80 o C until further analysis. Genomic DNA was extracted using modified cetyltrimethylammonium bromide (CTAB) protocol [9], followed by quantification by Nano-Drop

2.5.Haplotype Characterization
Haplotype map was generated in R using the algorithm-based script described by (Gabriel et al. (2002). In brief, D prime (D′) was generated through 95% confidence and three categories were made for the comparison called, "strong linkage disequilibrium (LD)", "inconclusive" or "strong recombination". Haplotype block was created using the R code as used in previous study Singh et al. [9]. Briefly, we calculated linkage disequilibrium parameter D' and D' 95% confidence intervals between SNPs and each comparison was categorized as "strong LD," "inconclusive," or 9 "strong recombination." A haplotype block was created if 95% of the comparisons in one block were in "strong LD." Cut off p-value for Hardy Weinberg was established at 0.001 while minimum value for marker allele frequency was set to 0.05. Haplotypes were not constructed for the individuals having more than 75% missing data. If the multiple SNPs were indicating the same genetic position, only the first marker was considered to construct haplotype map. The haplotypes were displayed as blocks of marker numbers and alleles and "HB" prefix was used for each haplotype block followed by the chromosome number followed by dot and then the increment number from 1 to N where, N is the total number of the haplotype blocks along the chromosome.

2.6.Genome Wide Association results
Population structure and genome wide linkage disequilibrium (LD) decay has been reported in this panel before by Ledesma-Ramírez et al. [18]. Principal component analysis (PCA) file from Ledesma-Ramírez et al. [18] was used while kinship matrix was generated using β, and μ are the corresponding effects; and residual effects are represented by e in the matrix. A and B matrix were fitted as fixed effects, and C and e matrix were fitted as random effects.

2.7.Candidate gene analysis
Significant markers from the above-mentioned analysis had their 69 bp sequences used as input query at EnsemblPlants using the BLAST tool at default settings. The search was performed primarily targeting the 5 Triticeae species available in this database. The results were selected based on minimum length of 69, score > 80 and sequence identity > 90 %. The full sequence of the hits that fulfilled the established criterion were retrieved and queried at NCBI BLASTN tool using default settings to investigate the underlying function, if any, and creating a list of potential gene candidates for future validation.    Figure S1.b-c and Table S1) For CTD, association was obtained on chromosome 7B for both the treatments ( Figure S2.b-c and Table S1).

ii. Plant height and earliness
GLM and MLM of GWAS indicated a significant association (P<0.05) of PH at chromosome 19 (7A) for both years and treatments ( Figure S3.b-e and Table S1). Significant associations of DTH were recorded ( Figure S4.b-e and Table S1) on chromosomes 3A and 6B for both the years under irrigated conditions. On chromosome 3A, favorable allele TA in HB717 showed earliness in DTH in comparison to CC allele under irrigated conditions ( Figure S4.f). A strong association for DTM on chromosome 1B during both years and treatments ( Figure S5.b-e and Table S1) was identified. Haplotype HB2.5 with favorable allele AT showed earliness in comparison to other alleles ( Figure S5.f).

iii. Grain yield (Y) and yield related traits
For Y, significant associations (P<0.001) were identified on chromosomes 4A and 6B in both conditions and years (Figure 1.b, 1c, 1d & 1e and Table 2). Haplotype based analysis also identified HB10.1 on chromosome 4A to be associated with Y with haplotype TCG as favorable allele as compared to CTC for the year 2018 in both conditions (Figure 1.f). Likewise, for another associations on chromosome 4A, HB10.6 and HB10.7, haplotype AC was favorable in comparison to CT for 2016 (Table S2) in both conditions (Figure 1.g, h). For SL, GWAS analysis indicated  Table 2). For TKW, association was identified on chromosome 3B ( Figure 5.b-c and; Table 2). The haplotype-based analysis indicates that haplotype TAGGCT (HB8.12) is favorable in comparison to TAGGCC or CGATC under irrigated condition of both the years ( Figure 5.d).

3.3.Multiple quantitative traits loci (QTL's) at same chromosome
Single marker trait analysis revealed significant associations of NDVI, CTD, DTH, NGS and KA at chromosome 3A under both irrigated and drought conditions. TKW and KA were also significantly associated at chromosome 3B under drought conditions (Table 2 and S1). Moreover, maker-trait association of NGS and Y was significantly associated on chromosome 6B.

3.4.In silico results
In silico analysis of important associations revealed many interesting candidate gene hits.   [19] [24]. We conducted both single marker and haplotypes-based GWAS to dissect QTLs for drought tolerance in a panel of pre-breeding lines derived by three-way crosses of exotics with CIMMYT's best 25 elites.
Panel investigated in the present study has been analyzed for population structure and LD decay which showed a moderate population structure in the panel and an LD decay of 2.5 cM genome wide [18]. The genome wide LD decay of 2.5 cM indicates its higher genetic diversity as compared to various other wheat panels used in previous studies reporting from 5 [30]. These results are expected since the panel has been drawn from a large set of pre-breeding lines developed by crossings among exotics and elites [9]. Each pre-breeding line acquired approximately 25% of the exotic genome and 75% of the elite genome at an early stage thus allowing recombination between exotic and elite genomes. Further, many of the agronomical and physiological traits investigated on the panel including DTM, SL, NGS, TKW and Y showed high heritability both under irrigated and drought conditions. Shokat et al. [31] reported that plant traits exhibiting higher heritability and genetic advance can directly be selected for crop improvement.
Both single marker and haplotype-based GWAS revealed the association of Y on chromosome 4A in both the years under irrigated and drought conditions (Figure 1.b-e, percentage variation explained as large as 41% in some studies [34]. The present and previous studies therefore suggest an important role of 4A chromosome in drought adaptation in wheat. Three haplotypes blocks, i.e. HB10.1 (TCG), HB10.6 (AC) and HB10.7 (AC), within 9.3 to 60.4 cM resulted in significantly higher yield across years (Figure 1.f-h). BLASTN analysis indicates that SNPs in these genomic regions are part of a gene coding for sulfite exporter TauE/SafE family protein in Arabidopsis thaliana. This protein family is known to be involved in taurine metabolism [35] and transport of anion across the cell membrane [36]. However, role of TauE/SafE protein family under drought stress is not known yet.
SL is an important yield contributing trait and longer compact spikes are one of the attractive plant traits to obtain higher Y in wheat [37]. Mwadzingeni [40] reported that QTL related to spike morphology is located on 2DS. BLASTN analysis showed hits with candidate gene "LOC109764454" in Aegilops tauschii subsp. tauschii indicating that this genomic region may have been contributed by synthetic parents of current germplasm. The locus LOC109764454 confers a heat shock protein in Aegilops tauschii [41].
Limited availability of water at the time of anthesis reduces pollen viability [42] and seed setting which increases kernel abortion (KA). Maintenance of high number of grains spike -1 (NGS) under drought is one of the important challenges in T. aestivum to obtain ideal yield under drought condition. This trait is positively correlated with grain yield under drought and irrigated conditions [43] [44]. We found higher heritability and significant association of NGS with chromosome 6B and 3A both under irrigated and drought conditions and trait KA was also associated with chromosome 3A (Figure 3.b-c & Figure 4.b). Likewise, NDVI, CTD and DTH, were also associated with chromosome 3A. In line of these, Li et al. [19] reported the significant association of NGS and Y at chromosome 3A. Likewise, Pradhan et al. [45] also reported the marker traits association of NGS on chromosome 3A explaining 12-16% of variability. Other studies indicate that chromosome 6B is the third major chromosome of wheat covering more than 5% of wheat diversity with more 2000 genes [46]. Nadolska-Orczyk et al. [47] reported the presence of genes controlling supernumerary spikelets trait on this chromosome. In present study, HB 17.1 (TC) and HB7.11 (CT) were found to be associated with higher NGS under drought positioning at 5.7 to 6.4 cM and 106.6 to 106.9 cM, respectively ( Figure 3.e-f). HB7.12 with allele CGGTC responsible for lower KA was present at 111.6 to 115.9 cM (Figure 4.c). HB7.11 and HB7.12 are closely placed indicating that NGS and KA may be linked together. BLASTN analysis indicated that SNP 3222182 (part of HB17.1) is the part of gene "ATY51569" coding for omega gliadin-D1, gluten protein in T. aestivum [48]. Additionally, another SNP 1050615 from the same haplotype block is the part of gene "LOC109767680" coding for osa-like proteins. The role of osa-MiRNAs has recently been demonstrated in rice in degrading and synthesizing nucleic acids and thus manipulating their abundance under drought stress [49].
The SNP 1038112 from HB7.12 associated with KA showed hits with gene "LOC109781107" of Aegilops tauschii coding for NPF 8.2-like protein [50]. In A. thaliana, NPF 8.2-like protein is reported to be responsible for the transport of phytohormones [51] [52]. NPF family members display sequence and structural homologies with peptide transporter proteins involved in the uptake of di-and tri-peptides in organisms. In plants, they were initially characterized as nitrate or peptide transporters [53]. In recent years, several other substrates have been identified including abscisic acid (ABA). ABA has been well known to have a significant role under drought stress tolerance by controlling ABA-regulated mechanism [54] .
Thousand kernel weight (TKW) is one of the main contributors of grain yield [55] and less reduction in TKW enables plants to maintain ideal yield under drought conditions [56]. Liu et al. [57] reported positive association of TKW with grain yield under irrigated and drought conditions indicating that plant genotypes having higher TKW under irrigated conditions often have a chance to maintain higher TKW under drought conditions. We found significant association of TKW at chromosome 3B ( Figure 5.b-c)  thaliana involved in transport of phytohormones. Stable associations of certain markers were estimated at chromosome 3B to obtain higher TKW under irrigated conditions for both the years.
These sequences have been reported in Chinese spring cultivar earlier, but their function was again unknow. This sequence is the part of gene "Os01g0100100", working in O. sativa subsp. japonica as a part of TBC domain containing protein. All these genes can further be studied to understand their role in bread wheat grown under drought stress andto find their association with yield and yield contributing traits. Likewise, validated genes can further be utilized in marker assisted selection to improve the drought tolerance in wheat. Moreover, reported germplasm can be helpful to improve the tolerance of bread wheat commercial cultivars.

Funding
Authors are thankful to Islamic Development Bank, Saudi Arabia and Wheat Seed of Discovery project at international maize and wheat improvement center (CIMMYT), Mexico for providing funds to conduct this research.