Novel Insights into the Composition and Characterization of the Glutenin Genes in Common Wheat Xinmai 26

High molecular weight glutenin subunits (HMW-GS) and low molecular weight glutenin subunits (LMW-GS) in mature grains play important roles in the formation of glutenin macropolymer and in gluten quality. To characterize the glutenin genes expressed in the bread wheat variety Xinmai 26 during seed development, we measured the dough rheological properties of mature grains with a farinograph and a gluten testing system, which revealed strong gluten quality. The compositions of HMW-GS and LMW-GS were analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Further, a total of 18 full-length transcripts were obtained by third-generation RNA sequencing, including 5 transcripts of HMW-GS genes and 13 transcripts of LMW-GS genes (8 intact genes and 5 pseudogenes). The protein structures deduced from the transcript sequences exhibit the typical structural characteristics of HMW-GS and LMW-GS. Moreover, a specific functional marker was developed to make better use of the extra cysteine residue of the 1Dx5 subunit. This study provides an efficient method to accurately identify glutenin genes in bread wheat by matching full-length transcripts to their glutenin mass spectra.

quality is the highest in comparison with the Glu-A1 and Glu-B1 loci [15]. HMW-GS are widely selected and exploited in wheat quality breeding programs since they can be easily identified by SDS-PAGE and molecular markers [16]. The genes encoding HMW-GS are located at the Glu-A1, Glu-B1 and Glu-D1 loci on the long arms of the group 1 chromosomes, each locus containing two tightly linked genes encoding one x-type and one y-type subunit [17]. Only three to five HMW-GSs are observed in bread wheat cultivars as a consequence of allelic variation and gene silencing [18]. An HMW-GS is generally composed of a large central repetitive domain flanked by short non-repetitive N- and C-terminal domains. The number and distribution of cysteine residues in the three domains are of great significance for wheat flour milling quality, since a disulfide bond between two cysteine residues can alter the polymer structure and protein conformation [19]. In particular, the serine residue (118th amino acid) in the repetitive region of the 1Dx5 subunit is replaced by cysteine [20], which is considered superior for wheat quality. The extra cysteine residue is thought to be associated with higher elasticity and better baking and noodle processing quality.
LMW-GS account for about 50% of gluten proteins and contribute to 30% of the technological quality. In contrast to the few HMW-GSs, LMW-GSs are encoded by a multigene family located at the Glu-A3, Glu-B3 and Glu-D3 loci on the short arms of chromosomes 1A, 1B and 1D, respectively [21]. Gupta and Shepherd [22]  LMW-m, LMW-s and LMW-i (m, s, and i represent methionine, serine and isoleucine, respectively). Typically, the N-terminal amino acid sequence of the LMW-s type subunits is only SHIPGL-, while LMW-m type subunits contain various N-terminal sequences such as METSHIPGL-, METSRIPGL-, and METSCIPGL- [24]. The LMW-i type subunit lacks the N-terminal domain and therefore starts directly with the repetitive region, ISQQQQ-, after the signal peptide [25]. Most LMW-GSs contain eight cysteine residues, although their positions vary among the three types of subunits, and these cysteine residues play a crucial role in the formation of intra- and intermolecular disulfide bonds in the gluten macropolymer [26].
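The N-terminal rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_lmw_gs` is hypothetical, the input is assumed to be the mature protein with the signal peptide already removed, and the prefixes checked are limited to those quoted in the text (any sequence starting with MET- is treated as LMW-m).

```python
def classify_lmw_gs(mature_protein: str) -> str:
    """Classify a mature LMW-GS sequence by its N-terminal residues.

    Assumes the signal peptide has already been removed. The prefixes
    are the ones quoted in the text; anything else is left unclassified.
    """
    seq = mature_protein.upper()
    if seq.startswith("MET"):      # e.g. METSHIPGL-, METSRIPGL-, METSCIPGL-
        return "LMW-m"
    if seq.startswith("SHIPGL"):   # LMW-s type
        return "LMW-s"
    if seq.startswith("ISQQQQ"):   # LMW-i type: begins with the repetitive region
        return "LMW-i"
    return "unclassified"


# Hypothetical N-terminal fragments, used only to exercise the function.
examples = ["METSCIPGLERPW", "SHIPGLERPSQQ", "ISQQQQPPFSQQ"]
labels = [classify_lmw_gs(s) for s in examples]
```

Real sequences may carry N-terminal variants not listed here, so an "unclassified" result would call for manual inspection rather than exclusion.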
The allelic variation of LMW-GS is very difficult to distinguish by SDS-PAGE because of their complexity, heterogeneity and co-migration with gliadins [27]. Therefore, a variety of new techniques have been developed to identify the allelic variation of LMW-GS, including two-dimensional electrophoresis, ultra-performance liquid chromatography (UPLC), reversed-phase high-performance liquid chromatography (RP-HPLC), and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) [28,29].
MALDI-TOF-MS is currently considered to be the most efficient method to characterize wheat storage proteins because of its accuracy and sensitivity; it requires relatively small samples and only 4-5 minutes per sample for analysis [30]. With regard to the cloning of glutenin genes, sequences could be acquired through PCR amplification from genomic DNA followed by Sanger sequencing, or through BAC sequencing, as in previous studies [31,32]. With the improvement of genome sequencing technology, the Pacific Biosciences (PacBio) third-generation sequencing platform can offer a high throughput of 50,000-70,000 reads per reaction and a read length of over 3 kb, which ensures the assembly of genes with repetitive sequences [3,33]. Although inherent sequencing errors are inevitable in this approach, the errors are random and can be corrected by redundancy.
Moreover, the SMRT (single molecule real-time) cell template structure enables the polymerase to loop multiple times on a single molecule. Therefore, the obtained CCS (circular consensus sequences) could achieve improved accuracy [34]. This technology is efficient and economical in generating sequences from gene families and has been applied to wheat gluten studies [35].
The bread wheat variety Xinmai 26, with high protein content and strong gluten strength, is widely cultivated in the Huanghuai region. Meanwhile, several new varieties of similarly high quality have been bred using this accession as one of the parents. However, the glutenin genes expressed in this variety during seed development, especially the LMW-GS genes, have not yet been characterized. In this study, the compositions of HMW-GS and LMW-GS in the bread wheat Xinmai 26 are identified by SDS-PAGE and MALDI-TOF-MS, and its dough rheological characteristics are measured. Further, the full-length transcript sequences acquired by PacBio Sequel were analyzed and characterized to efficiently identify the mRNAs of transcribed HMW-GS and LMW-GS genes from Xinmai 26. Moreover, a specific functional marker was developed to detect the extra cysteine residue of the 1Dx5 subunit. The expressed glutenin genes could be accurately determined by matching full-length transcripts to their glutenin spectrum, which can provide valuable gene resources for wheat quality breeding.

Dough rheological properties of Xinmai 26
Dough rheological properties of Xinmai 26 were measured with a farinograph using 50 g of flour (Fig. 2, Table 1A) and with a gluten testing system using 10 g of flour (Table 1B).
Water absorption (WA), development time (DT), stability time (ST) and degree of softening (DS) based on the farinograph were determined to be 68.4 %, 23.2 min, 23.4 min, and 6 FU, respectively. WA is one of the most important indexes in product price calculations and is positively related to the yield of finished bakery products and the maintenance of food character. The WA value of 68.4 % found here exceeds the upper limit of most bread wheat [48]. Moreover, the higher ST and lower DS of Xinmai 26 indicate that the corresponding dough would be able to sustain long mechanical processing treatments, implying that Xinmai 26 has strong gluten elasticity. Additionally, wet gluten content (WGC), dry gluten content (DGC), gluten  [49]. The larger GI of Xinmai 26 reflects its stronger gluten strength.

Separation and identification of HMW-GS and LMW-GS
The HMW-GS and LMW-GS of Xinmai 26 were separated by SDS-PAGE and identified using the standard wheat cv. 'Chinese Spring' as a reference (Fig. 3). The HMW-GS encoded by the Glu-1 loci were determined to be 1Ax1, 1Bx7 + 1By8, and 1Dx5 + 1Dy10.
A total of 8 protein bands for the Glu-3 loci were identified using the method described by Ibba et al. [50], including 1 of the Glu-A3 locus, 1 of the Glu-B3 locus, and 6 of the Glu-  [29]. However, only one specific spectrum peak (38,841 Da) was detected in Xinmai 26. The Glu-D3 alleles were found to be the most complicated in the Glu-3 loci. Their characteristic peak number for each allele ranges In summary, the characteristic peak numbers of the glutenin genes in the MALDI-TOF-MS were consistent with those of protein bands in SDS-PAGE.

Identification of full-length transcripts of glutenin genes and their corresponding genes
To obtain all the glutenin genes expressed in Xinmai 26, third-generation transcriptome sequencing was performed on the PacBio Sequel II platform using seeds collected 15 days after flowering, which generated 64,953,298 subreads with a mean length of 1,566 bp. A total of 1,217,179 circular consensus sequence (CCS) reads were extracted from the subreads, and 1,150,675 full-length non-chimeric sequences (FLNCs) were acquired after refinement by the IsoSeq3 pipeline. Because of the highly complex sequence structure of HMW-GS and LMW-GS, redundant sequences among the FLNCs were further removed using a custom pipeline. Finally, 5 transcripts of HMW-GS genes and 13 transcripts of LMW-GS genes were obtained in Xinmai 26. To verify the accuracy of the glutenin transcript sequences acquired by PacBio Sequel II, the sequence of transcript624 obtained by third-generation sequencing was compared with the gene sequence obtained by PCR amplification from genomic DNA followed by Sanger sequencing (Fig. S1). In total, 11 PCR-derived sequences of transcript624 from seven independent clones were aligned, of which two clones were each sequenced three times. The Sanger sequences from the same clone were completely identical.
A few SNPs appeared among the sequences from different clones, which may be caused by base mismatches during PCR amplification. The final sequence obtained by cross-correction among the seven independent clones was identical to the transcript sequence acquired by PacBio Sequel II, indicating that the results of third-generation sequencing are more reliable due to more cycles and stronger error correction ability.  (Table S2), and encoded 830, 789, 848, 720, and 648 amino acid residues, respectively. The molecular weights calculated from the deduced amino acid sequences of all five transcripts were highly consistent with the peak values detected by MALDI-TOF-MS, with errors ranging from 0.05 % to 0.54 % (Table 2). High identities, ranging from 99.96 % to 100 %, were found between the five transcript sequences and published HMW-GS genes from Triticeae.
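The comparison between sequence-derived molecular weights and MALDI-TOF-MS peaks can be illustrated with a short Python sketch. It is not the calculation tool used in this study: the function names are hypothetical, the table holds the standard average residue masses of the 20 common amino acids, and the input is assumed to be the mature subunit sequence without the signal peptide.

```python
# Standard average residue masses (Da) of the 20 common amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def protein_average_mass(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def relative_error_percent(calculated: float, observed: float) -> float:
    """Deviation of the calculated mass from the observed peak, in percent."""
    return abs(calculated - observed) / observed * 100.0


# Hypothetical calculated mass compared against the 38,841 Da peak from the text.
error = relative_error_percent(38880.0, 38841.0)
```

Alkylation of cysteine residues or other modifications introduced during sample preparation would shift the observed mass, so a real comparison would need to account for them.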
Meanwhile, to further confirm the phylogenetic relationship between the five transcript sequences and the HMW-GS gene family from Triticeae, a phylogenetic tree was constructed based on the deduced amino acid sequences of 93 genes, including the transcript1854, transcript1782, transcript272, transcript1528 and transcript624 genes obtained in this study and the other 88 HMW-GS genes registered in GenBank (Fig. S2). As clearly shown in this figure, the 93 HMW-GS genes fall into two clades: x-type and y-type. The three subgroups of the x-type group and the two subgroups of the y-type group were clearly separated into Glu-1Ax, Glu-1Bx, Glu-1Dx, Glu-1By and Glu-1Dy. Since the HMW-GS alleles controlled by the same locus were closely clustered in each individual lineage, the 1Ax1, 1Bx7, 1Dx5, 1By8 and 1Dy10 subunits from Xinmai 26 were determined to be encoded by the transcript1854, transcript1782, transcript272, transcript1528 and transcript624 genes, respectively.
For the LMW-GS genes, altogether 13 complete coding sequences were acquired (Table S1), of which eight genes (transcript4321, transcript6479, transcript206, transcript2357, transcript4445, transcript1264, transcript3598, and transcript907) were found to contain a full open reading frame (ORF). The other 5 genes (transcript3234, transcript3226, transcript3547, transcript2435, transcript308) were pseudogenes with at least one stop codon. The sequence alignments indicated that the 13 transcript sequences were completely identical to published LMW-GS gene sequences from NCBI. Based on sequence similarity, four genes (transcript3234, transcript3226, transcript3547, transcript907) could be attributed to the Glu-A3 alleles. The transcript sequences of two genes (transcript3598, transcript2435) were consistent with those of the Glu-B3 alleles. The other seven genes (transcript4321, transcript6479, transcript206, transcript2357, transcript4445, transcript1264, transcript308) were assigned to the Glu-D3 alleles. In particular, the nucleotide sequences of transcript3234 and transcript308 were completely consistent with those of the specific A3-400 (Glu-A3c) and D3-589 (Glu-D3a) genes from the MCC (micro-core collections) obtained by the LMW-GS gene molecular marker system [23]. Therefore, the characteristic peaks of the Glu-A3 and Glu-D3 alleles in the MALDI-TOF-MS were determined to be those of Glu-A3c and Glu-D3a, respectively. The specific spectrum peak (38,841  The LMW-m and LMW-s type subunits composed of other genes from Glu-B3 and Glu-D3 were grouped into the other branch since their sequences generally display higher consistency. The phylogenetic tree indicated that LMW-i type genes had undergone greater divergence during evolution than LMW-s and LMW-m genes [51].
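The distinction between intact genes and pseudogenes can be illustrated with a minimal Python check for stop codons inside the reading frame. This is a sketch rather than the procedure used in this study: it assumes the coding sequence is supplied in frame from the start codon, it interprets the stop codons mentioned above as in-frame stops located before the final codon, and the function names are hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> List[int]:
    """Return 0-based codon indices of in-frame stops before the last codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3           # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The final codon is expected to be the normal terminator, so skip it.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def has_full_orf(cds: str) -> bool:
    """True if the sequence starts with ATG and has no internal stop codon."""
    return cds.upper().startswith("ATG") and not internal_stop_positions(cds)


# Hypothetical toy sequences, not taken from Xinmai 26.
intact = "ATGCAACAACCATAA"       # ATG CAA CAA CCA TAA
pseudogene = "ATGCAATAGCCATAA"   # ATG CAA TAG CCA TAA
```

Frameshift mutations would also disrupt an ORF; detecting those requires alignment to a reference gene and is outside the scope of this sketch.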

Molecular characterization of the glutenin genes in Xinmai 26
The deduced protein structures of transcript1854, transcript1782, transcript272, The domains A and C were characterized by the presence of most of the cysteines (Fig. 5). Based on the typical positions of cysteine residues [52], the first two residues in the x-type HMW-GS (transcript1854, transcript1782, transcript272) were linked by an intramolecular bond, and thus the others were available for intermolecular linkages. The secondary structures of the HMW-GS and LMW-GS obtained in this study were predicted by the PSIPRED server, as shown in Table 3. A total of 26 α-helixes and 6 β-strands from the HMW-GS appeared only in the N- and/or C-terminal domains, of which 23 were found in the N-terminal domains. The other 9 secondary structures were distributed in the C-terminal domains. Compared with Glu-A1, the subunit combinations of transcript1782 (1Bx7) + transcript1528 (1By8) from Glu-B1 and transcript272 (1Dx5) + transcript624 (1Dy10) from Glu-D1 contained more α-helixes and β-strands, up to 15 and 11, respectively. As for subunit transcript1854 (1Ax1), a total of 59 amino acid residues were involved in the formation of α-helixes (four in the N-terminal domain and two in the C-terminal domain). In addition, in terms of the number of amino acid residues involved in the formation of secondary structure, the largest number of amino acids (73) came from the transcript1528 subunit (1By8). The secondary structures of the LMW-GS were mainly composed of α-helixes, with only a few β-strands. A total of 81 α-helixes were predicted in the LMW-GS, and only 3 β-strands (6 amino acid residues) appeared in the C-terminal domain. In addition, four α-helixes were also found in the repetitive domain of LMW-GS, and the proportion of amino acid residues participating in the secondary structure of LMW-GS was much higher than that of HMW-GS, at 32.17%-44.25%.
Among the LMW-GS genes, the LMW-i type subunit transcript907 (Glu-A3c), located on chromosome 1A, was found to contain nine α-helixes. The number of amino acid residues involved in the formation of its secondary structure was also the largest, reaching 137. The Glu-A3c of the Aroona NILs was confirmed to play an important role in dough resistance and extensibility [53].  (Fig. S4). To ensure the accuracy and reliability of the dCAPS marker, the 95 bp amplified products from two wheat varieties were sequenced (Fig. S5). The results showed that only Xinmai 26 contains the recognition site of the Sal I enzyme.

Specific functional marker for the extra cysteine residue in the 1Dx5 subunit
Subsequently, the subunit composition and the extra cysteine residue of Glu-D1 in 143 wheat varieties were determined by SDS-PAGE and the dCAPS marker (Table S1). The

Advantages of full-length glutenin genes obtained by third-generation transcriptome sequencing
Glutenin genes are a major determinant of bread-making quality among bread wheat varieties. A large number of HMW-GS genes have been isolated and characterized based on PCR amplification in previous reports [54][55][56][57][58]. For example, the x-type HMW-GS of Aegilops umbellulata was determined to carry much longer repetitive domains, which may have potential value in improving the processing properties of hexaploid wheat varieties (Liu et al., 2003). Since HMW-GS contain long repetitive sequences, especially the x-type subunits with high molecular weight, effective primer walking cannot be performed in some regions. Therefore, the full length of many HMW-GS genes was obtained by nested deletion [59]. In addition, LMW-GS genes have also been isolated and characterized by cDNA or genomic DNA library screening [60,61]. A study based on BAC library screening and proteomics analysis showed that Glu-A3, Glu-B3, and Glu-D3 in the Chinese bread wheat cultivar Xiaoyan 54 contain 4, 3, and 7 genes, respectively [62]. In particular, 18, 17 and 17 LMW-GS gene sequences were successfully isolated from Norin 61, Glenlea, and Xiaoyan 54, respectively, through the LMW-GS gene marker system [63]. Thus, the LMW-GS gene marker system is considered to be useful and efficient in identifying and characterizing LMW-GS genes in bread wheat [23,50,63,64]. Moreover, this method has advantages over gene-specific PCR and library screening in isolating LMW-GS genes. Recently, with the improved genome sequencing of Chinese Spring with PacBio long reads and the use of BioNano genome maps to improve and validate the sequence assemblies [33], the complexities of the wheat gluten genomic regions could be better resolved [65]. Through PacBio RSII third-generation RNA Due to the high GC content of glutenin genes, the accuracy of high-fidelity polymerases remains a challenge during PCR amplification.
In this study, the gene sequence of transcript624 acquired through PCR amplification with Sanger sequencing was compared with the transcript sequence obtained by third-generation sequencing. The result indicates that the Sanger sequences of the same clone are completely identical. A few SNPs were found among the sequences from different clones of transcript624. The final sequence obtained by cross-correction among the seven independent clones is identical to the transcript sequence acquired by PacBio Sequel II, showing that third-generation transcriptome sequencing could improve the sequence accuracy of glutenin genes since the errors are random and can be overcome by redundancy. Therefore, the results obtained by third-generation sequencing are more reliable due to more cycles and stronger error correction ability.

Composition of glutenin genes in Xinmai 26 and their effects on gluten quality
The strong gluten wheat variety Xinmai 26 has been cultivated in the Huanghuai wheat region of China for many years, and the average stability time of the dough can reach 23 minutes (2019-2020 season). New wheat varieties with strong gluten, such as Xinmai 38 and Xinmai 45, have been bred using Xinmai 26 as one of the parents. It is well known that HMW-GS and LMW-GS in mature seeds play important roles in the formation of glutenin macropolymer and in gluten quality, especially dough extensibility and strength [68]. The effects of HMW-GS on dough properties and pan bread quality have been well studied. According to the contributions of individual HMW-GS to dough strength, the subunit combinations at each of the three loci were ranked as: 1Ax1 > 1Ax2* > 1AxN at Glu-A1, 1Bx7 + 1By8 ≥ 1Bx13 + 1By16 > 1Bx17 + 1By18 = 1Bx7 + 1By9 at Glu-B1, and 1Dx5 + 1Dy10 > 1Dx2 + 1Dy12 > 1Dx4 + 1Dy12 at Glu-D1 [69]. Xinmai 26 displays the optimal subunit combinations in this study, which were determined to be 1Ax1, 1Bx7 + 1By8, and 1Dx5 + 1Dy10 based on SDS-  [71]. The processing quality of wheat flour can be improved by increasing the ratio of HMW/LMW subunits.
The effects of Glu-3 alleles and those of Glu-1 alleles are largely additive, and the interactions between these loci also have significant effects [25]. It is generally accepted that Glu-A3 and Glu-B3 alleles play a major role in determining the flour processing qualities among the three Glu-3 loci, while Glu-D3 alleles play minor roles in determining quality variation in bread wheat [53]. In particular, the Glu-A3 locus was considered to make the biggest contribution to quality among all Glu-3 loci. In detail, the ranking of alleles for dough strength is Glu-A3d > Glu-A3b > Glu-A3c > Glu-A3f > Glu-A3a > Glu-A3e, whereas the ranking for dough extensibility is slightly different, viz.,  [72] demonstrated that the LMW-GS allele Glu-A3a encodes a specific LMW-i type B-subunit that significantly affects wheat dough strength. Cane et al. [73] reported that Glu-A3e was correlated with inferior dough resistance and extensibility, whereas Zheng et al. [74] found that Glu-A3e was a favorable allele for dough-mixing properties. In this study, the LMW-i type subunit transcript907 was determined to be Glu-A3c based on the molecular weight and MALDI-TOF-MS. The Glu-A3c of the Aroona NILs was confirmed to play an important role in dough resistance and extensibility [53].
The secondary structure is the foundation of a highly complex spatial conformation and is composed of α-helixes and β-strands in wheat gluten. Masci et al. [75] proposed that helix-helix interactions are involved in guiding the formation of the intramolecular disulfide bonds. A higher α-helix content may contribute to better dough quality. The β-strands are generally considered to endow the protein with high elasticity and to improve its capability to resist distortion [51]. In this study, the secondary structures occurred mainly in the N- and/or C-terminal domains of HMW-GS, with 17 α-helixes and 6 β-strands found in the N-terminal domains (Table 3).
Only 9 α-helixes appeared in the C-terminal domains. This result indicates that in the HMW-GS the N-terminal domains contribute more than the C-terminal domains to the elasticity of the dough and its capability to resist distortion. Interestingly, the secondary structure of the LMW-GS mainly exists in the form of α-helixes, which are mainly concentrated in the C-terminal domains. A total of 73 α-helixes and 3 β-strands were found in the C-terminal domains, and only 4 α-helixes appeared in the N-terminal domains. Considering the distribution of the cysteine residues in gluten (Fig. 5), the C-terminal domains of HMW-GS may be more favorable for bonding with the N-terminal domains of LMW-GS to form the elastic backbone of the dough.

Plant materials
Wheat variety Xinmai 26 was provided by Xinxiang Academy of Agricultural Sciences and preserved in the Plant Germplasm Resources and Genetic Engineering Laboratory, Henan University. It was sown and harvested in the 2019-2020 crop season.
The experimental plot was 2 m in length with 3 rows, a row spacing of 30 cm, and 21 seeds per row. The common wheat cultivar 'Chinese Spring' (CS) (null, 1Bx7+1By8, 1Dx2+1Dy12) was used as the standard wheat in determining the HMW-GS and LMW-GS of Xinmai 26. The 143 wheat materials tested with the specific functional marker are listed in Table S1.

SDS-PAGE analysis
Glutenin was extracted according to the method described by Wan et al. [36].
Specimens were separated by SDS-PAGE with a discontinuous system of 4 % (w/v) stacking gel and 10 % (w/v) separating gel. Samples (6 μL) were loaded separately onto the lanes of the gel and electrophoresed at approximately 15 °C and 12 mA for 16 h.

Glutenin extraction
Glutenin was prepared by a modified version of the method of Zhang et al. [40]. Flour (15 mg) from mature seeds was shaken vigorously with 70% ethanol for 1 hour, followed by centrifugation at 10,000 g for 10 min, and the supernatant was removed. The resulting sediment was extracted with 1 mL of 55 % (v/v) propanol at 65 °C for 30 min to remove gliadins. After centrifugation at 10,000 g for 10 min, the supernatant was removed.
These steps were repeated two times with the resulting pellet to remove gliadin completely. The resulting pellet was then extracted with 150 μL of extraction buffer (50% (v/v) propanol, 0.08 M Tris-HCl (pH 8.0) containing 1% (w/v) dithiothreitol (DTT)) at 65 °C for 30 min followed by centrifugation at 10,000 g for 5 min. Glutenins in the supernatant fraction were precipitated with 100 μL of cold acetone. Alternately, cysteine residues in the glutenins were alkylated by the addition of 150 μL of extraction buffer containing 1.

MALDI-TOF-MS analysis
The precipitate was dissolved in 60 μL of a solution of 30% acetonitrile (ACN), 0.4 % trifluoroacetic acid (TFA) and 69.6 % H2O for 1 hour at room temperature. Sinapinic acid (SA) was used as the matrix and was dissolved in 50% ACN and 0.05% TFA (10 mg/mL). 1.00 GS/s, laser shots: 100, laser power: 85%, laser frequency: 80, and detector gain: 13.3X. Spectra were obtained in positive ion mode. All subunits showed good reproducibility, with less than 0.05 % relative standard deviation (RSD) in three replicates. The mean molecular weights of the three replicates were adopted in this study.
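The reproducibility criterion can be written out as a short calculation. The Python sketch below assumes that the RSD is the sample standard deviation divided by the mean, expressed in percent; the three replicate masses are hypothetical values chosen for illustration and are not measurements from this study.

```python
from statistics import mean, stdev


def relative_sd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical peak masses (Da) of one subunit in three replicate spectra.
replicates = [38840.0, 38841.0, 38842.0]

mean_mass = mean(replicates)             # value that would be reported
rsd = relative_sd_percent(replicates)    # compared against the 0.05 % limit
is_reproducible = rsd < 0.05
```

With only three replicates the sample standard deviation is a coarse estimate, so the threshold works as a screening criterion rather than a precise measure of instrument precision.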

PacBio RNA sequencing and data analysis
The timing of anthesis and grain development was inspected and recorded after heading. Grain samples were collected at 15 DAF (days after flowering). Total RNA was extracted using TRIzol reagent (Invitrogen Corp., Carlsbad, CA), according to the manufacturer's protocol. RNA quality was checked on 1% agarose gels. RNA purity was measured using a nanophotometer (Implen, Inc., Westlake Village, CA, USA). The cDNA libraries (1-10 kb) were constructed and sequenced on the PacBio Sequel II platform. Iso-Seq reads were processed using the IsoSeq3 pipeline (https://github.com/PacificBiosciences/IsoSeq) with the parameter "--min-passes 2". Because of the highly complex sequence structure of the glutenin genes, a custom pipeline was used to remove redundant sequences from the IsoSeq3 output and to acquire the HMW-GS and LMW-GS genes, respectively (Fig. 1). Briefly, redundancy among the nucleotide sequences of the published glutenin genes in NCBI was first removed using CD-HIT with the parameter "-c 0.9". The resulting non-redundant sequences were merged with those of the annotated glutenin genes in Chinese Spring to form the query sequences.
Then, BLAT [41] was used to search the full-length non-chimeric (FLNC) sequences generated by IsoSeq3 with the query sequences; sequences with coverage and identity greater than 70% were retained, then error-corrected and made non-redundant using IsoCon [42]. The resulting sequences were aligned with MAFFT [43], and redundancy was again removed manually according to the support score reported by IsoCon. Finally, 5 transcript sequences of HMW-GS and 13 transcript sequences of LMW-GS were obtained.
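The retention step (coverage and identity both greater than 70%) can be sketched as a simple filter. The Python example below is illustrative only: the `Hit` record and its field names are hypothetical, and it assumes that coverage and identity have already been computed from the alignment output as fractions between 0 and 1. It does not parse BLAT output and is not the custom pipeline used in this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One alignment between an FLNC read and a glutenin query (hypothetical)."""
    read_id: str
    query_id: str
    coverage: float  # fraction of the query covered by the alignment
    identity: float  # fraction of identical bases within the alignment


def retain_hits(hits: List[Hit], min_coverage: float = 0.70,
                min_identity: float = 0.70) -> List[Hit]:
    """Keep hits whose coverage and identity both exceed the thresholds."""
    return [h for h in hits
            if h.coverage > min_coverage and h.identity > min_identity]


# Hypothetical hits used to exercise the filter.
hits = [
    Hit("read_a", "HMW_query_1", 0.98, 0.99),
    Hit("read_b", "LMW_query_4", 0.65, 0.97),  # fails on coverage
    Hit("read_c", "LMW_query_2", 0.91, 0.70),  # fails: identity not above 0.70
]
kept_ids = [h.read_id for h in retain_hits(hits)]
```

The comparison is strict, matching the wording "greater than 70%"; a hit exactly at the threshold is discarded.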

DNA extraction and molecular cloning
Genomic DNA of the wheat varieties was extracted according to the method reported by Su et al. [44]. The cloning and sequencing of the 1Dy10 gene from Xinmai 26 were conducted according to the method described by Wang et al. [45]. DNA sequencing was performed by the Beijing Genomics Institute (BGI, China). The nucleotide sequences of the target genes were analyzed from seven independent clones, of which two clones were each sequenced three times by Sanger sequencing.
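Correcting clone sequences against each other can be illustrated with a column-wise majority vote over aligned sequences. The Python sketch below is an assumption about how such a correction could be done, not a description of the procedure used here: it expects sequences that are already aligned to equal length, and the toy clone sequences are hypothetical.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned: List[str]) -> str:
    """Build a consensus by taking the most frequent base in each column."""
    if not aligned:
        raise ValueError("no sequences supplied")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common(1) returns the base with the highest count; on a tie
        # the base seen first in the column is chosen.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Seven hypothetical clones; three carry a single PCR-like error each.
clones = [
    "ATGGCTAAG", "ATGGCTAAG", "ATGACTAAG", "ATGGCTAAG",
    "ATGGCTCAG", "ATGGCTAAG", "TTGGCTAAG",
]
consensus = majority_consensus(clones)
```

Because PCR errors are expected to be random, they rarely recur at the same position in independent clones, which is why a simple vote recovers the underlying sequence in this toy case.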

Sequence alignment, phylogenetic analysis and secondary structure prediction
The open reading frame (ORF) of the target gene was translated into an amino acid sequence using the NCBI ORF Finder program (http://www.ncbi.nlm.nih.gov). Sequences were aligned using the multiple sequence alignment software Clustal X 2.0 [46]. Based on the multiple alignments of the full-length amino acid sequences (including signal peptide sequences) of HMW-GS and LMW-GS, phylogenetic trees were constructed separately with IQ-TREE using the ultrafast bootstrap approximation method [47]. The bootstrap values in the phylogenetic trees were estimated based on 1000 replications. The secondary structure of the deduced amino acid sequences was predicted with the PSIPRED server (http://bioinf.cs.ucl.ac.uk/psipred/).

Development of specific functional marker
The derived cleaved amplified polymorphic sequences (dCAPS) marker was developed The program for PCR amplification was as follows: initial denaturation at 94 °C for 3 min, 30 cycles of 94 °C for 25 s, 58 °C for 25 s, 72 °C for 15 s, and a final extension at 72 °C for 10 min. The PCR products were digested with the restriction enzyme Sal I (Takara, Japan). The 10 μL reaction volume consisted of 5 μL PCR product, 1 U restriction enzyme and 1 × buffer. The reaction was performed at 37 °C for 3 hours, and the digested products were detected on a 3% agarose gel.
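How the digestion distinguishes the two allele types can be illustrated with a small in silico digest. The Python sketch below relies on the known Sal I recognition sequence GTCGAC, which is cut after the first G; everything else is an assumption made for illustration. In particular, the two 95 bp amplicons are hypothetical placeholder sequences, and the position of the site within the real amplicon is not given in the text.

```python
from typing import List

SALI_SITE = "GTCGAC"   # Sal I recognition sequence (palindromic)
SALI_CUT_OFFSET = 1    # Sal I cuts between G and TCGAC on the top strand


def digest_fragment_lengths(amplicon: str) -> List[int]:
    """Return top-strand fragment lengths after a complete Sal I digest."""
    seq = amplicon.upper()
    cut_positions = []
    start = seq.find(SALI_SITE)
    while start != -1:
        cut_positions.append(start + SALI_CUT_OFFSET)
        start = seq.find(SALI_SITE, start + 1)
    bounds = [0] + cut_positions + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 95 bp amplicons: one with a Sal I site, one without.
with_site = "A" * 70 + SALI_SITE + "A" * 19
without_site = "A" * 95

cut_fragments = digest_fragment_lengths(with_site)
uncut_fragments = digest_fragment_lengths(without_site)
```

An amplicon carrying the site yields two shorter fragments, while one lacking it stays at 95 bp, a difference small enough to require a high-percentage agarose gel such as the 3% gel described above.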

Conclusion
This study provides an efficient method for accurately identifying glutenin genes in bread wheat through third-generation RNA sequencing. Xinmai 26 displays the optimal subunit combinations: its HMW-GS are determined to be 1Ax1, 1Bx7 + 1By8 and 1Dx5 + 1Dy10, and its LMW-GS are determined to be Glu-A3c and Glu-D3a based on the specific spectrum peaks of MALDI-TOF-MS and their molecular weights. The type of Glu-B3 allele could not be determined because of its low expression level in the mature grain. Moreover, a specific functional marker was developed to make better use of the 1Dx5 subunit with the extra cysteine residue, which will play an important role in wheat quality breeding through molecular marker-assisted selection.

Conflicts of interest
The authors declare no conflict of interest.