Genome wide identification of cotton C-repeat binding factor (CBF) and overexpression of Gthu17439 (GthCBF4) gene confer cold stress tolerance in Arabidopsis thaliana

Low temperature is a common abiotic stress in major cotton-growing areas. Cold stress significantly affects the growth, yield and quality of cotton. Therefore, it is important to develop more robust, cold-tolerant cotton germplasm. In response to climate change and erratic climatic conditions, plants have evolved various survival mechanisms, one of which is the induction of stress-responsive transcription factors, such as the C-repeat binding factors (CBFs), which have been found to enhance cold tolerance in various plants. In this study, a detailed evaluation of the cotton C-repeat binding factors was carried out. A total of 29, 28, 25, 21, 30, 26 and 15 proteins encoded by C-repeat binding factor genes were identified in G. herbaceum, G. arboreum, G. thurberi, G. raimondii, G. turneri, G. longicalyx and G. australe, respectively. Phylogenetic analysis revealed that the proteins grouped into seven clades, with clades 1 and 6 being the largest. Moreover, the majority of the proteins were predicted to be located in the nucleus, while some were distributed in other parts of the cell. Based on transcriptome and RT-qPCR analysis, Gthu17439 (GthCBF4) was highly upregulated and was further validated by overexpression in Arabidopsis thaliana. The plants overexpressing Gthu17439 (GthCBF4) showed significantly higher tolerance to cold stress, with higher growth vigour than the wild type. The results suggest that Gthu17439 (GthCBF4) plays a significant role in enhancing cold stress tolerance in cotton and can be further exploited to develop more cold-tolerant cotton germplasm.


Introduction
Cotton is a thermophilic crop and is sensitive to low temperatures [1]. China is one of the major cotton-growing countries globally, and specific regions within China, such as Xinjiang, are often affected by low temperature, which has significant negative effects on plant growth and development [2]. Cold stress leads to inhibition of seed germination, reduced plant growth and reproduction, and decreased crop yield and quality [3]. However, many crops, such as rice (Oryza sativa), maize (Zea mays), tomato (Solanum lycopersicum), soybean (Glycine max) and cotton (Gossypium hirsutum), lack the ability to adapt to low temperature environments and can only grow
The CBF transcription factors have been widely identified and isolated from rice, tomato, Brassica napus, wheat, barley and maize, which shows that the CBF family is large and structurally complex [2,28]. It has been reported that overexpression of an AtCBF gene in transgenic Brassica napus plants improved cold tolerance [2,29]. Combining the AtDREB1A gene with the stress-inducible RD29A promoter improved the tolerance of transgenic tobacco to drought and low temperature stress [30,31]. Phylogenetic analysis of the Arabidopsis CBFs and their orthologous genes in other plants found that CBFs are highly conserved [16].
As an important oil and fiber crop, cotton is planted in more than 70 countries and plays an important role in the global economy. However, cotton yield is often adversely affected by biotic and abiotic stresses. Therefore, studying the molecular mechanisms of stress adaptation in cotton and improving its stress resistance is of great significance for improving cotton yield. Twenty-one CBF genes have been cloned from G. hirsutum; they can be divided into the GhCBF Ⅰ, GhCBF Ⅱ, GhCBF Ⅲ and GhCBF Ⅳ groups [24], which provides useful clues for understanding the cold tolerance mechanism in cotton. However, because of limited genome sequence data, the expression profile of the CBF family and its phylogenetic relationship with CBF members of other plants are still unclear. To better understand the function and evolutionary relationships of the CBF gene family in cotton, we analyzed the structural variation and evolutionary pattern of the CBF family based on genome-wide data from several cotton species, and explored the molecular mechanism of cold adaptation in G. thurberi. This study provides ideas and a reference for further research on the molecular mechanism by which CBF genes regulate cold adaptation in cotton. Mining endogenous cold-tolerance-related genes from wild cotton and transferring them to tetraploid cultivars would help improve the genetic characteristics of existing varieties.

Identification of CBF family genes in the cotton genome
The availability of whole-genome sequences for the seven cotton species enabled us to identify the CBF proteins encoded in their genomes. The Pfam domain PF00847 was used as the query to obtain the CBF proteins, which yielded 29 members in G. herbaceum, 28 in G. arboreum, 25 in G. thurberi, 21 in G. raimondii, 30 in G. turneri, 26 in G. longicalyx and 15 in G. australe. Three representative species were chosen for further detailed analysis: G. herbaceum, G. thurberi and G. australe. The CBF CDS lengths range from 306 bp to 1,230 bp in G. herbaceum and from 429 bp to 1,077 bp in G. australe. The physicochemical properties of the CBF proteins differed greatly. For the CBF proteins obtained from G. herbaceum, the molecular weights ranged from 11,241.88 (Table S1).

Phylogenetic analysis of cotton CBF gene family
To determine the phylogenetic relationships of the CBF proteins, we constructed a phylogenetic tree in MEGA 7.0 using the neighbor-joining (NJ) method along with minimum evolution and maximum parsimony. The CBF proteins clustered into seven clades, designated clade 1 to clade 7 (Fig. 1). Clade 1 is the largest, with 61 CBF protein sequences, while clade 3 contains only 7. Consistent with previous classifications, all of the Arabidopsis CBFs fell into clade 6 [2,22,29]. With the exception of G. australe, which did not appear in clade 3, the cotton species were represented in all seven clades.

Chromosomal mapping, Gene structure and C-terminal conserved motifs analysis
All the genes were located on various chromosomes in the three cotton genomes (A, D and G) and were named according to their position on the chromosome. In the A genome of G. herbaceum, the CBF genes were detected on 12 chromosomes; only Chr01 had no CBF members. The most gene loci were detected on chromosomes Chr05 and Chr07 (Fig. 2A). In the D genome, chromosomes Gthu_1, Gthu_8 and Gthu_9 harbored no genes, while Gthu_5, Gthu_7 and Gthu_12 had more gene loci (Fig. 2B). Finally, in the G genome of G. australe, no CBF genes were found on chromosomes G6, G9, G11 and G12, and the highest number of gene loci was observed on chromosome G7, with 5 genes.
We employed MEME to detect conserved motifs in the CBF family. Ten conserved motifs were distributed in each CBF family (Fig. 3). Almost all CBF proteins share the same three motifs: motifs 1, 2 and 4. Analyzing the arrangement of exons and introns can provide important insights into the evolution of gene families [9]. To study the exon/intron structure of the CBF genes, the CDS and genomic sequences were compared. The results showed that most G. herbaceum CBF genes contain one exon, G. australe CBF genes have only one exon, and G. thurberi CBF genes contain two or more exons (Fig. 4A-D). In agreement with the phylogenetic analysis, most members of the same group have similar exon-intron structures and gene lengths.
In addition, we used the online tool WoLF PSORT (https://wolfpsort.hgc.jp/) to predict the subcellular localization of the proteins encoded by the CBF genes. In all three cotton species, the largest proportion of CBF proteins was predicted to be in the nucleus. All CBF members in G. herbaceum and G. australe were predicted to be nuclear, whereas in G. thurberi only some were. Apart from the nucleus, the predicted distribution of the remaining CBF family members varied widely, with localization signals also found in the chloroplast, mitochondria, plasma membrane and vacuole membrane (Table S2).
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 12 January 2021 doi:10.20944/preprints202101.0216.v1

RNA-seq analysis and RT-qPCR validation of the CBF genes under cold stress conditions
G. thurberi transcriptome data were used to analyze the expression patterns of 24 CBF genes under cold stress (Fig. 5A). Among them, only 17 genes were differentially expressed under cold stress. Based on the G. thurberi transcriptome data, 12 differentially expressed CBF genes were selected, and 15 and 13 genes were selected from G. herbaceum and G. australe, respectively. RT-qPCR was used to analyze their expression patterns; most genes were up-regulated in the three cotton species (Fig. 5B). Of the 12 genes in G. thurberi, 10 were up-regulated and only 2 were down-regulated, a trend generally consistent with the transcriptome data. In G. herbaceum, 9 genes were up-regulated and 6 were down-regulated. In G. australe, 8 were up-regulated and 5 were down-regulated. Integrating the transcriptome data and the RT-qPCR results, we selected a highly expressed gene, GthCBF12.5 (GthCBF4), for further functional verification.

Analysis of subcellular localization results
The control group showed green fluorescence signals in both the nucleus and the cell membrane, while the fusion protein pCAMBIA2300-eGFP-Flag-GthCBF4 showed green fluorescence signals only in the nucleus (Fig. 6), indicating that the protein encoded by the gene is localized in the nucleus. Two lines with higher expression, OE-1 and OE-3, were selected for phenotypic identification (Fig. 7A). Two-week-old transgenic and wild-type Arabidopsis plants were selected for the cold tolerance test. As shown in Fig. 7B, the two overexpression lines turned green again during recovery culture, with new leaves growing, while most of the wild-type plants had withered and turned yellow. The survival rates of the two overexpression lines reached 60% and 63%, while that of the wild type was only 8% (Fig. 7C). The expression of GthCBF4 was measured by RT-qPCR after low temperature (-15°C) treatment for 0, 1 and 3 hours (Fig. 7D). GthCBF4 expression was significantly up-regulated in the overexpression lines and increased with treatment time. Trypan blue staining was used to assess cell damage under cold stress (Fig. 7E). Under normal conditions, the blue-stained area on the leaves of both the overexpression and wild-type plants was very small. Under cold treatment, the blue areas on the transgenic leaves were significantly smaller than those on the wild type, and the staining was lighter.
DAB staining was used to assess the accumulation of H2O2 in Arabidopsis leaves. Under normal conditions, H2O2 accumulation in both the overexpression and wild-type leaves was very low, and brown precipitate was hardly visible. After cold treatment, however, the brown area on the transgenic Arabidopsis leaves was obviously larger than that on the wild type, and the color was also deeper (Fig. 7E).

Overexpression plants and evaluation of physio-morphological traits under low temperature environment
As shown in Fig. 8A and B, no wild-type seeds germinated after treatment, while the germination rates of the two overexpression lines were 16% and 7%, respectively. In addition, the roots of the two overexpression lines were significantly longer than those of the WT (Fig. 8C, D). We further evaluated known stress-responsive genes, namely COR15A, RD29A, KIN1 and COR47 [26]. After treating the wild type and CBF triple mutants at low temperature (-15°C) for 0, 1 and 3 h, the expression of the COR genes was assessed by RT-qPCR. All four abiotic stress-responsive genes were significantly upregulated in the overexpression plants under low-temperature (-15°C) stress (Fig. 8E).

Discussion
Cotton is an important economic crop, but its production, and especially its yield, is affected by various abiotic stress factors such as drought, salinity and cold. Although abiotic stress is a major challenge in cotton growth, there has been no detailed study of the CBF genes in cotton [32]. In previous studies, the CBF family has been identified in cotton (Gossypium hirsutum) [24], wheat [33], lettuce [34], Brassica napus [35], barley [36] and soybean [37], but there are few studies in diploid cotton. In this work, genome-wide identification, characterization and functional analysis of the proteins encoded by the cotton CBF genes were carried out in three cotton species, with 29, 25 and 15 CBF proteins identified in G. herbaceum, G. thurberi and G. australe, respectively. The number of CBF proteins in cotton was higher than in other plants such as lettuce with 14, Brassica napus with 10 and soybean with 14 CBF genes, but lower than in hexaploid wheat with 65 CBF genes and barley with 20 CBF genes. Studies have found that Arabidopsis CBF family members have an AP2 domain, flanked upstream and downstream by conserved amino acid sequences. The sequence upstream of AP2 is PKK/PKKPAGR (RAGRxxKFx ETRHP) and the downstream sequence is DSAWR [38]. Mutation of PKK/PKKPAGR can inhibit the binding of CBF to the promoters of its downstream COR genes, thereby weakening CBF regulation, so this sequence is necessary for CBF to perform its transcription factor function. The cotton genome contains a large and complex CBF subfamily whose members contain conserved AP2/EREBP domains and CBF signature sequences, indicating that cotton CBFs have functions similar to those of other CBFs in dicotyledons [25].
Phylogenetic analysis showed that the CBF family was divided into seven groups. Among these genes, CBF1, CBF2 and CBF3 are all induced by cold stress in Arabidopsis thaliana. Therefore, we speculate that the cotton CBF genes may also respond to abiotic stress, especially cold. Further analysis showed that the isolated CBF gene was highly expressed under cold stress, consistent with previous research results.
In plants, the transcriptional regulation of the osmotic stress response mainly depends on two cis-regulatory elements found in stress-responsive genes: ABA-responsive elements (ABREs) and dehydration-responsive elements (DREs). DREs are mainly involved in ABA-independent pathways, while ABREs mediate ABA-dependent osmotic stress signals. In our work, we found that the CBF genes in the three cotton species are rich in ABA-responsive elements (ABRE), dehydration-responsive elements (DRE) and low-temperature-responsive elements (LTR). We speculate that environmental stress can induce the expression of CBF genes through these cis-acting elements and thereby improve plant resistance to environmental stress. The subcellular localization results confirmed that the fusion construct pCAMBIA2300-eGFP-Flag-GthCBF4 was expressed in the nucleus of tobacco leaf epidermal cells and that the GthCBF4 protein localized to the nucleus, which agrees with previous studies and with the results of the bioinformatic analysis.
Overexpression of Arabidopsis CBF genes in other plant species, or of CBF genes from other species in Arabidopsis, has revealed the potential of CBF genes to enhance frost resistance [39], showing that CBF genes play an important role in plant cold tolerance. Recently, the expression of CBF1 and CBF3 was down-regulated using RNAi and antisense techniques, resulting in a 25-50% reduction in cold-treated plants [24]. In this study, GthCBF4 was strongly up-regulated in cotton seedlings under low temperature treatment. The survival rate of GthCBF4 transgenic Arabidopsis thaliana plants was significantly improved after freezing treatment. In the trypan blue and DAB staining, the staining of the overexpression plants was lighter, indicating that overexpression of GthCBF4 can reduce the damage caused by freezing treatment. This strongly suggests that GthCBF4 is a key gene related to cold stress. The germination rate and root elongation measurements showed that, under the same cold treatment, the two overexpression lines differed significantly from the wild type, which further supports the conclusion that GthCBF4 can enhance resistance to cold damage. In general, the expression level of the four COR genes examined is correlated with the level of freezing tolerance. Previous research has shown that COR15A, which encodes a chloroplast-targeted polypeptide, strengthens the cold resistance of chloroplasts [40]. Previous studies have also confirmed that CBF genes can induce the expression of COR47, RD29A and KIN1 to improve plant cold tolerance [41]. The expression levels of the stress-responsive genes in the two overexpression lines were significantly higher than in the wild type, and they increased further as the freezing treatment was extended.
The up-regulation of the stress-responsive genes further confirmed that the ability of the overexpression plants to tolerate cold was significantly strengthened. In the past two decades, the transcriptional network of the CBF signaling pathway has been extensively studied. Although many COR genes have been identified in genome-wide expression profiles, only about 10-25% of them are regulated by CBF, which means that other early cold-regulated transcription factors are also involved in improving cold tolerance. More research is needed to identify further transcriptional regulators and explore their relationships, in order to increase our understanding of cold-related transcriptional networks.

Identification of CBF family genes in the cotton genome
Wild diploid cotton genome data were downloaded from CottonGen (https://www.cottongen.org/) to construct a local BLAST database. The Arabidopsis CBF protein sequences were used as queries against the wild diploid cotton proteins. The E-value threshold for BLASTP was set at 1e-10 to obtain the final dataset of CBF proteins. The Pfam (http://pfam.sanger.ac.uk/search) and SMART (http://smart.embl-heidelberg.de/) databases were then used to confirm each predicted CBF protein sequence [42]. Redundant and incomplete sequences were removed. The sequences of 10 Gossypioides kirkii and 9 Theobroma cacao CBF proteins were obtained from the CottonGen (https://www.cottongen.org/) database. In addition, physicochemical parameters, including the molecular weight (MW) and isoelectric point (pI) of each gene product, were calculated using the Compute pI/Mw tool from ExPASy (http://www.expasy.org/tools/).
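The E-value filtering and redundancy removal described above can be sketched as follows. This is a minimal illustration, assuming the BLASTP results were exported in BLAST tabular format (outfmt 6), in which column 2 holds the subject ID and column 11 the E-value. The function name and the sample hit lines are hypothetical and are not data from the study.

```python
# Hypothetical helper: keep each subject protein that has at least one
# hit at or below the E-value cutoff, dropping repeated hits.
def filter_blast_hits(lines, max_evalue=1e-10):
    """Return subject IDs passing the cutoff, in order of first appearance.

    Assumes BLAST tabular output (outfmt 6): 12 tab-separated columns,
    subject ID in column 2 and E-value in column 11.
    """
    kept = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue and subject_id not in seen:
            seen.add(subject_id)  # a protein hit by several queries is kept once
            kept.append(subject_id)
    return kept


# Illustrative hit lines with made-up IDs and values.
example = [
    "AtCBF1\tGthu_A\t62.1\t210\t70\t3\t1\t205\t4\t212\t3e-45\t160",
    "AtCBF2\tGthu_A\t60.0\t200\t72\t2\t1\t198\t6\t204\t1e-40\t150",
    "AtCBF1\tGthu_B\t35.0\t80\t50\t1\t10\t88\t20\t99\t2e-04\t40",
]
print(filter_blast_hits(example))
```

Candidates that pass this filter would still need the Pfam and SMART domain confirmation, which is not modelled here.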

Sequence alignment and phylogenetic analysis of cotton CBF gene family
A multiple alignment of the CBF protein sequences from A. thaliana, Gossypioides kirkii and Theobroma cacao was generated using the ClustalW program. A neighbor-joining analysis of the generated alignment was performed using the unweighted pair-group method with arithmetic mean algorithm to construct an unrooted phylogenetic tree, with 1000 bootstrap replicates and default values for the other parameters. The tree was visualized with MEGA 7.0 software [43].
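The idea behind bootstrap replicates can be illustrated with a short sketch. It only shows how alignment columns are resampled with replacement and how a simple pairwise distance (the proportion of differing sites) is computed; it is not MEGA's implementation, and the function names and toy sequences are hypothetical.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy aligned sequences, made up for illustration.
alignment = {
    "seqA": "MKRS-AGR",
    "seqB": "MKKSPAGR",
    "seqC": "MRKSPTGR",
}
rng = random.Random(0)  # fixed seed so the run is repeatable
replicate = bootstrap_alignment(alignment, rng)
print(p_distance(alignment["seqA"], alignment["seqB"]))
print(replicate)
```

In a full analysis, a tree would be built from each of the 1000 replicates, and the support for a clade would be the fraction of replicate trees that contain it.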

Gene structure and C-terminal conserved motifs analysis
Structural information for the CBF genes included chromosomal location and gene length. Exons and introns were predicted by comparing the coding sequences with the genomic sequences. Conserved motifs in the CBF protein sequences were predicted with the MEME online software. The CDD database (https://www.ncbi.nlm.nih.gov/cdd/) was searched for conserved domain information on the CBF proteins, and the conserved domains were drawn with the TBtools mapping tool [44].

Retrieval and analysis of promoter sequences
The 2000 bp sequence upstream of the ATG start site of each CBF gene was extracted and submitted to the PlantCARE website (http://www.dna.affrc.go.jp/PLACE/signalscan.html) to identify putative cis-regulatory elements in the promoter region [45]. In addition, we predicted the subcellular localization of all the CBF proteins with the online tool WoLF PSORT (http://www.genscript.com/wolf-psort.html) [46].
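Extracting the upstream region has to account for the strand of each gene. The sketch below is one possible way to do it, assuming the region is measured from the start codon, coordinates are 0-based and half-open on the forward strand, and the chromosome sequence is an upper-case string. The function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bp upstream of the start codon, in gene orientation.

    cds_start and cds_end are 0-based, half-open forward-strand coordinates
    of the coding sequence. For '+' genes the start codon begins at cds_start;
    for '-' genes it ends at cds_end on the forward strand.
    """
    if strand == "+":
        return chrom_seq[max(0, cds_start - length):cds_start]
    if strand == "-":
        # Upstream of a minus-strand gene lies to the right on the forward strand.
        return reverse_complement(chrom_seq[cds_end:cds_end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy examples with a 4 bp window instead of 2000 bp.
plus_chrom = "TTTTCCCCATGAAA"   # CDS 'ATGAAA' at [8, 14) on the plus strand
minus_chrom = "TTTCATGGAC"      # CDS 'TTTCAT' at [0, 6), read as ATGAAA on the minus strand
print(upstream_region(plus_chrom, 8, 14, "+", length=4))
print(upstream_region(minus_chrom, 0, 6, "-", length=4))
```

Near a chromosome end the function returns a shorter sequence instead of failing, which is a design choice of this sketch and not something stated in the text.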

RNA extraction and qRT-PCR analysis
Total RNA was extracted using the EASYspin Plus plant RNA kit (Aidlab Biotech, Beijing, China), following the manufacturer's instructions. A NanoDrop 1000 spectrophotometer was used to determine the quantity and quality of the RNA samples. The primers used for qRT-PCR were designed for all genes using Primer Premier 5 software (Table S3). The cotton GhActin gene (forward primer 5'ATCCTCCGTCTTGACCTTG3' and reverse primer 5'TGTCCGTCAGGCAACTCAT3') was used as the reference gene. Real-time PCR reactions were carried out in a final volume of 25 μl, using a SYBR Green master mix and an ABI7500 thermal cycler (Applied Biosystems, Foster City, CA, U.S.A.), following the manufacturer's instructions. Each sample was analyzed with three technical replicates, and three biological replicates of the experiment were carried out.
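The text does not state how relative expression was calculated from the Ct values. A common choice is the 2^-ΔΔCt method, which is assumed in the sketch below; the function name and all Ct values are made up for illustration, with the reference gene standing in for GhActin.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method.

    Each argument is a list of Ct values from technical replicates.
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    return 2 ** -(delta_treated - delta_control)


# Made-up Ct values for illustration only (not data from the study).
fold = relative_expression(
    ct_target_treated=[22.0, 22.5, 21.5],
    ct_ref_treated=[18.0, 18.0, 18.0],
    ct_target_control=[25.0, 25.5, 24.5],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(fold)
```

Here the target gene crosses the threshold three cycles earlier in the treated sample after normalization, which corresponds to an eightfold increase.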

Plant material
Seeds of G. herbaceum, G. thurberi and G. australe (pretreated by delinting and cutting the seed coat) were germinated in sand at 25°C for 4 days. The seedlings were then transferred to a hydroponic facility supplied with Hoagland nutrient solution [47]. The greenhouse conditions were 28°C during the day and 25°C at night, with a 16-hour photoperiod and a relative humidity of 60-70%. At the three-leaf stage, cotton seedlings were kept at 4°C under normal light, and leaves were harvested after 0, 0.5, 3, 6, 12 and 24 h of cold treatment. Each treatment was repeated three times. The leaf samples were immediately frozen in liquid nitrogen and stored at -80°C until RNA extraction.

GthCBF4 subcellular localization analysis
To explore the subcellular localization of the GthCBF4 protein, a pCAMBIA2300-eGFP-Flag-GthCBF4 fusion vector of CBF and GFP, driven by the 35S promoter, was constructed, transformed into Agrobacterium LBA4404 competent cells and transiently expressed in the epidermal cells of tobacco leaves. Agrobacterium cells expressing only the GFP gene were used as a negative control. Flat leaves of four-week-old tobacco plants were selected for infiltration, and the plants were cultivated in the dark for 24-36 h after infiltration before fluorescence was observed under a laser confocal microscope.

Functional verification of GthCBF4 in Arabidopsis
Screening of transgenic Arabidopsis
To explore the function of the GthCBF4 gene, a pBI121-GthCBF4 recombinant vector was constructed and transformed into competent cells of A. tumefaciens GV3101 by the freeze-thaw method [48,49]. Wild-type Arabidopsis thaliana was transformed by the dipping method, and 50 mg/mL kanamycin was used for positive selection until the T3 generation. In T2 generation seedlings, the expression of GthCBF4 was screened by RT-qPCR, and two high-expressing transgenic lines were obtained.

Phenotypic identification, physiological and biochemical parameters of WT and overexpressed plants under low temperature stress
Transgenic and wild-type Arabidopsis thaliana plants that had been grown in plastic bowls for two weeks and showed essentially the same growth were selected for treatment. The plants were placed at -15°C for 3 hours. After the treatment, they were moved to a 4°C light incubator to thaw for 4 hours and were then cultivated under normal light conditions at 23°C. After 7 days, the plants were photographed and the survival rate was counted. For each tissue, at least three independent biological replicates were carried out. The germination rate and root length were determined under cold stress; a t-test was used to test the significance of the difference in root length between the transgenic lines and the wild type. Trypan blue staining was used to assess cell damage in transgenic Arabidopsis under cold stress, and DAB staining was used to assess the accumulation of H2O2 under cold stress. DAB staining was performed using a DAB chromogenic kit (Nanjing Jiancheng Bioengineering Institute, Nanjing, China).
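The survival rate and the root length comparison can be sketched as below. The text only says that a t-test was used, so the choice of Welch's unequal-variance form is an assumption, and the sketch returns the t statistic and degrees of freedom without a p-value. The function names and all numbers are hypothetical.

```python
import math
from statistics import mean, variance


def survival_rate(survived, total):
    """Percentage of plants alive after the recovery period."""
    return 100.0 * survived / total


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)  # squared standard error, sample a
    vb = variance(sample_b) / len(sample_b)  # squared standard error, sample b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Made-up root lengths (cm) for illustration only.
overexpression_roots = [2.0, 2.5, 3.0]
wild_type_roots = [1.0, 1.5, 2.0]
t_stat, dof = welch_t(overexpression_roots, wild_type_roots)
print(survival_rate(6, 10), t_stat, dof)
```

The t statistic would then be compared against a t distribution with the returned degrees of freedom, for example with a statistics package, to obtain the significance level.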

Conclusions
In conclusion, through family analysis of the CBF genes of diploid wild cotton, we determined that there are 29 CBF proteins in G. herbaceum, 25 in G. thurberi and 15 in G. australe. Based on the RNA-seq and RT-qPCR results, GthCBF4, a gene highly up-regulated under cold stress, was selected. Together with functional verification by overexpression in Arabidopsis thaliana, the results indicate that GthCBF4 plays an important role in improving the cold resistance of G. thurberi. This result provides a solid foundation for further research on the molecular function of the GthCBF4 protein in cotton and provides ideas for further clarifying the role of CBF genes in the cold stress regulatory network.
Supplementary Materials: Table S1. Physiochemical properties of the CBF genes. Physiochemical properties and cis-regulatory element analysis of the proteins encoded by the cotton CBF genes. (A-D). GC content, exon number, mean exon length and mean intron length. (E). Cis-regulatory elements obtained for the various proteins encoded by the CBF genes in G. herbaceum, G. thurberi and G. australe; Table S2. Subcellular localization of the proteins encoded by the CBF genes; Table S3. List of primers for RT-qPCR analysis.