1. Introduction
Cotton (Gossypium spp.) is not only one of the most important natural fiber crops in the world but also an ideal material for studying genome evolution, polyploidization, and cell elongation [
1]. 35% of the world's fibers come from cotton [
2]. There are approximately 50 species in the genus Gossypium [
3], and the long-domesticated tetraploid cotton,
Gossypium hirsutum, has a large yield potential and wide adaptability, with over 95% of cotton cultivation area planted with this species [
4]. With global climate warming, freshwater scarcity, extreme weather such as high temperatures, and frequent droughts, drought has become a significant factor limiting cotton production[
5]. Drought stress affects cotton yield and quality by altering metabolic activities and biological functions [
6]. Ul-Allah et al. [
7] reported that drought stress on cotton fiber development can lead to a yield loss of approximately 45%, and Abdelraheem et al. [
8] found that water deficit during flowering reduces fiber strength, increases short fiber content, and lowers quality. Therefore, identifying drought-resistant genes in cotton, elucidating the molecular mechanisms of drought tolerance, and breeding drought-resistant cotton varieties are of significant importance for the textile industry.
Plants respond to drought stress through morphological, physiological, and biochemical changes [
9]. They primarily resist drought stress by osmotic regulation, reactive oxygen species (ROS) scavenging, and maintaining cell membrane stability [
10]. During this process, changes in the levels of certain substances are closely related to drought tolerance and can serve as important indicators for evaluating drought resistance in plants. When plants are subjected to drought stress, lipid peroxidation occurs, and malondialdehyde (MDA), the final product of lipid peroxidation [
11], reflects the extent of damage to the plant cell membrane caused by environmental stress. Therefore, drought-resistant cotton varieties accumulate lower levels of MDA under drought conditions compared to sensitive varieties [
12]. Proline (Pro) is a key osmotic regulator in plants [
13]. When plants face drought stress, they produce large amounts of proline by upregulating the expression of the synthesis enzyme gene P5CS [
14], maintaining cell osmotic potential to resist drought stress. Superoxide dismutase (SOD) can scavenge reactive oxygen species (ROS) generated during drought stress [
15], reducing lipid peroxidation levels. Chlorophyll is an essential substance for photosynthesis in plants. Under drought stress, damage to the thylakoid structure of chloroplasts leads to a decrease in chlorophyll content (CHI), weakening photosynthesis [
16]. Therefore, we selected MDA, Pro, SOD, and chlorophyll content (CHI) as indicators to reflect the extent of drought stress in cotton.
RNA sequencing (RNA-seq) is a method for sequencing all mRNA transcribed from a species under specific conditions to detect gene expression levels [
17]. With the development of high-throughput sequencing technology, RNA-seq is increasingly used to identify key genes involved in plant responses to environmental stresses [
18]. In the study of cotton drought resistance, Zheng et al. [
19] conducted PEG 6000 simulated drought stress on the drought-resistant material Xinluzhong 82 and the drought-sensitive material Kexin No. 1, and identified two candidate genes, GhGABA-T and GhAO, which are closely related to cotton drought stress through whole transcriptome sequencing. Liu et al. [
20] performed transcriptome analysis on
Gossypium hirsutum under drought stress and identified that the expression of GhMYB102 significantly upregulated after drought stress. They confirmed through molecular biology methods that GhMYB102 enhances cotton drought resistance by participating in ABA synthesis or regulating the expression of drought-related genes. However, current studies on the cotton drought-resistant transcriptome mostly focus on the screening of differentially expressed genes, lacking a systematic analysis of the gene co-expression network. Using Weighted Gene Co-expression Network Analysis (WGCNA), tens of thousands of genes can be divided into functional modules according to their expression trends [
21], and the association between the modules and the drought-resistant phenotype can be established to reveal the key regulatory hubs (Hub genes) and their interaction relationships in cotton drought resistance.
Currently, there is limited research on cotton drought resistance genes through physiological indicators under drought stress, RNA-seq, and WGCNA. This study used multi-timepoint drought stress treatments on TM-1 (drought-sensitive) and Jin (drought-resistant) at the flowering and boll stage, dynamically monitoring the changes in physiological indicators such as MDA, Pro, SOD, and CHI in cotton leaves in response to drought stress. RNA-seq was used to construct gene expression profiles at different stress stages, WGCNA was employed to identify gene modules significantly associated with key physiological indicators of drought resistance, and Cytoscape was used for network visualization [
22]. Finally, we identified core regulatory genes responsive to cotton drought stress. This study lays the foundation for further understanding the molecular mechanisms of cotton drought resistance and developing new drought-resistant cotton varieties.
2. Experimental Methods
2.1. Planting and Treatment
TM-1 (sensitive to drought) and Jin (drought-resistant) were planted in 2024 in Huoyanghe City, Xinjiang Uygur Autonomous Region (44°20′N–47°04′N, 83°51′E–85°51′E). The experimental field was arranged with one film covering three rows (1 film 3 rows) and one film for planting (1 film 1 planting), each row 2 meters long. Before the flower-boll stage, manage according to normal irrigation and normal precipitation. In the boll stage of upland cotton, stop watering the drought treatment area until harvest. Samples were taken from the third leaf from the top of the plant at 0, 7, 10, 15, and 25 days after drought stress, wrapped in foil, and stored in cryotubes. Immediately after collection, the samples were placed in liquid nitrogen and stored. Each group was repeated three times, and the samples were sent to NovoZhiyuan Ltd. in Tianjin for library construction and sequencing.
2.2. Physiological Indicator Detection
Leaves were sampled from both "TM-1" and "Jin" materials at various time points: pre-stress (0d), 7 days (7d), 10 days (10d), 15 days (15d), and 25 days (25d) after natural drought treatment in the field. The leaves were then immediately placed in liquid nitrogen for storage. According to the instruction manual of the physiological indicator testing kit from Beijing Box Biotech Co., Ltd., chlorophyll content was measured using the plant chlorophyll content detection kit (Catalog No. KPL003M). Absorbance at 663 nm and 645 nm was measured with a microplate reader. Proline (PRO) content was measured using the proline detection kit (Catalog No. AKAM003M), with absorbance at 520 nm. Malondialdehyde (MDA) content was determined with the malondialdehyde detection kit (Catalog No. AKFA013M), measuring absorbance at 450 nm, 532 nm, and 600 nm. Superoxide dismutase (SOD) activity was measured with the superoxide dismutase activity detection kit (Catalog No. AKAO001M-50S), using absorbance at 560 nm to calculate SOD enzyme activity. Each treatment had three biological replicates.
2.3. Library Construction and Quality Inspection
Total RNA was extracted as the starting material for library construction. mRNA with a polyA tail was enriched using Oligo(dT) magnetic beads. The mRNA was then fragmented using divalent cations in Fragmentation Buffer. Using the fragmented mRNA as a template, cDNA was synthesized using random oligonucleotides as primers in an M-MuLV reverse transcription system. The RNA strand was degraded by RNaseH, and the second cDNA strand was synthesized in a DNA polymerase I system with dNTPs as substrates. The purified double-stranded cDNA underwent end repair, A-tailing, and ligation of sequencing adapters. AMPure XP beads were used to select cDNA fragments of approximately 370–420 bp in length. PCR amplification was performed, and the PCR product was purified again using AMPure XP beads. The final library was constructed. After library construction, preliminary quantification was carried out using the Qubit 2.0 Fluorometer. The library was diluted to 1.5 ng/µl, and the insert size was analyzed using an Agilent 2100 Bioanalyzer. Once the insert size met expectations, qRT-PCR was used to precisely quantify the effective concentration of the library (effective concentration above 1.5 nM) to ensure library quality.
2.4. Sequencing
Once the library passed quality control, different libraries were pooled according to their effective concentrations and the required sequencing output. Illumina sequencing was then performed to generate 150 bp paired-end reads. The sequencing principle is Sequencing by Synthesis (SBS). In the sequencing flow cell, four fluorescently labeled dNTPs, DNA polymerase, and adapter primers are added for amplification. As the complementary strand is extended during each sequencing cluster, the incorporation of a fluorescently labeled dNTP releases a corresponding fluorescence signal. The sequencer captures these fluorescence signals and converts them into sequencing peaks through software, which are then used to derive the sequence information of the target fragment.
2.5. Data Quality Control
The image data of sequencing fragments obtained from the high-throughput sequencer is converted into sequence data (reads) through CASAVA base calling, and the file is in fastq format, which mainly contains the sequence information of the sequencing fragments and their corresponding quality information. The raw data obtained by sequencing contains a small number of reads with sequencing adapters or low sequencing quality. To ensure the quality and reliability of data analysis, the raw data needs to be filtered. This mainly involves removing reads with adapters, reads containing N (N indicates undetermined base information), and low-quality reads (reads where more than 50% of bases have a Qphred score ≤5). At the same time, the clean data is evaluated for Q20, Q30, and GC content. Subsequent analyses are all performed based on the clean data for high-quality analysis.
2.6. Sequence Alignment to the Reference Genome
2.7. Gene Expression Quantification
Feature Counts (1.5.0-p3) [
25] was used to calculate the number of reads mapped to each gene. Then, the FPKM of each gene was calculated based on gene length, and the number of reads mapped to the gene was computed. FPKM refers to the expected number of transcript sequence fragments per kilobase of transcript length per million base pairs of sequencing. It also considers the effects of sequencing depth and gene length on read counts and is currently one of the most widely used methods for estimating gene expression levels.
2.8. Differential Expression Analysis
For samples with biological replicates, DESeq2 software [
26] was used to perform differential expression analysis between two comparison groups. DESeq2 provides statistical programs for determining differential expression in count-based gene expression data using a model based on the negative binomial distribution. Genes with an adjusted P-value ≤ 0.05 discovered through DESeq2 were classified as differentially expressed. Before conducting differential gene expression analysis, the read counts for each sequencing library were adjusted using a normalization factor via the edgeR package [
27]. Differential expression analysis between two conditions was performed using the edgeR package (3.22.5). The P-value was adjusted using the Benjamini & Hochberg method [
28], with the corrected P-value and |log2 fold change| used as thresholds for significant differential expression.
2.9. Differential Gene Enrichment Analysis
Differentially expressed gene GO enrichment analysis was performed using the clusterProfiler (3.8.1) software [
29], with gene length bias corrected. GO terms with a corrected P-value less than 0.05 were considered significantly enriched by differentially expressed genes. KEGG is a database resource used to understand the high-level functions and utilities of biological systems, particularly from molecular-level information such as large-scale molecular datasets generated by genome sequencing and other high-throughput databases, including systems such as cells, organisms, and ecosystems. We used clusterProfiler (3.8.1) software to analyze the statistical enrichment of differentially expressed genes in KEGG pathways.
2.10. WGCNA Analysis
Weighted Gene Co-expression Network Analysis (WGCNA) is a systems biology approach used to describe gene association patterns between different samples. It can identify highly co-expressed gene sets and, based on the internal connectivity of the gene set and its association with phenotypes, identify candidate biomarker genes or therapeutic targets. The R package WGCNA is a functional collection used for performing various weighted association analyses. It can be used for network construction, gene selection, gene cluster identification, topological feature calculation, data simulation, and visualization. Cytoscape (Cytoscape_v3.9.1) was used to visualize the gene modules.
2.11. qRT-PCR Validation
Drought-tolerant material Jin and drought-sensitive material TM-1 were selected as the extreme materials for candidate gene qRT-PCR analysis. Leaf samples were collected from field-grown cotton under natural drought conditions at 0d, 7d, 10d, 15d, and 25d during the flowering and bolling stage. Total RNA was extracted using the RNAprep Pure Polysaccharide Polyphenol Plant Total RNA Extraction Kit (Tiangen, Beijing, China), and cDNA was synthesized using a reverse transcription kit (Tiangen). Cotton Ubiquitin7 (UBQ7) was chosen as the reference gene. Based on the reaction system provided by the Abm BlasTaqTM 2X qPCR Mix (dye) manual, the running program was set, and the relevant genes were amplified using the Applied BiosystemsTM 7500 Fast Real-Time PCR Systems (3 replicates). Relative expression levels were analyzed using the 2-ΔΔCt method. Finally, GrandPad Prism version 10 [
30] was used for data visualization.
3. Results
3.1. Identification of Drought Resistance of Jin and TM-1
To further compare the drought resistance levels between TM-1 and Jin, the physiological parameters of TM-1 and Jin under different drought stress conditions were measured. After drought stress, the malondialdehyde (MDA) content in TM-1 leaves significantly increased with the intensification of drought stress, while the increase in MDA content in Jin leaves was relatively smaller. Chlorophyll (CHI) and superoxide dismutase (SOD) content showed significant differences between TM-1 and Jin after 15 days of drought stress. Proline (Pro) significantly increased in the leaves of both TM-1 and Jin under drought stress, and the Pro content in Jin leaves was significantly higher than in TM-1 with increasing drought stress. These results indicate that there are significant differences in drought resistance between Jin and TM-1. To further investigate the molecular mechanisms and candidate genes responsible for these differences in drought resistance, RNA-seq analysis was performed on leaf samples from Jin and TM-1 at five time points (0 d, 7 d, 10 d, 15 d, and 25 d) under drought stress.
Figure 1.
(A) Phenotypic photographs of Jin under drought stress. (A) Phenotypic photographs of Jin under drought stress. (C) Statistical analysis of MDA, CHI, SOD, and Pro content in TM-1 and Jin under different drought stress conditions. Significant differences were determined using one-way ANOVA (*, **, *** represent P-values less than 0.05, 0.01, and 0.001, respectively).
Figure 1.
(A) Phenotypic photographs of Jin under drought stress. (A) Phenotypic photographs of Jin under drought stress. (C) Statistical analysis of MDA, CHI, SOD, and Pro content in TM-1 and Jin under different drought stress conditions. Significant differences were determined using one-way ANOVA (*, **, *** represent P-values less than 0.05, 0.01, and 0.001, respectively).
3.2. RNA-Seq Data Analysis
In this study, 204.43 Gb of clean data were generated from 30 cotton leaf samples of TM-1 and Jin under drought stress at the flowering and boll-setting stages. Each sample had an average of 6.81 Gb of clean data, with Q30 base percentages exceeding 97.10%. Furthermore, the alignment rate of the reads to the reference genome ranged from 90.23% to 96.00%, with an average alignment rate of 94.60%. The Pearson correlation coefficient between the three biological replicates of the same sample showed correlations over 0.90 (
Figure 2A). PCA results indicated that biological replicates clustered together, and samples without drought stress clustered together, while samples under moderate drought stress (7d and 10d) clustered together, and samples under severe drought stress (15d and 25d) clustered together, suggesting that the transcriptomic data were highly reliable and reproducible (
Figure 2B).
3.3. Differentially Expressed Genes (DEGs) Analysis
To uncover the molecular mechanisms behind the drought resistance differences between different cotton varieties, we identified differentially expressed genes (DEGs) under drought stress using stringent thresholds (FDR < 0.05, |log2FC| ≥ 1). As shown in
Figure 3A, the sensitive variety TM-1 exhibited 13,439 DEGs (6,687 upregulated/6,752 downregulated) during the early stress stage (0-7d), while the drought-resistant variety Jin maintained a high response level during prolonged stress (15-25d) with 6,204 DEGs (2,409 upregulated/3,795 downregulated). Notably, Jin showed a significantly higher number of DEGs (2,927) during the mid-stress stage (7-10d) compared to TM-1 (469,
P < 0.01), suggesting that Jin may possess the characteristic of continuously activating drought resistance pathways. Venn analysis revealed that out of 25,123 DEGs, 12,406 were common genes, and the number of unique genes in Jin (8,544) was 2.05 times that of TM-1 (4,172). This difference may explain the divergence in drought resistance between the varieties: Jin maintains cell homeostasis by continuously regulating the synthesis of osmotic protectants (e.g., trehalose, proline), while TM-1 response mechanism is limited to the early stage of stress.
3.4. KEGG and GO Enrichment Analysis of Differentially Expressed Genes Shared by Jin and TM-1.
To explore the coordinated response mechanisms of the drought-resistant material Jin and the drought-sensitive material TM-1 under drought stress, we performed a systematic analysis of the 12,406 common DEGs. Through temporal expression pattern clustering, DEGs were divided into four characteristic modules (
Figure 3A): Cluster 1 genes were lowly expressed under no stress and significantly upregulated as stress duration increased (15-25d); Cluster 2 and Cluster 4 genes were highly expressed under no stress, but exhibited TM-1-specific (Cluster 2) and Jin-specific (Cluster 4) expression advantages; Cluster 3 genes were activated early in the stress period and maintained a stable expression level throughout the stress stages. KEGG enrichment analysis revealed that DEGs mainly participated in metabolic regulation (
Figure 3B), including lipid metabolism (fatty acid elongation, glycerophospholipid metabolism), carbohydrate metabolism (fructose/mannose metabolism), amino acid metabolism (alanine/aspartate/glutamate metabolism), secondary metabolism (steroid/carotenoid biosynthesis), and transport processes (ABC transporter proteins). Notably, stress response pathways closely related to ascorbate metabolism, sulfur metabolism, and redox regulation were significantly enriched. GO analysis (
Figure 3C) showed that biological processes primarily involved amino acid biosynthesis (cell/α-amino acids/sulfur-containing amino acids), polysaccharide metabolism (glucan synthesis), and catabolism (aromatic amino acids); cellular components enriched in structures related to genetic information regulation (chromatin/nucleosome) and stress response sites (cell wall/extracellular matrix); molecular functions were prominent in metabolic catalytic activities (isomerases/transferases/oxidoreductases) and molecular binding capabilities (unfolded proteins/ssDNA). These findings systematically reveal that plants may adapt to drought stress through a multidimensional molecular regulatory network.
3.5. Jin and TM-1 Specific Differentially Expressed Genes KEGG and GO Enrichment Analysis
Through functional analysis of drought-related differentially expressed genes (DEGs) in Jin and TM-1 cotton varieties, it was found that the 8,544 DEGs specifically expressed in Jin (|log2FC| >1, FDR <0.05) were dynamically divided into four clusters based on drought stress duration: Cluster 1 (high expression at 0 days control), Cluster 2 (high expression at 15 days drought), Cluster 3 (high expression at 25 days drought), and Cluster 4 (high expression at 7 and 10 days drought). The KEGG pathways of these DEGs were significantly enriched in lipid metabolism (ether lipids, alpha-linolenic acid metabolism), stress response signaling (MAPK, phosphoinositide signaling), and secondary metabolism (isoquinoline alkaloid, sesquiterpene biosynthesis), while GO functions were involved in microtubule binding, jasmonic acid biosynthesis, and outer mitochondrial membrane. In TM-1, among the 4,172 specific DEGs, Cluster 2 (1,870 genes) was highly expressed under control conditions and suppressed under drought stress. The metabolic characteristics of this cluster were focused on energy metabolism (TCA cycle, glycolysis), DNA repair (non-homologous end joining), and ion channel activity. The comparison between the two varieties indicated that Jin might maintain membrane stability through lipid metabolism and microtubule-related mechanisms, while TM-1 relies on energy supply and genomic stability to cope with drought, revealing the molecular differentiation of drought resistance strategies between them.
3.6. TF Analysis
We submitted the genomic sequences of the 25,123 identified DEGs to the PlantTFDB database (
http://planttfdb.cbi.pku.edu.cn/) for transcription factor prediction, and a total of 1,551 transcription factors were identified. These transcription factors were classified into different families, including MYB, ERFs, bHLHs, C2H2s, NAC, and WRKYs (
Figure 4A). To further understand the function of these 1,551 transcription factors, we used the k-means clustering algorithm to cluster the 1,551 differentially expressed transcription factors into 4 clusters, and then annotated the functional categories of each cluster using KEGG (
Figure 4B).
Cluster 1 was highly expressed without drought stress and showed a decrease in expression under drought stress. It was mainly annotated in nitrogen metabolism, isoquinoline alkaloid biosynthesis, betaine biosynthesis, glycine, serine, and threonine metabolism, and tyrosine metabolism. Cluster 2 was highly expressed under drought stress at 15 days and was mainly annotated in linolenic acid metabolism, MAPK signaling pathway, N-glycan biosynthesis, galactose metabolism, and brassinosteroid biosynthesis. Cluster 3, in contrast to Cluster 1, was lowly expressed without drought stress but showed an increase in expression under drought stress. It was mainly annotated in ribosome biogenesis in eukaryotes, RNA polymerase, RNA degradation, valine, leucine, and isoleucine degradation, and nucleotide excision repair. These results suggest that transcription factors coordinate responses to drought stress through the regulation of lipid metabolism, osmotic regulation (such as betaine), and signaling pathways (such as MAPK). These results suggest that transcription factors coordinate responses to drought stress through the regulation of lipid metabolism, osmotic regulation (such as betaine), and signaling pathways (such as MAPK).
Figure 6.
(A) Proportion of differentially expressed transcription factor families. (B) Expression patterns and functional enrichment analysis of differentially expressed transcription factors. The KEGG annotation results for each cluster are shown on the right, displaying the top 5 pathways with the smallest p-values.
Figure 6.
(A) Proportion of differentially expressed transcription factor families. (B) Expression patterns and functional enrichment analysis of differentially expressed transcription factors. The KEGG annotation results for each cluster are shown on the right, displaying the top 5 pathways with the smallest p-values.
3.7. WGCNA Analysis
To study the gene regulatory network involved in cotton under drought stress, we constructed a co-expression network of 25,123 differentially expressed genes (DEGs) in the cotton lines using weighted gene co-expression network analysis (WGCNA) (β soft-thresholding value of 7, scale-free R2 > 0.8). Thirteen co-expression modules were obtained, with different colors representing different modules. The correlation between the modules and physiological indicators was calculated, and two significantly highly correlated modules were identified: the blue module, which was significantly correlated with proline (Pro) (r = 0.81, p < 0.01), and the yellow-green module, which was significantly correlated with MDA (malondialdehyde) (r = 0.86, p < 0.01). The hub genes within the blue and yellow-green modules were selected using Cytoscape, and the three genes with the highest connectivity were identified as hub genes. A total of six hub genes were obtained, and the top 100 connections within the blue and yellow-green modules were visualized using Cytoscape based on weight values. Among the six hub genes, Gh_A08G154500 encodes a WRKY transcription factor, Gh_A05G348000 encodes a bromodomain-containing protein BET subfamily transcription factor, Gh_A11G336600 encodes protein kinase C conserved region 2 (CalB), Gh_A12G140900 encodes a cotton sugar and high-affinity sucrose synthase as well as a sucrose and Gol-specific galactoside hydrolase activity protein, Gh_A11G122800 encodes a drought-related protein PCC13-62, and Gh_A11G283900 encodes a protein similar to the antifungal chitin-binding protein hevein from rubber tree latex.
Figure 7.
(A) Hierarchical clustering dendrogram of WGCNA modules; different modules are represented by different colors. (B) Heatmap of the correlation and significance between samples and modules. (C) Gene expression pattern of the blue module. (D) Gene expression pattern of the yellow-green module. (E) Gene interaction network of the blue module. (F) Gene interaction network of the yellow-green module.
Figure 7.
(A) Hierarchical clustering dendrogram of WGCNA modules; different modules are represented by different colors. (B) Heatmap of the correlation and significance between samples and modules. (C) Gene expression pattern of the blue module. (D) Gene expression pattern of the yellow-green module. (E) Gene interaction network of the blue module. (F) Gene interaction network of the yellow-green module.
3.8. qRT-PCR
qRT-PCR was used to detect the expression patterns of these six candidate genes in the leaves of TM-1 and Jin under different drought stress conditions. The expression levels of Gh_A11G122800 and Gh_A11G283900 significantly increased after 7 days of drought stress, while the expression levels of Gh_A08G154500, Gh_A05G348000, Gh_A11G336600, and Gh_A12G140900 significantly increased after 15 or 25 days of drought stress. Furthermore, there were significant differences in the expression levels of these six candidate genes at the same time point after drought stress between TM-1 and Jin, indicating that the expression patterns of these six candidate genes were different under drought stress. The results also suggested that these genes participate in the regulation of cotton drought stress through different expression patterns. Moreover, the expression trends of these six candidate genes under drought stress were consistent with the transcriptome data, further confirming the reliability of the transcriptome sequencing results.