Genome-Wide Identification and Analysis of Cell Cycle Genes in Betula pendula

Research Highlights: This study identified cell cycle genes in birch that likely play important roles in plant growth and development, providing a basis for understanding the regulatory mechanisms of the cell cycle in Betula pendula. Background and Objectives: Cell cycle factors jointly control cell cycle progression and also regulate the growth, division, and differentiation of cells, thereby shaping plant growth and development. In this study, we identified putative cell cycle genes in the B. pendula genome based on the annotated cell cycle genes of A. thaliana. The results could serve as a foundation for further functional studies. Materials and Methods: Transcript abundance was determined for all cell cycle genes in xylem, root, leaf, and flower tissues using RNA-seq. Results: We identified 59 cell cycle gene models in the B. pendula genome, 17 of which were highly expressed. These genes were BpCDKA.1, BpCDKB1.1, BpCDKB2.1, BpCKS1.2, BpCYCB1.1, BpCYCB1.2, BpCYCB2.1, BpCYCD3.1, BpCYCD3.5, BpDEL1, BpDpa2, BpE2Fa, BpE2Fb, BpKRP1, BpKRP2, BpRb1, and BpWEE1. Conclusions: We identified 17 core cell cycle genes in the birch genome by combining phylogenetic analysis and tissue-specific expression data.


Introduction
Many important life processes in higher organisms are closely related to mitosis. The regulation of the eukaryotic cell division cycle is one of the most active topics in cell and molecular biology. Research on plant cell cycle regulation started later than that on mammals and yeast, but great progress has been made for higher plants in recent years [1][2][3][4]. Cell cycle progression results from the interaction between gene expression and external factors, and the cell cycle in higher plants is strictly regulated throughout growth and development.
The concept of the cell cycle was put forward by Howard and Pelc in 1953; the cycle is divided into interphase (G1, S, and G2) and the mitotic phase (M). Plant growth and development depend on the growth, division, and differentiation of cells, and the cell cycle is involved in all of these processes. Recent studies have shown that, in response to hormones, nutrients, and other growth signals, cyclin D (CYCD) is expressed first and binds cyclin-dependent kinase A (CDKA) to form a complex. The activity of this complex is controlled by CDK-activating kinase (CAK) and by cyclin-dependent kinase inhibitors (CKIs), also known as KIP-related proteins (KRPs). The activated complex phosphorylates the retinoblastoma-related protein (RBR), which relieves the inhibition of E2F (E2 factor) a-b/DP and releases the transcription factor E2Fa-b/DP [5]. E2F/DP then promotes the expression of genes required for the transition from G1 to S phase (the DNA synthesis phase). After entry into S phase, CYCA binds CDKA; during progression to G2, the complex associates with the cyclin-dependent kinase subunit (CKS) and with newly synthesized CYCB. A tyrosine phosphatase then removes the inhibitory phosphate group, CDKB is activated, and the cell enters M phase. At the end of M phase, cyclins are degraded through the anaphase-promoting complex (APC) pathway, the cell exits mitosis, and one full cell cycle is complete [6,7].
Since cyclins were first discovered in sea urchins by Hunt in the 1980s, tremendous advances have been made in understanding the molecular mechanisms of the cell cycle. This work has guided the study of tumors and other diseases linked to defects in cell cycle regulation [8]. The most significant structural feature of cyclins is a conserved domain of about 100 amino acid residues known as the cyclin box, which forms the core structure of the protein. During the cell cycle, each cyclin relies on its own cyclin box to recognize a specific cyclin-dependent kinase (CDK) and form a complex with it, which then shows specific CDK kinase activity [9]. Many different cyclins have been found, and they have different expression patterns in different organs, tissues, and cell types of various organisms [10].
Betula pendula is a pioneer boreal tree that can be induced to flower within one year [11,12]. Because it is an important timber tree, understanding how cell cycle genes regulate growth and development in B. pendula will greatly contribute to its use in industrial production and as an ornamental. The genome sequence of B. pendula [11] has become available in the last few years, which allows genes related to the cell cycle to be identified accurately. In this study, we identified cell cycle genes that likely play important roles in plant growth and development. This provides a basis for understanding the expression and regulatory mechanisms of the cell cycle in B. pendula and may serve as a foundation for further functional studies.

2.1. Identification of B. pendula cell cycle genes and physical and chemical properties analysis
The Betula pendula genome was used to identify cell cycle genes following a previous publication [13]. Putative cell cycle genes were first identified by BLASTP [14] with an E-value threshold of less than 1e-5, using A. thaliana cell cycle genes as queries. All candidate genes were then manually examined against the NCBI Conserved Domain Database [15] to confirm that they were correctly annotated. We divided them into eight subgroups based on their functional type in A. thaliana. Finally, we used the ExPASy ProtParam tool (http://web.expasy.org/protparam/) to determine the physical and chemical parameters of the cell cycle proteins, including the number of amino acids, molecular weight, and isoelectric point (pI).
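The E-value filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLASTP results were saved in the default 12-column tabular layout (outfmt 6), and the identifiers and scores in the example rows are invented.

```python
# Sketch: keep BLASTP hits below the E-value cutoff stated in the text.
# Assumption: input is BLAST+ tabular output (outfmt 6, default columns).
E_VALUE_CUTOFF = 1e-5


def filter_hits(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Return {subject_id: lowest E-value} for hits with E-value < cutoff."""
    best = {}
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        #                  qstart qend sstart send evalue bitscore
        subject, evalue = fields[1], float(fields[10])
        if evalue < cutoff and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return best


# Hypothetical rows: A. thaliana queries against birch proteins.
example = "\n".join([
    "query_1\tbirch_gene_1\t85.0\t290\t40\t1\t1\t290\t1\t289\t1e-150\t520",
    "query_1\tbirch_gene_2\t31.0\t80\t50\t3\t5\t84\t10\t90\t0.003\t35.0",
    "query_2\tbirch_gene_1\t40.0\t200\t110\t4\t1\t200\t1\t198\t1e-20\t95.0",
    "query_2\tbirch_gene_3\t55.0\t250\t100\t2\t1\t250\t1\t249\t2e-30\t130",
])
candidates = filter_hits(example)
```

Here `birch_gene_2` is dropped because its E-value exceeds the cutoff, and `birch_gene_1` keeps the better of its two hits. Candidates retained this way would still need the manual domain check described above.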

Chromosome distribution of B. pendula cell cycle genes
The chromosomal locations of the B. pendula cell cycle genes were plotted with TBtools according to the start position of each gene on the birch chromosomes.
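The data preparation behind such a chromosome map can be sketched as a grouping of genes by chromosome, ordered by start position. The gene names and coordinates below are hypothetical placeholders, and the plotting itself (done in TBtools in the study) is not reproduced.

```python
from collections import defaultdict


def genes_by_chromosome(gene_positions):
    """Map each chromosome to its genes as (start, gene) pairs sorted by start."""
    grouped = defaultdict(list)
    for gene, (chromosome, start) in gene_positions.items():
        grouped[chromosome].append((start, gene))
    return {chrom: sorted(entries) for chrom, entries in sorted(grouped.items())}


# Hypothetical coordinates, for illustration only.
positions = {
    "gene_a": ("Chr1", 250000),
    "gene_b": ("Chr2", 90000),
    "gene_c": ("Chr1", 12000),
}
layout = genes_by_chromosome(positions)
```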

Phylogenetic analyses of B. pendula cell cycle genes, gene structure, and conserved sequence and specific motif analysis
To investigate the phylogenetic relationships among the cell cycle genes of B. pendula, a phylogenetic tree was constructed for every subgroup following a previous publication [16]. We first performed a multiple sequence alignment. The phylogenetic tree of each subgroup was then built in MEGA 5.05 using the Neighbor-Joining method with 500 bootstrap replicates.
To understand the structural diversity of the B. pendula cell cycle genes, we performed exon/intron analysis. To identify the functional regions of the birch cell cycle proteins and the structural differences among the genes, we used the online tool MEME (Multiple Em for Motif Elicitation, version 5.4.1, http://meme-suite.org/tools/meme) to analyze conserved amino acid motifs and visualized the results with TBtools. The CDS sequence of Betula platyphylla was extracted from the genomic structure information of the genome (https://phytozome-next.jgi.doe.gov/report/gene/Bplatyphylla_v1_1), and its intron and exon structure was visualized with TBtools.
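The bootstrap procedure can be illustrated with a short sketch: each replicate resamples alignment columns with replacement, and a distance matrix is recomputed from the resampled alignment before tree building. The p-distance used here is an assumption, since the text does not state which distance model was selected in MEGA, and the toy sequences are invented.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns where either has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences of equal length).
alignment = {"seq1": "MKVLAAGE", "seq2": "MKVLSAGE", "seq3": "MRIL-AGD"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(500)]
```

A tree would be built from each replicate, and the bootstrap support of a branch is the fraction of replicate trees that contain it.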
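The exon/intron analysis reduces to deriving intron intervals from annotated exon coordinates. A minimal sketch follows, assuming 1-based inclusive exon coordinates on a single transcript; the coordinates are hypothetical.

```python
def introns_from_exons(exons):
    """Return intron (start, end) intervals lying between consecutive exons.

    Exons are (start, end) pairs, 1-based and inclusive, on one transcript.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # a gap between exons is an intron
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical gene model with three exons, hence two introns.
example_exons = [(401, 500), (1, 100), (201, 300)]
example_introns = introns_from_exons(example_exons)
```

Counting the returned intervals gives the intron number per gene, the quantity compared across family members in the Results.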

RNA-seq expression analysis of B. pendula cell cycle genes
To investigate the expression patterns of B. pendula cell cycle genes in different tissues, transcriptome data (PRJNA535361) [13] were downloaded from the public NCBI SRA database. Clean reads for each sample were obtained by filtering out low-quality reads. All clean reads were aligned to the B. pendula reference genome using bowtie2. The RNA-seq (RNA-sequencing) data were then analyzed with the RSEM (RNA-seq by Expectation-Maximization) pipeline [17] in paired-end mode. RSEM computes transcript abundance by estimating the number of RNA-seq fragments corresponding to each gene. A heat map of cell cycle gene expression was drawn using log2(TPM+1) values.
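The values plotted in the heat map can be sketched as follows. This is a simplified illustration of TPM normalization and the log2(TPM+1) transform, not RSEM itself: RSEM additionally estimates effective lengths and resolves multi-mapped reads by expectation-maximization. The counts and lengths are hypothetical.

```python
import math


def tpm(counts, lengths):
    """Transcripts per million from fragment counts and gene lengths (bp)."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def log2_tpm_plus_one(value):
    """Transform applied before drawing the heat map: log2(TPM + 1)."""
    return math.log2(value + 1.0)


# Hypothetical fragment counts and gene lengths for three genes.
counts = {"gene_a": 500, "gene_b": 1000, "gene_c": 0}
lengths = {"gene_a": 1000, "gene_b": 4000, "gene_c": 1500}
tpm_values = tpm(counts, lengths)
heatmap_values = {gene: log2_tpm_plus_one(v) for gene, v in tpm_values.items()}
```

Adding 1 before taking the logarithm keeps unexpressed genes at 0 rather than negative infinity.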

Identification of B. pendula cell cycle genes and physical and chemical properties analysis
The annotated genes in the B. pendula genome were used to identify putative cell cycle genes, based on the annotated cell cycle genes of A. thaliana. In total, 59 gene models (Table 1) were identified as putative cell cycle genes in the B. pendula genome. The 59 genes comprise 15 cyclin-dependent kinases (CDKs), 2 cyclin-dependent kinase subunits (CKSs), 27 cyclins (CYCs), 3 E2 factors (E2Fs), 2 DPs, 2 DP-E2F-like proteins (DELs), 4 KIP-related proteins (KRPs), 2 Rbs, and 2 WEEs. Among these, CYC is the largest family with 27 members, while CKS, DEL, Rb, and WEE are the smallest families with only two members each. Rb and WEE are also the smallest families in A. thaliana, each containing only one member. Analysis of protein characteristics showed that the proteins range in size from 69 amino acids (Bpev01.c0457.g0045) to 1316 amino acids (Bpev01.c1113.g0001), and the relative molecular mass ranges from 7 kDa to 14 kDa. The predicted isoelectric point also varies greatly, from 4.42 (Bpev01.c0579.g0010) to 9.69 (Bpev01.c1061.g0010), which indicates that different cell cycle proteins may work in different microenvironments. The molecular weight, isoelectric point, and amino acid number of each protein are shown in Table 1.
Plants have many cell cycle regulators. Most of them are serine/threonine protein kinases that must bind cyclins to function and are therefore named cyclin-dependent kinases (CDKs). According to their structural and functional similarities with animal and yeast CDKs and their conserved cyclin-binding PSTAIRE domains, plant CDKs have been divided into eight groups: CDKA, CDKB, CDKC, CDKD, CDKE, CDKF, CDKG, and CDKLIKE [4,18]. In this study, we identified six groups: BpCDKA, BpCDKB, BpCDKC, BpCDKD, BpCDKE, and BpCDKF. CDKA.1 plays a key role in leaf cell division and differentiation and in leaf development [19]. CDKB1.1 can lengthen hypocotyl cells, promote cotyledon cell development, and regulate stomatal development in Arabidopsis thaliana [2,20]. Mutation of CDKB2 has been shown to severely affect the meristem [21].
We identified 15 BpCDKs in the B. pendula genome and constructed a phylogenetic tree to reveal the evolutionary relationships within these groups (Figure 2a). Seven conserved domains and specific motifs of the BpCDK proteins were located using the MEME tool (Figure 2c). All BpCDK proteins contain at least one conserved amino acid motif. For example, BpCDKE1.1 contains only motif 2, while the remaining BpCDK proteins contain motifs 1, 2, and 3. The conserved motifs within each branch of BpCDK proteins are similar in composition, indicating that these members are closely related evolutionarily [22]. In addition, most BpCDK proteins contain motifs 1, 2, 3, and 6, and these motifs may have an important influence on BpCDK protein function. Gene structure provides further insight into the gene family. In the BpCDK family, genes contain at most 13 introns (BpCDKC1.1 and BpCDKE1.2) and as few as one intron (BpCDKC1.8 and BpCDKE1.1). Most BpCDK genes contain 7-8 introns (Figure 2b), and the fact that most members of the same subfamily share a similar exon/intron structure supports the observed phylogenetic distribution.

CYC
Monomeric CDKs have no kinase activity and must associate with regulatory proteins called cyclins to be activated. The various cyclins share a common molecular structure: a rather conserved amino acid sequence called the cyclin box, which mediates binding to CDKs and regulates their activity. In plants, cyclins can be grouped into mitotic cyclins (A- and B-type cyclins) and G1-specific cyclins (designated D-type cyclins). C-type and H-type cyclins have also been confirmed, and of these only CYCH.1 could activate CDK [23].
All four types of cyclins known in plants were identified. A total of 27 BpCYC genes were detected in the B. pendula genome: nine A-type, six B-type, eleven D-type, and one H-type cyclin. We then built a phylogenetic tree for the family. The MEME tool was used to locate five conserved amino acid motifs in the CYC proteins (Figure 3c). All BpCYC proteins contain at least one conserved amino acid motif. For example, BpCYCD3.4, BpCYCD3.2, and BpCYCD3.3 contain only motif 2, BpCYCA1.2 contains only motif 3, and most of the other BpCYC proteins contain motifs 1, 2, 3, and 4, indicating that these motifs may have an important influence on BpCYC protein function. The members of the BpCYC family have similar intron structures (Figure 3b). Finally, the intron-exon organization of the BpCYC family is similar to that of Arabidopsis, which indicates that CYC genes are highly conserved in plant evolution.

CKS
CDK subunit (CKS) proteins act as docking factors that mediate the interaction of CDKs with putative substrates and regulatory proteins. Two CDK subunit genes have previously been described in Arabidopsis [4]. In this study, we identified two BpCKSs in the B. pendula genome. These two genes share the same motif, but their gene structures are quite different.

Rb and E2F/DP
Rb regulates the expression of many genes essential for cell cycle progression by regulating the activity of the E2F transcription factor. Only one Rb could be identified in the Arabidopsis genome [4], whereas we identified two BpRbs in the B. pendula genome. E2F transcription factors, composed of E2F and DP, play a decisive role in plant cell size control [24]. We identified three BpE2Fs and two BpDPs in the B. pendula genome. Based on their phylogenetic position relative to E2F and DP, some proteins form a distinct class, which we designated DP-E2F-like (DEL); we identified two BpDELs in the B. pendula genome. Four groups emerged from the phylogenetic analysis (Figure 5a). Analysis of conserved motifs showed that both the E2F and DP families contain motif 1 (Figure 5c), indicating that this motif is highly conserved during evolution. Except for BpRb2 and BpDPa2, the genes have highly similar exon/intron structures with numerous introns (Figure 5b).

KRP and WEE
The activity of CYC-CDK complexes is also regulated by the inhibitory protein CKI (also known as KRP). Seven CKI genes belonging to the Kip/Cip group have previously been described for Arabidopsis, designated KRP1 to KRP7 [25]. In this study, we identified four BpKRPs in the B. pendula genome. All four genes have motif 1 (Figure 6c). BpKRP1 and BpKRP2 also share motif 2, both contain 3-4 introns (Figure 6b), and they have similar structures.
CDK/cyclin activity is regulated negatively by phosphorylation of the CDK subunit by the WEE1 kinase and positively when the inhibitory phosphate groups are removed by the CDC25 phosphatase. Two BpWEEs were identified in the B. pendula genome. Their conserved motifs are similar in structure, while BpWEE2 has only two introns.
We applied quantitative criteria based on transcript abundance and specificity to assign genes that are likely to be core cell cycle genes. The tissue-specific expression data cover xylem, roots, leaves, and flowers. We calculated the total expression of the 59 identified genes in xylem and selected 17 genes that are highly expressed in leaves, xylem, or flowers (Figure 7). The 17 cell cycle genes were BpCDKA.1, BpCDKB1.1, BpCDKB2.1, BpCKS1.2, BpCYCB1.1, BpCYCB1.2, BpCYCB2.1, BpCYCD3.1, BpCYCD3.5, BpDEL1, BpDpa2, BpE2Fa, BpE2Fb, BpKRP1, BpKRP2, BpRb1, and BpWEE1. BpRb1 is abundant in leaves and is most similar to AT3G12280. ZmRb1 binds to D-type cyclins in plants, is highly expressed in differentiated cells, and regulates leaf development at the temporal and spatial level [26]. BpE2Fa and BpE2Fb are abundant in leaves and are most similar to AT2G36010 and AT5G22220, respectively. Of the two BpDPs identified in the B. pendula genome, BpDP2 is abundant in xylem, and this gene is similar to AT5G02470. BpDEL1 is abundant in leaves and is similar to AT3G48160 in A. thaliana.
In the KRP family of birch, BpKRP1 was most abundant in the xylem, and BpKRP2 was also expressed at a high level in the xylem. These two genes are most similar to AT2G23430 in A. thaliana. Moreover, BpKRP1 and BpKRP2 are also highly expressed in flowers and leaves. BpWEE1 is abundant in leaves. This gene is similar to AT1G02970.
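The abundance-based selection described in this section can be sketched as a threshold filter across tissues. The cutoff and expression values below are hypothetical, because the text does not state the exact quantitative criteria that were applied.

```python
def highly_expressed(expression, tissues, threshold):
    """Return sorted genes whose value in any listed tissue reaches the threshold.

    `expression` maps gene -> {tissue: log2(TPM+1) value}.
    """
    return sorted(
        gene
        for gene, by_tissue in expression.items()
        if any(by_tissue.get(tissue, 0.0) >= threshold for tissue in tissues)
    )


# Hypothetical log2(TPM+1) values; the threshold of 5.0 is an assumption.
expression = {
    "gene_a": {"xylem": 6.2, "root": 1.0, "leaf": 2.1, "flower": 0.5},
    "gene_b": {"xylem": 0.8, "root": 7.5, "leaf": 1.2, "flower": 0.9},
    "gene_c": {"xylem": 1.1, "root": 0.4, "leaf": 5.0, "flower": 4.9},
}
selected = highly_expressed(expression, ["leaf", "xylem", "flower"], 5.0)
```

In this toy input, `gene_b` is excluded because it is abundant only in root, which is not one of the three tissues used for selection.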

Discussion
Previous studies have identified many cell cycle genes [27], but the genetic and biochemical roles of the birch cell cycle genes need to be better defined. In this study, we identified a total of 59 cell cycle genes in B. pendula, which should help clarify the molecular mechanisms of plant growth and development in this species.
The plant cell cycle can be regulated by altered expression of G1-S and G2-M checkpoint genes [3]. The G1-S transition is one of the most important checkpoints in the cell cycle, and CycD genes have been proposed to act as sensors of extracellular growth conditions [1]. Overexpression of CycD3;1 in Arabidopsis thaliana could induce B-type cyclin expression, resulting in an increase not only in endoreduplication but also in mitosis [28]. A further study revealed that CYCLIN B1;2 is the mitosis-promoting factor [29]. CYCLIN B1;2 expression promotes nuclear and cellular division and is sufficient to switch cells from endoreduplication to mitosis, but not sufficient to increase the number of cell cycle rounds [29]. In our results, BpCYCB1.1, BpCYCB1.2, BpCYCB2.1, and BpCYCD3.1 are highly expressed in leaves, and BpCYCD3.5 is abundant in flowers and leaves (Figure 7). The genes with high expression levels in birch tissues include CYCD3.1 and CYCB1.2, indicating that these two genes may also play very important roles in cell division in birch. Gene structure analysis found that the members of the BpCYC family have similar gene structures (Figure 3b), indicating that their gene structure is highly conserved during evolution. In maize sex determination, both pistil cell death and stamen cell arrest involve cell cycle regulation: CYCA, CYCB, and CDK were highly expressed in the developing pistil and stamen, while Wee1 and CKI were expressed only in the arresting stamen [30]. In our study, some genes were highly expressed in flowers, such as BpCYCD3.5, BpCKS1.2, and BpCDKA.1 (Figure 7). However, birch has unisexual flowers on separate male and female inflorescences (catkins) [12,31,32]. How the cell cycle genes regulate flower development in birch requires further research.

Conclusions
Cell cycle genes are closely tied to all life activities of plants. By combining phylogenetic analysis, gene structure analysis, and tissue-specific expression data, we identified 17 core cell cycle genes in the birch genome. These results may support the future application of cell cycle genes in modern molecular breeding.