Predictive value of CCNB1, BUB1B and TTK in the progression and prognosis of lung adenocarcinoma
Running title: Predictive value of CCNB1, BUB1B and TTK

Lung cancer is the leading cause of cancer-related death worldwide, and lung adenocarcinoma (LUAD) is a common histological subtype of lung cancer. The aim of this study was to search for biomarkers associated with the progression and prognosis of LUAD. We integrated the expression profiles of 1174 lung cancer patients from five GEO datasets (GSE18842, GSE19804, GSE30219, GSE40791 and GSE68465) and identified a set of differentially expressed genes. Functional enrichment analysis showed that these genes are closely related to processes involved in the progression of LUAD, such as the cell cycle, mitosis and adhesion. A protein-protein interaction (PPI) network was established with Cytoscape software, important modules were analyzed using Molecular Complex Detection (MCODE), and CCNB1, BUB1B and TTK were finally selected for further study. Compared with non-tumor lung tissue, CCNB1, BUB1B and TTK were highly expressed in LUAD. Kaplan-Meier analysis showed that the expression of CCNB1, BUB1B and TTK was negatively correlated with the overall survival and disease-free survival of patients. Gene set enrichment analysis (GSEA) demonstrated that, in samples with high expression of any of the hub genes, most of the enriched functional gene sets were related to the cell cycle. In summary, CCNB1, BUB1B and TTK can be used as biomarkers of poor prognosis in LUAD. High expression of CCNB1, BUB1B and TTK may accelerate the progression of LUAD and lead to shorter survival, suggesting that they may be potential therapeutic targets in LUAD.


Introduction
In recent years, lung cancer has had the highest incidence and mortality of all cancers worldwide [1]. Owing to its high incidence, rapid progression and poor response to treatment, lung cancer has become one of the malignant tumors that most seriously threaten human health and life [2,3]. Histopathologically, lung cancer is broadly divided into two categories, non-small cell lung cancer (NSCLC) and small cell lung cancer; NSCLC includes adenocarcinoma, squamous cell carcinoma and large cell carcinoma [4]. Lung adenocarcinoma, the most common form of NSCLC, accounts for about 40% of primary lung cancers, and most patients are diagnosed at a late stage [5]. Currently, the primary treatments for advanced LUAD remain chemotherapy [6] and targeted therapy [7]. However, both have many side effects and are poorly tolerated by elderly patients and those in poor general condition. The most notable example of targeted therapy for lung cancer is the use of epidermal growth factor receptor tyrosine kinase inhibitors (EGFR-TKIs, mainly gefitinib and erlotinib) [8], but most patients develop varying degrees of resistance to TKIs. As a result, most patients with NSCLC, and with LUAD in particular, have a very poor prognosis, with a 5-year survival of only 10-15% [9]. Therefore, the search for more effective biomarkers and new targeted drugs has become more urgent.
Recently, genome-wide expression profiling has been used to identify prognostic features in cancer patients [10][11][12]. However, prognosis-related genes identified in one dataset may be difficult to validate in other cohorts [13]. To address this problem, the role of hub genes needs to be verified in independent studies or in different populations.
In the present study, we used five datasets to compare gene expression between tumor tissue and adjacent non-tumor lung (NTL) tissue and overlapped the differentially expressed genes (DEGs) involved in the development of LUAD. We identified 126 DEGs in LUAD that were shared by all five datasets. Functional enrichment analysis of the 126 DEGs showed that they were significantly enriched in the cell cycle. Two important modules were then identified with the Molecular Complex Detection (MCODE) algorithm in the protein-protein interaction (PPI) network built from the DEGs. Finally, CCNB1, BUB1B and TTK were selected, and we investigated their predictive value for the progression and prognosis of LUAD. Our data show that high expression of CCNB1, BUB1B and TTK can promote progression and lead to poor prognosis in LUAD.

Data collection
The training set (GSE18842, GSE19804, GSE30219, GSE40791 and GSE68465), generated on Affymetrix platforms (HG-U133 Plus 2.0 and HG-U133A arrays), and the corresponding clinical data were obtained from the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo). Three NSCLC genome-wide expression profiles were extracted from the following three datasets: GSE18842, which contains 46 tumors and 45 paracancer samples; GSE19804, which contains 60 pairs of

Data preprocessing
The raw microarray data files (.CEL files) for the five datasets were downloaded from the GEO database. The raw data were processed with the "affy" R package [14] using Robust Multichip Average (RMA) background correction, log2 transformation and normalization. Finally, the probes were annotated using the Affymetrix annotation files.

Differentially expressed genes (DEGs) screening
The five training sets were screened for DEGs between the tumor tissue and NTL tissue groups using the "limma" R package [15]. A false discovery rate (FDR) < 0.05 and |log2 fold change (FC)| > 1.5 were used as cut-off criteria. Volcano plots and Venn diagrams were drawn with the R packages lattice and venn, respectively [16].
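The two cut-off criteria can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study: limma fits moderated t-statistics, whereas the sketch simply assumes that per-gene p-values and log2 fold changes are already available. The gene labels and numbers are made up for illustration, and the Benjamini-Hochberg adjustment is assumed as the FDR method.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fdr_cutoff=0.05, lfc_cutoff=1.5):
    """Keep genes with FDR < fdr_cutoff and |log2FC| > lfc_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, lfc, q in zip(genes, log2fc, fdr)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]


# Hypothetical toy input, not data from the study.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [2.3, -1.8, 0.4, 1.6]
pvalues = [0.0001, 0.004, 0.03, 0.20]
degs = select_degs(genes, log2fc, pvalues)
print(degs)
```

In this toy input, GENE_C passes the FDR filter but not the fold-change filter, and GENE_D passes the fold-change filter but not the FDR filter, so only the first two genes are retained.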

Functional enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) is an online program that provides researchers with a comprehensive set of functional annotation tools to understand the biological implications behind a large number of genes [17]. Gene Ontology (GO) consists of three main categories: molecular function (MF), biological process (BP) and cellular component (CC). The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database that links relevant gene sets with their pathways [18]. Functional annotations with a P-value < 0.05 were considered statistically significant. In this study, GO and KEGG analyses were used to detect the enrichment of DEGs in biological functions and pathways.
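As a rough illustration of what an enrichment P-value measures, the sketch below computes a one-sided hypergeometric tail probability for the overlap between a gene list and an annotation term. DAVID itself reports a modified Fisher exact statistic (the EASE score), so this plain hypergeometric test is a simpler stand-in, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_term belong to the term."""
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated to the term,
# 4 genes in the list of interest, 3 of which carry the annotation.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p_value, 4), p_value < 0.05)
```

A term is called enriched when this probability falls below the chosen threshold, here the P < 0.05 criterion stated above.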

PPI network analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) [19] database provides information on predicted and known protein interactions. In this study, DEGs were mapped onto the PPI network and protein pairs were extracted using a combined score > 0.4 as the cut-off. The PPI network was then constructed using Cytoscape software version 3.2.1 [20]. Topological properties of the PPI network, including degree [21], closeness [22] and betweenness [23] centralities, were determined using the R package igraph in order to identify key genes in the network.
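The score filter and two of the centrality measures can be sketched in plain Python. This is an illustration rather than the igraph analysis used here: the protein labels and scores are hypothetical, closeness is computed in its normalized form (reachable nodes divided by the sum of shortest-path distances), which may be scaled differently from igraph's default, and betweenness is omitted for brevity.

```python
from collections import deque


def build_graph(scored_pairs, cutoff=0.4):
    """Build an undirected adjacency map from (a, b, combined_score) triples,
    keeping only pairs whose score exceeds the cutoff."""
    graph = {}
    for a, b, score in scored_pairs:
        if score > cutoff:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of interaction partners per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness(graph, source):
    """Normalized closeness: reachable nodes / sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives unweighted shortest paths
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical protein pairs and scores, not taken from STRING.
pairs = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P1", "P4", 0.50),
    ("P2", "P3", 0.45),
    ("P3", "P4", 0.30),  # dropped: score is not above 0.4
]
graph = build_graph(pairs)
print(degree(graph))
print(closeness(graph, "P1"), closeness(graph, "P4"))
```

In the toy network, P1 interacts with every other protein and therefore has the highest degree and closeness, which is the pattern used to rank candidate hub genes.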

Module analysis and validation of hub gene
Network modules are a feature of protein networks and may carry specific biological meaning. The most prominent clustering modules were analyzed using the Molecular Complex Detection (MCODE) plug-in in Cytoscape. The MCODE calculation was performed with the following cut-off criteria: degree cutoff = 2, node score cutoff = 0.2, k-core = 2 and max depth = 100. Modules with a score > 6 were then selected as key modules. Next, the DAVID online tool was used to analyze KEGG pathways for the DEGs in the key modules.
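The k-core = 2 setting can be illustrated on its own. The sketch below is not the MCODE algorithm, which also weights nodes by local density and grows clusters from seed nodes; it only shows the pruning idea behind a k-core, in which nodes with fewer than k neighbors are removed repeatedly until none remain. The node labels are hypothetical.

```python
def k_core(graph, k=2):
    """Return the k-core of an undirected graph given as {node: set(neighbors)}."""
    core = {node: set(neighbors) for node, neighbors in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in list(core):
            if len(core[node]) < k:
                # Detach the node from its neighbors, then drop it.
                for neighbor in core[node]:
                    core[neighbor].discard(node)
                del core[node]
                changed = True
    return core


# Hypothetical network: a triangle A-B-C with a chain D-E hanging off A.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
two_core = k_core(toy_network, k=2)
print(sorted(two_core))
```

Removing E leaves D with a single neighbor, so D is pruned in the next pass and only the densely connected triangle survives.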

Patient tissue specimen
Tissue specimens from 20 patients with LUAD were collected at the Department of Thoracic Surgery of Zhongnan Hospital of Wuhan University between August 2017 and February 2018. None of the patients had a history of preoperative chemotherapy or radiotherapy.

Quantitative real-time PCR
Total RNA was isolated from LUAD tissues using the RNeasy Mini kit (cat. no. 74101, Qiagen, Hilden, Germany) according to the manufacturer's instructions. The cDNA was synthesized from 1 μg of total RNA using the ReverTra Ace qPCR RT kit (Toyobo, Shanghai, China), and qRT-PCR was performed using 400 ng cDNA per 25 μl reaction.

Gene set enrichment analysis (GSEA)
In the test set, LUAD samples were divided into two groups based on the expression level of hub genes. To identify potential functions of central genes, GSEA

Identification of differentially expressed genes
Analysis of results shows that 1,387 DEGs (567 up-regulated genes and 820 down- In addition, we performed an overlap analysis of DEGs in NSCLC and LUAD to identify genes that are specifically expressed in LUAD. A total of 314 genes were significantly differentially expressed in all three NSCLC datasets (Figure 1F), and 422 genes overlapped between the two LUAD datasets (Figure 1G). After a final overlap of these two subgroups of genes, 126 genes that may affect the oncogenesis of LUAD were identified (Figure 1H).
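The overlap analysis amounts to intersecting DEG lists, first within each group of datasets and then between the two groups. A minimal Python sketch with hypothetical gene labels (the real lists contain hundreds of genes) is:

```python
from functools import reduce


def shared_genes(deg_lists):
    """Genes present in every DEG list."""
    return reduce(set.intersection, (set(degs) for degs in deg_lists))


# Hypothetical DEG lists standing in for the three NSCLC and two LUAD datasets.
nsclc_degs = [
    ["G1", "G2", "G3", "G4"],
    ["G1", "G2", "G3"],
    ["G2", "G3", "G5"],
]
luad_degs = [
    ["G2", "G3", "G6"],
    ["G3", "G6", "G7"],
]

nsclc_shared = shared_genes(nsclc_degs)       # overlap across the NSCLC datasets
luad_shared = shared_genes(luad_degs)         # overlap across the LUAD datasets
final_overlap = nsclc_shared & luad_shared    # genes common to both subgroups
print(sorted(nsclc_shared), sorted(luad_shared), sorted(final_overlap))
```

Each intersection can only shrink the candidate list, which is why the final overlap is smaller than either subgroup.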

PPI network and module analysis to determine hub genes
The 126 DEGs were analyzed using the STRING database, resulting in 378 protein pairs with a combined score > 0.4 (Figure 3A). The top 12 most representative DEGs are listed according to degree, closeness and betweenness (Table 1). Among these DEGs, CCNB1, BUB1B and TTK are involved in the cell cycle and mitosis, and all three are enriched in the cell cycle pathway.
Two modules with a score > 6 (modules 1 and 2) were detected by MCODE (Figure 3B, C). Through functional enrichment analysis of the genes in the modules (Table 2), we found that the cell cycle pathway was the most important pathway in module 1 (P = 3.99E-5), and the genes CCNB1, BUB1B and TTK in the cell cycle pathway had a higher degree (Table 1). The genes in module 2, such as CXCL13, CXCL2 and CXCR2, were predominantly enriched in the chemokine pathway (P = 9.02E-4).

CCNB1, BUB1B and TTK are overexpressed in LUAD
Since CCNB1, BUB1B and TTK are known to play an important role in the regulation of the tumor cell cycle and mitosis, they were selected to further investigate their predictive value for the progression and prognosis of LUAD. In the datasets GSE18842, GSE19804, GSE30219 and GSE40791, the expression of CCNB1, BUB1B and TTK was significantly increased in LUAD tissues (Figure 4A, B and C).
Using qRT-PCR to detect the expression of CCNB1, BUB1B and TTK in 20 pairs of LUAD and adjacent normal tissues, we found that CCNB1, BUB1B and TTK were highly expressed in LUAD compared with adjacent normal tissue (all P < 0.001) (Figure 5).

Associations of CCNB1, BUB1B and TTK expression with progression and prognosis in LUAD
According to the GEPIA database, we found significant differences in CCNB1, BUB1B and TTK expression between different stages of LUAD (Figure 6A). In the training set GSE40791 and the test set GSE10072, linear regression analysis showed a positive correlation between the expression of the three hub genes and the progression of LUAD (P for trend < 0.001) (Figure 6B and C). In addition, the overall survival and disease-free survival of LUAD patients with high expression of CCNB1, BUB1B and TTK were significantly shorter (Figure 6D and E).
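Survival comparisons of this kind rest on the Kaplan-Meier estimator, which multiplies, at each event time, the fraction of at-risk patients who survive that time. The sketch below is a minimal Python version under stated assumptions: the follow-up times and event flags are hypothetical, an event flag of 1 means the event was observed and 0 means the patient was censored, and no significance test (such as the log-rank test) is included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for a high-expression group.
high_times = [5, 8, 12, 12, 20]
high_events = [1, 1, 1, 0, 1]
high_curve = kaplan_meier(high_times, high_events)
for month, prob in high_curve:
    print(month, round(prob, 3))
```

Computing one such curve for the high-expression group and one for the low-expression group, then comparing them, gives the type of plot summarized in Figure 6D and E.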

Gene set enrichment analysis (GSEA)
To identify potential functions of the hub genes, GSEA was conducted for each gene separately to search for KEGG pathways enriched in samples with high expression of that gene. Samples with high expression of CCNB1, BUB1B and TTK were enriched in the following six pathways (Figure 7, Table 3): "Cell cycle", "Pyrimidine metabolism", "Proteasome", "P53 signaling pathway", "Oocyte meiosis" and "RNA degradation".

Discussion
High-throughput analyses are used to determine gene expression signatures that improve the accuracy of prognosis [26]. To identify potential biomarkers for the prognosis and treatment of LUAD, we integrated the gene expression profiles of 1174 LUAD patients in five datasets from GEO and obtained 126 DEGs. Based on node degree in the PPI network and on the important modules identified by the MCODE algorithm, we found that CCNB1, BUB1B and TTK are important genes in the cell cycle pathway.
Cyclin B1 (CCNB1) is an important member of the cyclin family. Activated CCNB1 promotes the transition of cells from G2 phase into M phase and initiates mitotic progression [27,28]. A growing number of studies have shown that CCNB1 is closely related to abnormal cell proliferation and tumorigenesis; for example, CCNB1 is overexpressed in liver cancer, breast cancer, esophageal cancer and cervical cancer [29][30][31]. Our study found that CCNB1 is highly expressed in LUAD tissues (Figure 4A, Figure 5A). Aaltonen et al. [32] reported that CCNB1 overexpression in breast cancer is closely related to tumorigenesis, malignant phenotype and poor prognosis. In LUAD patients, we found that CCNB1 overexpression is correlated with shorter overall survival and disease-free survival (Figure 6D and E). These results indicate that CCNB1 is a potential biomarker for predicting the prognosis of LUAD patients.
Budding uninhibited by benzimidazoles 1 homolog beta (BUB1B) is an important functional protein that ensures chromosome centromeres attach correctly to the microtubules, thereby maintaining genome stability. Defects in this checkpoint monitoring mechanism can lead to premature segregation of chromosomes and to abnormal chromosome numbers, which contribute to the development of tumors [33,34]. BUB1B is overexpressed in a variety of tumors, including renal and breast cancers, and its mutation and overexpression are significantly correlated with chromosomal instability [35]. Our study found that BUB1B was overexpressed in LUAD tissues (Figure 4B, Figure 5B), suggesting that BUB1B is associated with the development of LUAD and can be used as a biomarker to predict the prognosis of LUAD.
Threonine and tyrosine kinase (TTK) is a dual-specificity protein kinase that can phosphorylate threonine and tyrosine residues [36]. TTK, a core component of the spindle assembly checkpoint (SAC), plays an important role in the cellular monitoring mechanisms that ensure healthy cell proliferation and precise division [37,38]. Abnormal expression of TTK can therefore impair SAC function and ultimately affect the occurrence and progression of tumors. Studies have shown that the expression of TTK is significantly increased in many human malignancies, such as glioblastoma, thyroid cancer, breast cancer, liver cancer and pancreatic cancer, and that this overexpression is significantly correlated with poor survival [39][40][41][42]. Our study found that the expression of TTK was significantly elevated in LUAD (Figure 4C, Figure 5C) and that the prognosis of LUAD patients with high TTK expression was poor (Figure 6D, E). These results suggest that TTK can be used as a biomarker to predict the prognosis of LUAD.
In addition, we found that the expression of CCNB1, BUB1B and TTK was related to the tumor stage of LUAD in the microarray datasets (GSE40791 and GSE10072) and in the GEPIA database (Figure 6A, B and C). Overall survival and disease-free survival were also significantly shorter in LUAD patients with high expression of BUB1B (Figure 6D, E). Together with the results for CCNB1 and TTK described above, this indicates that the expression of CCNB1, BUB1B and TTK is negatively correlated with the overall survival and disease-free survival of patients.
GSEA analysis found that most of the functional gene sets were enriched in the cell cycle pathway (Figure 7, Table 3).
In summary, this study shows that CCNB1, BUB1B and TTK are overexpressed in LUAD tissues and that their upregulation can promote the progression of LUAD and lead to shorter overall survival and disease-free survival. Therefore, CCNB1, BUB1B and TTK can be used as prognostic indicators in LUAD patients.

Conflict of interest
All the authors declare that they have no conflict of interest.

Ethical statement
This article does not contain any studies with animals and humans performed by any of the authors.