Nomogram with a novel microenvironment signature is systematically constructed and validated to predict the survival rate of glioma patients

Glioma accounts for the highest proportion of primary intracranial malignant tumors. Microenvironment enormously influences the process of glioma progression. Our study is to establish an individualized prognostic nomogram for glioma patients with microenvironment signature. Glioma samples of Chinese Glioma Genome Atlas (CGGA) were grouped by the immune and stromal score based on ESTIMATE algorithm. Microenvironment-related genes (MRGs) in glioma were analyzed by R. To determine the best prognostic correlation genes, univariate and multivariate Cox regression analysis were used to analyze MRGs. Use the selected genes (CHI3L1, SOCS3, SLC47A2, COL3A1, SRPX2 and SERPINA3), we established the prognostic risk score model (microenvironment signature) and validated it. Gene Set Enrichment Analysis (GSEA) showed that the high-risk group was mainly enriched in immune and stromal function KEGG pathways. Finally, the nomogram was constructed and evaluated. The receiver operating characteristic (ROC) curve, Calibration plots and decision curve analysis (DCA) of training and validation set indicated the excellent predictive performance of nomogram. In conclusion, the 6-gene microenvironment signature can not only provide directions for the basic research of glioma, but also can be included as an independent prognostic index in nomogram for individual prediction to guide clinical treatment.


Introdution
In primary intracranial malignant tumors, the proportion of gliomas can be as high as 81% 1 .
Although a lot of achievements have been made in the clinical and molecular research of glioma, there are significant deficiencies in the study on the prognostic biomarkers and a more accurate and reliable prognostic index of glioma patients is also needed.
Tumor cell internal genes play essential roles in the evolution of glioma 2,3 . At the same time, tumor microenvironment had vital effects on gene expression in tumor tissues [4][5][6][7] . Tumor microenvironment contains two main non-tumor components: immune cells and stromal cells, which are crucial for diagnosis and prognosis of tumors 8,9 . Many studies showed that some microenvironment-related genes (MRGs) play essential roles in glioma in many signal pathways 10,11 . Therefore, MRGs are expected to be clinical prognostic indicators and therapeutic targets for glioma.
Thanks to the continuous development of genome sequencing technologies, several glioma molecular biomarkers have been discovered. There have been many studies on 1p/19q codeletion, tumor protein 53 (TP53) mutations, isocitrate dehydrogenase (IDH) mutation and so on 12,13 . Emerging research suggests that certain single genes do not fully represent tumor characteristics, but global gene expression pattern of multigene could be used as a special molecular biological marker for subgroup classification, early diagnosis, treatment targeting , prognosis prediction and so on in glioma 14,15 . However, there is little research on the global expression pattern based on MRGs in glioma.
Recently, a newly proposed computational algorithm, known as "Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE)" was developed 9 and successfully brought to calculate the degree of infiltration of non-tumor cells in several malignant tumors like prostate cancer 16 , breast cancer 17 , and colon cancer 18 . Therefore, in this study, we use the ESTIMATE algorithm to evaluate the RNA sequencing data of glioma samples, construct and validate a microenvironment signature that can predict prognosis and provide research directions for therapeutic targets in glioma. Moreover, combining clinical parameters and the microenvironment signature, we established an innovative and promising predictive nomogram model, which has more accurate predictive ability for glioma.

Identification of MRGs and enrichment analysis
Using the immune or stromal median score as the cut-off, we divided the 693 glioma cases into high/low immune or stromal score groups. The K-M survival curve [ Figure S1] showed that, whether in immune (p = 0.281) or stromal (p= 0.114) groups, the median overall survival of patients with high scores was lower than that of patients with low scores, although they were not statistically significant.
We compared their RNA-seq data based on the high/low immune or stromal score group. The heatmaps [ Figure 1A] showed that the gene expression profiles of the cases were different. In

Identification of prognosis-related MRGs
We excluded patients with loss of age and survival time in cohort 1 and performed univariate Cox regression analysis on the 318 MRGs. The significant prognostic genes (P < 0.05) were arranged in ascending order, and top 10 genes related to prognosis were identified and  Figure   3B]. We treat cohort 2 as an external validation cohort and verify it in the same way. K-M curve also demonstrates that OS of patients in high-risk group was markedly worse (P < 0.001) [ Figure 3C]. And microenvironment signature showed a favorable predictive ability to predict the OS rates in 1, 2and 3 years with the AUC values in the 0.762, 0.82 and 0.826, respectively [ Figure 3D].

Figure3
. Survival analysis and prognostic evaluation of the microenvironment signature in glioma. K-M survival curve of the risk score for patient OS in the training (A) and validation set (C). The OS of patients in high risk group was significantly worse than low risk group. The prognostic evaluation of the microenvironment signature displayed by the ROC curve for predicting the 1,2 and 3-year OS rates in the training (B) and validation set (D).

Construct and verify the nomogram
Based on the training cohort, we established a prognostic nomogram, which can predict the 1,  Among all the areas formed by the curves and "None" and "All", the nomogram curve is the largest, which showed that the prediction ability of nomogram model is better than that of single parameter model.

Discussion
The tumor microenvironment of glioma plays an essential role in the development of glioma.
The change of microenvironment related genes can affect the expression of tumor tissue, and then affect the clinical outcome 4,5 . Therefore, MRGs are promising prognostic indicators and treatments target for glioma.
With the progress of high-throughput sequencing technology, more and more biomarkers related to the survival of glioma patients have been identified 12,13 . Many global gene expression patterns can be used in prognosis prediction, risk stratification and treatment guidance of glioma 20,21 . However, there is still a lot of room for us to study the global expression pattern based on MRGs in glioma.
ESTIMATE is a bioinformatics tool for predicting non-tumor cell infiltration. It can score each sample by evaluating the particular gene expression feature of stromal and immune cells 9 . In this study, we first use ESTIMATE to grade the cohort 1 samples. Taking the median score as the dividing value, the samples were partitioned for high/low immune or stromal score group.
Then, we regard the 318 intersection genes of DEGs between the immune and stromal group as MRGs. Finally, univariate and multivariate Cox regression analysis confirmed that 6 up-regulated genes (CHI3L1, SOCS3, SLC47A2, COL3A1, SRPX2 and SERPINA3) were significantly related to prognosis. TCGA-based GEPIA also proved that they are up-regulated in glioma compared to normal tissues. GSEA showed that gene sets of the high-risk group in the cohort 1 were chiefly enriched in immune and stromal related KEGG pathways, which suggested that glioma with high expression of 6 kinds of MRGs can affect tumor progression through related pathways. CHI3L1, also known as YKL-40, is a pro-inflammatory factor that can be used as a biomarker of glioma and brain injury. High levels expression of YKL-40 in human gliomas can activate AKT 22 . Angiogenesis and malignancy of glioblastoma can be synergistically inhibited by Anti-YKL-40 antibody and ionizing irradiation 23 . SOCS3 is related to tumor progression and therapeutic response in glioma, and it crucial for glioma to acquire anti-radioactivity. Hypermethylation of SOCS3 promoter is an important marker of poor prognosis in glioma 24,25 . The expression of SLC47A2 can be cis-regulated in renal cell carcinoma 26 . Type III collagen is an important signal molecule to promote wound healing, and COL3A1 encodes its alpha 1 chain 27 . SRPX2 enhances EMT process and promotes glioma metastasis through MAPK signaling pathway 28 . The upregulation of SERPINA3 might reshape the extracellular tissue matrix and promote the invasion of glioma, and it was significantly related to the poor survival of patients 29 . In summary, CHI3L1, SOCS3, SRPX2 and SERPINA3 were significantly associated with the evolution of glioma. However, SLC47A2 and COL3A1 had not been studied in glioma. These MRGs can be used not only as independent prognostic biomarkers but also as potential targets to guide the treatment of glioma.
Then, based on the expression of 6 MRGs, we developed and validated a novel risk score model (microenvironment signature) and separated glioma patients into low/high-risk group based on their risk score. Subsequently, the K-M curve showed that the high-risk group had an appreciably poorer prognosis. Therefore, glioma patients with high-risk scores should receive more attention and adopt more aggressive individualized medical strategies. At the same time, Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 16 December 2020 doi:10.20944/preprints202012.0404.v1 they need to be closely followed up to detect recurrence.
Nomogram can intuitively show the prognosis, which makes it widely used in clinical practice 30 .
In this study, we constructed and verified a nomogram based on microenvironment signature, IDH mutation status and age. As far as we know, the nomogram is an innovative combination of microenvironment signature and clinical parameters, which can individually and more precisely predict the survival rate of glioma patients. The ROC curve showed that the nomogram has an excellent ability to predict the OS rate of 1-, 2-and 3-years. The calibration curve showed that the prediction of the nomogram is in outstanding agreement with the actual observation, and the DCA curve showed that the nomogram model was better than the single parameter model. Combining the results of these three indicators, the innovative and promising nomogram demonstrates excellent prediction ability.
There is no denying that there are still some deficiencies in our research. First, the data we download from CGGA is incomplete and limited. Some clinical information of some patients is missing and some clinical parameters, such as operation method, tumor location, tumor size, etc. were not included in the study. Second, a limitation of this prediction model lies in its retrospective property, so it needs to be further verified in future clinical trials.

Database
From the Chinese Glioma Genome Atlas (CGGA, http://www.cgga.org.cn/) databases, we download clinical information and RNA sequencing data of glioma patients. Cohort 1 (mRNAseq_693) 31 and cohort 2 (mRNAseq_325) 32 were selected as training set and validation set respectively. Figure 6 showed the schematic diagram for constructing the nomogram. Through the evaluation of ESTIMATE algorithm, each sample was calculated to get its own immune and stromal score. Taking the median score of immune or stromal score as the dividing point, cohort 1 was partitioned for high/low immune or stromal score group. We screened the differentially expressed genes (DEGs) in immune or stromal score group by edgeR 33

Construct and evaluate the prognostic risk score model of MRGs
First of all, univariate Cox regression analysis was conducted to the MRGs in training cohort 1 by using survival software package in R 3.6.1. Genes with P < 0.05 were deemed as statistical significance to overall survival (OS) of glioma patients 36 . Then the first 10 genes with the lowest P-value were analyzed by multivariate Cox regression analysis. After the analysis we used the selected genes as the genes related to the optimal prognosis and established a prognostic risk score model to predict OS 37 .
The risk score was obtained according to the following formula: Where βi and xi are the coefficient and relative expression value of each selected gene, respectively 38 , and each patient could get a prognostic risk score according to this formula.
According to their median risk score, glioma patients were divided into low/ high-risk group.
Next, we constructed the Kaplan-Meier (K-M) survival curve of low/ high-risk group, and the survival difference between the two groups was evaluated by two-sided log-rank test.

Construction and validation of the nomogram
We combined the MRGs-based prognostic model (microenvironment signature) with other clinicopathological parameters of glioma patients for univariate and multivariate Cox proportional hazard regression analysis in the cohort 1 and cohort 2. After the analyses, we screened out all independent prognostic factors and used the rms R package (https://cran.r-project.org/web/packages/rms/) to construct a nomogram of these independent prognostic factors to evaluate the probability of 1, 2 and 3-year OS in cohort 1 glioma patients 14 . The discriminant ability of nomogram was graphically evaluated by using C-index, the AUC value, Calibration plots and decision curve analysis (DCA) 39,41,42 . Finally, cohort 2 was used as an external verification of the prognostic nomogram. All analyses were performed with R, and P < 0.05 was deemed to be statistically significant. Hazard ratios (HRs) and 95% confidence intervals (CIs) were also stated.

Conclusion
In this study, a promising nomogram contains novel microenvironment signature was constructed and validated for glioma individual prognostic assessment. Further bioinformatics analysis of these MRGs and microenvironment will help to clarify its possible survival mechanism. Next, this model will be further verified in clinical trials and is likely to be translated into meaningful practice to guide the individualized treatment of glioma patients.