Starting from the influence of tobacco: AKR1C3 and COVID-19 receptor ACE2 are potential prognostic biomarkers for brain lower grade glioma, evidence from bioinformatics analyses

Background: Coronavirus disease (COVID-19) related pneumonia is leaded by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which infects host cells through receptors named Angiotensinconverting enzyme 2 (ACE2). Smoking is thought to be related to poor disease prognosis, as a large amount of evidence highlights the negative effects of smoking on lung health and its causal relationship with various respiratory diseases. Methods: We first evaluated the role of ACE2 expression level with tobacco effect. And correlation analyses were used to identify the related genes of ACE2 both in smoker trait and human normal multi-tissues. After intersections, hub genes were later used to further GSEA and co-regulated mechanisms were explored by GeneMANIA. We used UALCAN to perform survival curves of pan-cancers included 33 types of cancers. New clinical model of top co-occurrence cancer type was constructed and validated. Results: Effected by cigarette smoking, the expression level of ACE2 expressed statistically upwards in current smokers compared with never smokers, upwards in groups after acute smoke exposed compared with normal control. But no strong evidence detected in third-hand smoke, as poor amounts of samples, only 4. A little trend of expressing upwards became in groups after third-hand smoke compared with groups after clean air exposing. 4 genes included PIR, ADH7, AKR1C2 and AKR1C3 were identified as ACE2 related genes in smoker trait and human normal multi-tissues. Then, we made the survival curves of pan-cancers in 4 genes. Brain lower grade glioma was the co-occurrence type as both ACE2, PIR and AKR1C3 had the significantly prognostic situation. Later, we made the new clinical prediction model as the C-index was 0.827 and the Area Under Curves (AUCs) of 1-year survival, 2-year survival and 3-year survival were 0.921(95%CI, 0.882-0.961), 0.911(95%CI, 0.860-0.962) and 0.878(95%CI, 0.818-0.939). In inner validation, the AUCs of 1-year survival, 2-year survival and 3-year survival were 0.777(95%CI, 0.640-0.914), 0.916(95%CI, 0.842-0.990) and 0.888(95%CI, 0.768-1.000). In outer validation one, the AUCs of 1-year survival, 2-year survival and 3-year survival were 0.764(95%CI, 0.672-0.856), 0.848(95%CI, 0.783-0.913) and 0.748(95%CI, 0.656-0.841). And in outer validation two the AUCs of 1-year survival, 2-year survival and 3-year survival were 0.740(95%CI, 0.654-0.826), 0.766(95%CI, 0.688-0.843) and 0.794(95%CI, 0.721-0.866). Conclusions: In our study, we compared the ACE2 expression level between smokers and non-smokers crowds and identified associated hub genes. Then we explored the further role of hub genes in tumor fields to identify the correlation and influence between the avoidable host factors such as smoking on COVID-19 contamination and tumor. ACE2 and AKR1C3 are potential prognostic genes for brain lower grade glioma, and we created the web dynamic nomogram and encapsulation app through validations.


Introduction
Coronavirus disease (COVID-19) related pneumonia is leaded by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which infects host cells through receptors named Angiotensin-converting enzyme 2 (ACE2) [1]. Since 31 December 2019 and as of 11 July 2020, 12476028 cases of COVID-19 have been reported, including 559l998 deaths [2]. The main entrance way of virus is through mucosal tissues includes nose, mouth and upper respiratory tract. Researchers had verified that lung alveolar epithelial cells of ACE2 expression level. R software was used to perform Shapiro-Wilk normality test and Bartlett test of homogeneity of variances. GraphPad prism 8.0.2 software was used to perform one-tailed t-test [25].

Weighted correlation network analysis and associated module identification of smokers
WGCNA package was used to construct 2 weighted correlation networks of GSE19667 and GSE52237 [26,27]. To check the sample quality, function of goodSamplesGenes was used and gplots package was employed to create heatmap [28]. Function of hclust made the sample dendrogram to eliminate the outliers. Then, pickSoftThreshold function calculated the average connectivity degrees of different modules and their independence in several powers, and assisted to recognize the best soft thresholding power β-value when the degree of independence reached 0.9. Function of blockwiseModules constructed the co-expression network and detect modules. with mergeCutHeight = 0.25 and minModuleSize = 20 in GSE19667 analysis and mergeCutHeight = 0.25 and minModuleSize = 200 in GSE52237. Later, phenotype traits were converted into numerical values and related to co-expression modules by regression analysis. Gene significance (GS) indicated the correlation between genes and clinical traits. And, module membership (MM) demonstrated the association between genes and specific modules. The genes in modules which showed the highest correlation with specific trait were all permit to further analysis. In our study, 4 modules in GSE19667 and GSE52237 were selected with 2 highest coefficient positive modules and 2 lowest coefficient negative modules in trait of smokers.

Functional enrichment analysis of correlation module genes of smokers
In GSE19667 and GSE52237, selected modules were combined into 2 groups included positive and negative. EnrichR is a platform for online enrichment analysis and was used to perform the Gene Ontology (GO) ontologies [29] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [30] analysis [31]. Term names, adjusted P-values and combined scores were the values used to construct the enrichment network. Combined score is computed by taking the log of the P-value from the Fisher exact test and multiplying that by the z-score of the deviation from the expected rank [32]. Cytoscape is a powerful software for visualizing network data [33]. Cytoscape yFiles Layout Algorithms made the layout of network [34].

ACE2 related hub genes identification and interaction analysis
We made the 2 intersections to identify the most convincing gens related ACE2 in overall situation. First, in GES19667 and GSE52237 datasets, Correlations between genes and ACE2 were calculated and P <0.05, absolute valued of coefficient >0.4 were the selection criterion. Later, 4 modules in highest correlation included positive and negative in smoker trait in GSE19667 and GSE52237 were intersected with the ACE2 correlation genes in smokers. Secondly, we identified the genes in one-way interaction included positive or negative with ACE2 in human normal multi-tissues of GTEx datasets. Finally, intersection of ACE2 correlation genes in multi-tissues and ACE2 correlation genes in trait of smokers was performed which screened the ACE2 related hub genes in overall application. The hub genes correlations plots in multi-tissues were performed by ggplot2 [35], ggrepel [36] and pheatmap[37] packages. Then, GSE19667 and GSE52237 datasets were merged and we calculated the all genes correlation of hub genes as the traditional fold change in gene-set enrichment analysis (GSEA). Hallmark gene sets of MSigDB [38] was used to GSEA and performed by clusterProfiler package [39]. Finally, GeneMANIA was employed to predict the hub gens functions and interactions and construct the network with 20 related genes [40].

Related cancers screening by prognosis and clinical risk prediction model construction and validation
In order to further explore the role or interaction of hub gens in tumors, UALCAN was used to perform the TCGA pan-cancer survival analysis of hub genes [41]. Survival curve status and P-value <0.05 were the main criteria in 33 types of cancers. We downloaded the 510 LGG patient's mRNA-Seq FPKM and clinical data and developed a new nomogram containing gene expression with clinical information to predict the 1year, 2-year and 3-yaer survival of LGG patients. We separated the patient's information randomly in 8:2 as training and inner testing data. X-tile software calculated the cut-off points and divided the specific clinical factor into 2 or 3 groups [42]. After clinical factors values were converted into numerical values, R package name survival used the Cox proportional hazard regression model to estimate hazard ratio and its 95% confidence interval (CI) for each factors [43,44]. In the multivariate Cox regression analysis, to measure the goodness of fit of statistical models, R function named extractAIC indicated the Akaike information criterion (AIC) of candidate models [45]. And Concordance index (C-index) was used to validate the ability of distinguishing events and non-events correctly. Previous researches identified the age at diagnosis was the largest contributor to patient survival, followed by WHO grade within our variables [46,47]. In our study, tumor grade and ages were determined the basic and key factors. During the model genes factors selecting, we used the correlation analysis between candidate genes in pan-cancers, LGG and related type glioblastoma multiforme (GBM) were especially focused to verify the choice. R package named timeROC created the receiver operating characteristic curve (ROC) curves to predict summarize the predictive capacity [48] and confint of R indicated the 95%CI value of Area Under Curve (AUC) of ROC. R packages named rms [49] and Hmisc [50] created the corresponding nomogram model. Later, 2 CGGA datasets were used to perform 2 outer validations. Risk scores of outer datasets were calculated by the established nomogram model and separated into 2 groups of low or high by median value. And finally, ROC of 1-year, 2-year and 3-year survival of outer validation were performed. To improve the convenient of nomogram, DynNom package [51] produced the web nomogram model and uploaded on Shinyapp platform [52]. We also encapsulated the web nomogram into an Android app named LGG Tracker [53]. Fig.1A-B indicated that in tobacco smoke, the ACE2 of current smokers significantly expressed upwards compared with never smokers, both in GSE994 and GSE7895. And the ACE2 relative expression between former smokers and never smokers, current smokers and former smokers did not show the significant difference. Later for second-hand smoke, the group after Acute Smoke Exposing (ASE) showed the significant upward expression difference (Fig.1C). For third-smoke, the ACE2 relative expression between group after Third-Hand Smoke (THS) and group after Clean Air (CA) showed little significant, as the amount of sample was poor with 4, but it still indicated the upward trend of ACE2 expression after THS treated (Fig.1D).  Before WGCNA network were constructed, pickSoftThreshold function of WGCNA was used to calculate the soft thresholding power. As shown in Fig.4A, in GSE19667, the soft-thresholding power was set at 10 as the first time to reach the 0.9 scale independence and with a low mean connectivity. For GSE52237, 6 was selected to set soft-thresholding power for its initially reached 0.9 scale independence with a low mean connectivity (Fig.4B). Later, one-step network construction function of WGCNA was employed to identify gene modules. Parameter of minModuleSize was set as 30 and mergeCutHeight was set as 0.25 in GSE19667. In GSE52237, minModuleSize was set as 200 and mergeCutHeight was set as 0.25. Finally, 28 modules were Identified in GSE19667 (Fig.5A) and 19 modules were Identified in GSE52237 (Fig.5B). As shown in Fig.6A, In GSE19667, the magenta module showed the highest positive relationship with smoker group as correlation coefficient was 0.85 and pink module indicated the highest negative relationship with smoker group as -0.8 correlation coefficient. Blue and turquoise module correspond to the highest positive and negative relationship with smoker group in GSE52237 (Fig. 6B). Then, the four modules were used to further analyses. 1697 genes and 2330 genes consisted in the positive and negative of smoker which

Functional enrichment analysis of correlation module genes of smokers
Evidence indicated that the four selected modules were convincing to identify the genes associated with smoker. Pink and turquoise modules were combined as positive correlation genes of smoker. Blue and magenta were combined as the negative correlation group. The results of EnrichR were reproduced by Cytoscape, as shown in Fig.8A of positive correlation and Fig.8B of negative correlation. In positive correlation of smoker, the GO enrichment indicated that aminoglycoside antibiotic metabolic process, doxorubicin metabolic process and daunorubicin metabolic process were enriched in biological process, pyrimidine nucleotide-sugar transmembrane transporter activity, protein-disulfide reductase activity and alcohol dehydrogenase (NADP+) activity were enriched in molecular function, Golgi subcompartment, Golgi membrane and vesicle coat were enriched in cellular component. KEGG enrichment indicated that the positive correlation enriched in Mucin type O-glycan biosynthesis, Protein export and Protein processing in endoplasmic reticulum. In negative correlation group of smokers, SRP-dependent co-translational protein targeting to membrane, co-translational protein targeting to membrane and viral gene expression were enriched in biological process, protein kinase C activity, small ribosomal subunit rRNA binding and leucine binding were enriched in molecular function, cytosolic ribosome, polysomal ribosome and ribosome were enriched in cellular component. KEGG enrichment indicated that the negative correlation enriched in Ribosome, Pantothenate and CoA biosynthesis and Sulfur metabolism. correlation group, achieved by Cytoscape.

Fig.9
The upset plot between ACE2 correlation gens in smokers and WGCNA selected modules gens (A) and Venn plot between ACE2 correlation genes in multi-tissues of GTEx and ACE2 correlation genes in smoker by WGCNA analysis (B) As shown in  To explore the potential functions of five genes in smoker condition, we combined the GSE19667 and GSE52237 normalized data as the single gene GSEA input data. After correlations were calculated, functions of one gene were reversed by analyzing the most relevant gene in positive and negative direction, the correlation coefficients represented the log fold change in traditional GSEA analysis. The results indicated ACE2 suppressed in BETA SIGNALING, WNT BETA CATENIN SIGNALING and HEME METABOLISM, activated in REACTIVE OXYGEN SPECIES PATHWAY, GLYCOLYSIS and PROTEIN_SECRETION, as shown in Fig.11A. For ADH7 as shown in Fig.11B, PROTEIN SECRETION, REACTIVE OXYGEN SPECIES PATHWAY and GLYCOLYSIS were activated, CHOLESTEROL HOMEOSTASIS, E2F TARGETS and UV RESPONSE_DN were suppressed. The results of AKR1C2 as shown in Fig.11C indicated that REACTIVE OXYGEN SPECIES PATHWAY, PROTEIN SECRETION and MTORC1 SIGNALING were activated, G2M CHECKPOINT, CHOLESTEROL HOMEOSTASIS and UV RESPONSE DN were suppressed. For AKR1C3 as shown in Fig.11D, PROTEIN SECRETION, REACTIVE OXYGEN SPECIES PATHWAY and ADIPOGENESIS were activated, G2M CHECKPOINT, UV RESPONSE DN and WNT BETA CATENIN SIGNALING were suppressed. In Fig.11E, results indicated that PIR activated in PROTEIN SECRETION, REACTIVE OXYGEN SPECIES PATHWAY and ADIPOGENESIS, G2M CHECKPOINT, UV RESPONSE DN and WNT BETA CATENIN SIGNALING were suppressed. Finally, in the top 3 items, REACTIVE OXYGEN SPECIES PATHWAY was activated in ACE2, AKR1C3, AKR1C2, PIR and ADH7. PROTEIN SECRETION and REACTIVE OXYGEN SPECIES PATHWAY were activated in ADH7, AKR1C2, AKR1C3 and PIR. For the possible co-regulated mechanisms between 5 genes, as shown in Fig.12, results of GeneMANIA showed that regulation of hormone levels and hormone metabolic process were particularly involved in ACE2, oxidoreductase activity, acting on CH-OH group of donors and oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor were involved in ADH7. AKRIC2 and AKR1C3 were involved in 7 functions included aldo-keto reductase (NADP) activity, oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor, oxidoreductase activity, acting on CH-OH group of donors, glycoside metabolic process, hormone metabolic process, regulation of hormone levels and quinone metabolic process.

Fig.14 The cut-off points of Age (A), ACE2 (B), AKR1C3 (C) and PIR (D) detected by X-tile software
Three genes included ACE2, AKR1C3 and PIR expression levels were inserted in clinicopathological data. X-tile software made the cut-off points analysis of age, ACE2, PIR and AKR1C3 factors, instead of as the median level. As shown in Fig.14A, ternary classifications were in age with 14-41 years, 42-60 years and 61-87 years. Binary classifications were in ACE2, PIR and AKR1C3 with low and high expression status, as shown in Fig.14B, the cut-off point between low expression level of ACE2 is 0.12, and the cut-off point of AKR1C3 is 2.6 ( Fig.14C) and the cut-off point of PIR is 7.9 (Fig.14D). After screening, patients with NA values in anyone of factors were excluded. Finally, 510 patients of TCGA database and 328 patients of CGGA database were obtained. Detailed baseline characteristics are summarized in Table 1. After the univariate Cox analysis, age, tumor grade, pharmaceutical, radiation, ACE2, PIR, AKR1C3 were detected to be statistically associated with the mortality of LGG ( Table 2). Later, in the multivariable Cox proportion hazard regression models in which factors with p-value less than 0.05 were included based on the univariate analysis, to simplify the model, the most meaningful factors were included as the results indicated that the concordance index (C-index) of multi-factors permutation and combination ( Table 3). Results indicated that the C-indexes of ACE2 and AKR1C3 factor joined the model individually were the highest value, and the Akaike information criterion (AIC) value of the only AKR1C3 joined model was slightly less than the both ACE2 and AKR1C3 joined ( Table 3). AIC is to find the model that can best interpret the data but contains the least free parameters. The less AIC value, the better model. To further investigate and proof the rationality of ACE2 factor joined, we made the correlation relationship analysis of 3 genes in UCSC recomputed TCGA datasets contained 33 cancers. As shown in Fig.15A-C, in LGG and related cancer type, only AKR1C3 and ACE2 indicated the highest coefficient. The related correlation coefficient of AKR1C2 and ACE2 in LGG was 0.51 (Fig.15D) and in GBM was 0.79 (Fig.15E). In our study, age and tumor grade were determined the key and necessary factors, results indicated that the C-index improved when ACE2 and AKR1C3 were inserted. Finally, multivariable Cox analysis indicated that age, tumor grade and AKRIC3 were significant in statistical, and ACE2 were still brought as the P-value of 0.364 and its special important status in COVID-19 ( Table 2). Abbreviations: TCGA, The Cancer Genome Atlas; CGGA, Chinese Glioma Genome Atlas.  Table 3. The C-index and AIC of multi-factors permutation and combination Abbreviations: C-index, Concordance index; AIC, Akaike information criterion.

Fig.15
The correlation of between ACE2 and PIR (A), AKR1C3 and ACE2 (B), PIR and AKR1C3 (C) in multi-cancers, and detail coefficient between AKR1C3 and ACE2 in LGG cancer (D) and GBM caner (E) Then, we established a nomogram to predict the 1-year, 2-year and 3-year survival, as shown i n Fig.16A. Furthermore, a web calculator was constructed to improve the accuracy, the available U RL is https://yhy9.shinyapps.io/LGG_Nomogram and the standby URL is https://figshare.com/articles/ online_resource/LGG_nomogram/12619157. And the corresponding app of Android platform was ac hieved to improve the convenient and can be downloaded at https://figshare.com/articles/software/LG G_Tracker_1_0-signed_apk/12619121. The C-index of training model was 0.827, which indicated that about 82.7% probability of nomogram model to predict patient mortality correctly. As shown in

Discussions
We evaluated the ACE2 expression level of smokers included direct, second-hand and third-hand. ACE2 expression level significantly expressed upward in current smokers compared with never smokers, while other comparisons included former smokers and never smokers, current smokers and former smokers did not show the significant difference. Then, we explore the effect of Acute Smoke Exposing (ASE) in ACE2 expression level, results showed the significant upward expression difference. In Third-Hand Smoke (THS) groups, due to the poor amounts of samples, there was no significant expression difference and a little upward expression of ACE2 in after THS showed a trend. Total results pointed out that cigarette smoking effected the ACE2 expression level. To further identify the related genes about smoker trait, then, we chose GSE19667 and GSE52237 for WGCNA. Several modules were detected. And in most convincing positive correlation modules of smoker trait, the GO enrichment indicated that aminoglycoside antibiotic metabolic process, doxorubicin metabolic process and daunorubicin metabolic process were enriched in biological process, pyrimidine nucleotide-sugar transmembrane transporter activity, protein-disulfide reductase activity and alcohol dehydrogenase (NADP+) activity were enriched in molecular function, Golgi subcompartment, Golgi membrane and vesicle coat were enriched in cellular component. In negative correlation group of smokers, SRP-dependent co-translational protein targeting to membrane, co-translational protein targeting to membrane and viral gene expression were enriched in biological process, protein kinase C activity, small ribosomal subunit rRNA binding and leucine binding were enriched in molecular function, cytosolic ribosome, polysomal ribosome and ribosome were enriched in cellular component. KEGG enrichment indicated that the positive correlation groups enriched in Mucin type O-glycan biosynthesis, Protein export and Protein processing in endoplasmic reticulum and the negative correlation groups enriched in Ribosome, Pantothenate and CoA biosynthesis and Sulfur metabolism. Later, we made 2 steps intersections contained the first intersection between ACE2 correlation genes and related genes about smoker trait in GSE52237 and GSE19667, and the second intersection between ACE2 correlation genes in human multi-tissues and gens both related ACE2 and smoker trait. 5 hub genes included ACE2 with PIR, ADH7, AKR1C2 and AKR1C3 were identified. GeneCards and Entrez gene summaries indicated that [54][55][56]: ACE2 (Angiotensinconverting enzyme 2) may play a role in the regulation of cardiovascular and renal function, as well as fertility. Associated diseases with ACE2 include Severe Acute Respiratory Syndrome and Coronavirus Infection. Related pathways contain Agents Acting on the Renin-Angiotensin System Pathway, Pharmacodynamics and Peptide hormone metabolism. Gene Ontology (GO) annotations related to ACE2 include metallopeptidase activity and peptide binding; PIR (Pirin), the encoded protein may act as a transcriptional cofactor and be involved in the regulation of DNA transcription and replication. Associated diseases include Exhibitionism and Obsessive-Compulsive Personality Disorder. Related pathways include Cell Receptor Signaling Pathway and Apoptosis and Autophagy. GO annotations related to PIR include transcription coregulator activity and quercetin 2,3-dioxygenase activity; ADH7 (Alcohol Dehydrogenase 7), the encoded enzyme may participate in the synthesis of retinoic acid, a hormone important for cellular differentiation. Associated diseases with ADH7 include Alcohol Dependence and Atrophic Gastritis. Related pathways are Drug metabolism-cytochrome P450 and Cytochrome P450-arranged by substrate type. GO annotations related to ADH7 include oxidoreductase activity and retinol binding; AKR1C2 (Aldo-keto Reductase Family 1, member C2), associated Diseases with AKR1C2 include 46, Xy Sex Reversal 8 and Colorectal Cancer, Hereditary Nonpolyposis, Type 2. Related pathways are Drug metabolism -cytochrome P450 and Naphthalene metabolism. GO annotations related to AKR1C2 include oxidoreductase activity and carboxylic acid binding; AKR1C3 (Aldo-keto Reductase Family 1, member C3) may play an important role in the pathogenesis of allergic diseases such as asthma, and in controlling cell growth and/or differentiation. Associated diseases with AKR1C3 include Prostate Disease and Endometrial Cancer. GeneMANIA showed that regulation of hormone levels and hormone metabolic process were particularly involved in ACE2, oxidoreductase activity, acting on CH-OH group of donors and oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor were involved in ADH7. AKRIC2 and AKR1C3 were involved in 7 functions included aldo-keto reductase (NADP) activity, oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor, oxidoreductase activity, acting on CH-OH group of donors, glycoside metabolic process, hormone metabolic process, regulation of hormone levels and quinone metabolic process. Later, we made the pan-cancers survival curves of 5 hub genes, and Brain Lower Grade Glioma (LGG) was the highest co-occurrence type as both ACE2, PIR and AKR1C3 had the significantly prognostic situation. Further clinical predicting model was constructed with the factors of ACE2, age, AKR1C3 and tumor grade. And related web dynamic nomogram and encapsulation APP were created as well. Some limitations occurred in current study, 3 prognostic genes included ACE2, PIR and AKR1C3 need to be verified in laboratory experiments. And the prediction model based on the retrospective data and the factors in clinical data were limited that can miss something more important. In conclusions, ACE2 and AKR1C3 play a prospective role in brain lower grade glioma and corresponding clinical nomogram improves the prognostic capability. But the mechanism of ACE2 and AKR1C3 in LGG need to be deeply investigated in the future.

Funding
None.