Integrative analysis of multi-omics data identified KLRC3 as key nodes in a gene regulatory network related to immune phenotypes in lung adenocarcinoma

In a recent study, the PD-1 inhibitor has been widely used in clinical trials and shown to improve various cancers. However, PD-1/PD-L1 inhibitors showed a low response rate and showed to be effective for a small number of cancer patients. Thus, it is important to identify key genes, which can enhance the PD-1/PD-L1 response for promoting immunotherapy. Here, we used ssGSEA and unsupervised clustering analysis to identify three clusters to show different immune cell infiltration status, prognosis, and biological action. The cluster C showed a better survival rate, high immune cells infiltration, and immunotherapy effect enriched in a variety of immune active pathways, including T and B cell signal receptors. Besides, it showed more immune subtypes C2 and C3. Further, we used WGCNA analysis to confirm the cluster C correlated genes. The red module highly correlated with cluster C for 111 genes which were enriched in a variety of immune-related pathways. To pick candidate genes in SD/PD and CR/PR patients, we used the Least Absolute Shrinkage and SVM-RFE algorithms. In conclusion, our LASSO analysis and SVM-RFE based research identified targets with better prognosis, activated immune-related pathways, and better immunotherapy. The KLRC3 was identified as the key gene which can efficiently respond to immunotherapy with greater efficacy and better prognosis.


Introduction
According to recent research, lung cancer is a highly malignant prognostically poor type of cancer. It ranks among the top cancers in terms of morbidity and mortality (1), wherein about 60% of the patients are in the early stages of lung cancer undergoing combined treatment of surgery, chemotherapy, and radiotherapy were not found satisfactory, and therefore, a quest for a novel therapeutic scheme for lung cancer management has become important. With the development of molecular biology, immunology, scientific research, and technology, immunotherapy has aroused great interest in researchers. Since 2010, the PD-1 or PD-L1 antibodies have shown significant anti-tumor activity, including NSCLC (2,3). In addition to activated T-cells where PD-1 binds to its ligands (PD-L1, PD-L2) to suppress T-cell functions, the PD-1 effect is also seen in other immune cells (4,5). Interestingly, the interactions of PD-1 with its ligands constitute a major immunosuppressive pathway in tumors (6,7). Nowadays, several therapeutic antibodies with PD-1 or PD-L1 suppressing properties have been formulated for managing malignancies like NSCLC clinically. However, these immunotherapies show a low remission rate. Thus, a great deal of effort has been undertaken to find predictive biomarkers from mechanisms involving PD-1 pathway inhibition taking patients with optimal response to these inhibitions into consideration, and also assess optional personalized regimes for NSCLC/other cancer patients with possible responses to PD-1 inhibitors. Thus, it is urgent to confirm the targets that can be used for immunotherapy with a better survival rate. Here, we used the ssGSEA and unsupervised clustering analysis to identify targets with better prognosis, immunotherapy and that are activated in immune-related pathways based on the red module for 111 genes that are highly correlated with the cluster C based on the LASSO and SVM-RFE analyses. The KLRC3 was identified as a key gene with a better prognosis and correlated with immunotherapy.

Datasets and Samples
Gene expressions of a total of 1881 patients with detailed survival information obtained from TCGA-LUAD (https://portal.gdc.cancer.gov/) and GEO datasets of GSE31210, GSE30219, GSE68465, GSE37745, GSE50081, and GSE72094 were generated. Expression values were log-transformed, and the "ComBat" algorithm was used for reducing probable batch effects resulting from the inter-dataset biases (non-biotech) (8).
Gene signature and single-sample gene set enrichment analysis A set of marker genes for types of immune cells was selected based on Bindea et al. (9). For enrichment computation in the individual sample gene set, the absolute enrichment fractions were derived via the GSEA program for traits that have been validated by prior experimentation. For confirming the immune cell populations, the ssGSEA analysis of each sample was accomplished using the immune cell signature gene predictions (10).

Gene set variation analysis (GSVA) and functional annotation
To explore the biological event differences between clusters A, B, and C, we used the "GSVA" R packages to conduct the GSVA enrichment analysis (Figure 2A.). The "c2.cp.kegg.v7.2.symbols.gmt" was obtained from the MSig DB dataset, and the p-adjust< 0.5 was considered as statistically significant. The subtypes also were seen to correlate with the immune studies and prognosis by Thorsson and colleagues (15), aggressive subtype by Elisa Dama and colleagues (16), and luminal and basal Subtypes by Shuang G. Zhao and colleagues (17) to analyze the overlap with our study cluster C ( Figure 3A.). To confirm the events in the different subgroups, the distribution of mutant gene frequencies affected by SNVs and CNVs was investigated across various subtypes ( Figure 3B.). The frequency of mutations in each gene was significantly different across subtypes (Fisher exact test with BH test correction, adjusted P <0.05).

Consensus Clustering for Tumor-Infiltrating Immune Cells and DEGs
For each sample, hierarchical agglomerative clustering was implemented for LUAD depending on a specific pattern. In this procedure, the "Consensus Cluster Plus" R package was used to perform "Pam" analysis, which is a Euclidean distance and Ward' s linkage-based unsupervised clustering approach. To ensure clustering stability, the aforementioned process was repeated about 100 times.

DEGs associated with the two clusters
Depending on the infiltration of immune cells, patients were classified into high and low immune-cell infiltration subtypes. To determine DEGs between two clusters, the limma R package was utilized, and absolute fold change was designated to >1, and significance criterion adjusted to p < 0.05.

Construction of signature gene of lung adenocarcinoma
For immunotherapy response assessment of lung cancer patients presenting newly defined immunophenotypes, the gene expression profiles and clinical outcome data of 348 patients from the Imvigor210 (a clinical response trial dealing with PD-L1 blockade by atezolizumab) were collected. The responses to anti-PD-L1 therapy constituted the observed endpoints, which were complete response (CR), partial response (PR), progression of disease (PD), and stability of disease (SD). Regarding the objective response rate (ORR) and disease control rate (DCR), they respectively involved patients with CR and PR (for ORR) and patients with CR, PR, and SD (for DCR). Based on the top 10 marked genes of metabolic subtypes, we separated the Imvigor210 cohort into three subtypes(cluster A, cluster B, cluster C).
For candidate gene selection, we used a Least Absolute Shrinkage and Selector Operation (LASSO) algorithm, whose penalty parameter was adjusted by setting a cross-validation (10-folds) approach. Meanwhile, we used another algorithm, Support Vector Machine-Recursive Feature Elimination (SVM-RFE), to accomplish gene selection for the CR/PR and SD/PD patients. For further narrowing on the gene among the training cohort, L1 penalized Cox analysis was eventually carried out through gene integrations from either of the above two algorithms.

Gene Expression Data with Immunotherapy
Under the Creative Commons 3.0 license (http://research-pub.gene.com/IMvigor210CoreBiologies), the IMvigor210 dataset was obtained from an accessible, well-documented software & data package. To determine the status of binary response in various clusters, 298 urothelial cancer patients and 80 immunotherapy recipients with cutaneous melanoma having complete clinical records were analyzed.

Statistical Analysis
GraphPad and R 4.0.0. were used for all statistical analyses. The Wilcoxon test was conducted for pair-wise comparison analysis, Kruskal-Wallis test was adopted for comparison among more than two groups. The FDRs in Limma and GSEA were adjusted by the Benjamini-Hochberg approach with a significance level of p < 0.05. The correlation of categorical clinical information with defined clusters was statistically examined by Fisher ' s exact test. All statistical differences were considered significant when p-value <0.05.

Identification of different Subtypes
In this study, the immune cells infiltration matrix was used to identify two clusters with different survival rates ( Figure 1A. Figure 1E). Differential expressions of 110 genes between the two clusters ( Figure 1D; tableS1)were enriched in various immune-related pathways such as T-cell activation and cytokine activity ( Figure 1F).

Identification of Gene Subtypes and association with known subtypes
With the aid of the Limma package, the differential expression genes (DEGs) analyses for transcriptome evolution investigation among these clusters performed to identify the biological function of different clusters showed 110 differential gene expressions. For the elimination of redundant genes, the cox analysis was performed to collect significantly correlated prognosis genes (tableS2).
The survival records used to assess the prognostic implication of the clusters ( Figure  2B.) showed clusters B and C to have a better survival rate than cluster A (P = 0.005). Meanwhile, cluster C showed greater immune cells infiltrations such as aDC, B cells, CD8 T cells, Cytotoxic cells, DC, Eosinophils, iDC, Macrophages, Mast cells, Neutrophils, NK CD56-dim cells, T cells, Tcm, Tem, TFH, Tgd and Th1 cells ( Figure  2C). Further, enrichment of cluster C showed multiple immune-associated pathways, including those for signaling T-and B-cell receptors ( Figure 2D-F). Additionally, cluster C showed high tumor mutational burden, neo-antigen (indel and SNV), and PD-1 expression than other subtypes ( Figure 2H-K.).
Our findings showed clusters B and C to have high immune subtypes C3 and C2, high frequency of LumB subtype, and aggressive Subtype C4 (Figure 3C.). Of these subtypes, the immune subtype C3, LumA subtype, and aggressive subtype C4 found in our study ( Figure 2B) were consistent with the findings of others for a better prognosis. In addition to this, a majority of immune-checkpoint relevant signatures (CD274, CTLA4, HAVCR2, IDO1, LAG3, and PDCD1) and immune-activity relevant signatures (CD8A, CXCL9, CXCL10, GZMA, GZMB, IFNG, PRF1, TBX2, and TNF) exhibited significant high expression in cluster C (Figure 3D.).

Immunotherapy response
Our findings also showed that the cluster A subtype had high SD/PD with 72.6% and low CR/PR with 27.4%; however, the cluster A subtype had high CR/PR with 19.2% than the cluster C subtype ( Figure 3F). In the Imvigor210 cohort, the cluster C subtype had a higher survival rate than other subtypes from anti-PD-1 treatment ( Figure 3E).

WGCNA analysis
Aided by the R package "WGCNA", the co-expression network was created from the expression levels of 11,518 genes (18). Clustering of 1,881 samples was performed by calculating the mean linkage and Pearson's correlation coefficient. The soft threshold power was set at β = 3, scale-free R2 = 0.96 to ensure the scale-free network was constructed (Figs. 4A, 4B). A dynamic mixing and cutting technique was employed to establish the hierarchical clustering tree, each of whose leaves was used to refer to a gene. Meanwhile, a tree branch constituted gene assemblies resembling expression data that were used to refer to a gene module. In this study, a total of 6 modules were produced ( Figure 4C, D), of which the red module showed a high correlation with cluster C from 111 genes enriched by T-cell activation, cytokine activity, and regulation of T-cell activation (Figure 4E.). The median expressions of red module genes showed a high correlation with PD-1/PD-L1 expression ( Figure  4F).

Identification of predictive signature
Further, the most significant genes were selected via two algorithms from the CR/PR and SD/PD patients, and the lasso algorithms were used to identify the prognosis gene (Table S3.). A total of 106 gene candidates were identified after the integration of LASSO-and SVMRFE-selected genes from which, 6 genes were selected by both algorithms (Figure 5A-C.). The correlations were seen between the overlapping genes, including the PNOC, RHOH, ACAP1, CYTIP, IL10RA, and KLRC3 along with the immune cell infiltrations (Figure 5D.). The expression of KLRC3 showed significant response/Non−response of PD−L1 ( Figure 6A-F.) and better survival ( Figure 6G).

DISCUSSION
With the developments in immunotherapy, the PD-1/CTLA4 inhibitors have been widely used in clinical applications and in improving various cancer prognoses. The effect of PD-1 inhibitor still exists in low remission, which is mainly restricted by many factors, such as tumor mutational burden (11), microsatellite instability (MSI), effective DNA mismatch repair (DMMR) (12), and expression of neoantigen (13) and PD-L1 (14), whose associations with response to tumor immunotherapy have been demonstrated. These remain in low remissions are due to many factors such as the degree of CD8+ T-cell infiltration and tumor microenvironment. Here, in our research, we identified those factors, which can show better prognosis, immune cells infiltration, and used for immunotherapy. We identified KLRC3 as a key gene through cluster C analysis, which can affect predictions in immunotherapy and prognosis. At first, the ssGSEA algorithm with 1881 samples was used to acquire 24 immune cells infiltration matrix, and based on the matrix findings; the unsupervised clustering analysis was used to build two clusters, including high and low immune cells infiltration clusters (Figure 1C.), having different significant survival rate (Figure 1B.). The above findings showed that cluster C had a high immune-activated potential to elicit an effective immune response. The 110 genes having different expressions in two clusters ( Figure 1D) were enriched in the various immune-related pathways implying that the immune cells infiltration levels correlated with immune-related pathways such as T-cell activation and cytokine activity. Based on the differential gene expressions, we identified three clusters subtypes that showed different survival and immunotherapy effects. The immunotherapy techniques in various cancers in different studies exhibited greater potential in improving the prognosis and were effective in killing the tumor cells (19,20). However, with the application of these immunotherapeutic strategies, the effective rate of the therapy especially using the PD-1 inhibitors, is limited (21).s Various factors show a low remission rate in immunotherapy, such as the tumor mutation burden (TMB) (22), effective DNA mismatch repair (DMMR) or microsatellite instability (MSI) (23), neo-antigen (24), and PD-L1 expression (25), whose associations with response to tumor immunotherapy have been demonstrated earlier. Our findings coincided with these findings by high TMB, Silent Mutation Rate, and Neoantigens (SNV and Indel) and identified an immune hot phenotype, favorable immune-activity with increased infiltration of aDC, B cells, CD8 T-cells, Cytotoxic cells, DC, iDC, macrophages, mast cells, neutrophils, NK, CD56dim cells, T-cells, T-helper cells, Tcm, Tem, TFH, Tgd, Th1 cells, and TReg and were enriched in various immune-related pathways, implying an effective immune-reaction group. In accordance with the findings of Thorsson and colleagues, our research showed cluster A with high immune subtypes C1 and cluster B and C with high immune subtypes C3 which also corresponded with the finding in the outcomes for C2 and C1 to be less favorable despite the substantial presence of immune components and high CD8+ l and CD4+ T-cell in C3showing better survival.
The invasiveness of cancers is always the main reason for poor prognosis. Our research findings showed the cluster A subtype to have higher aggressive subtype C1, similar to the findings of Elisa Dama confirming it to be a poor prognosis, which is also similar in advanced lung cancer conditions. Further, the immune-relate genes, including CD274, CTLA4, HAVCR2, IDO1, LAG3, and PDCD1 as immune-checkpoint-relevant signatures, and CD8A, CXCL10, CXCL9, GZMA, GZMB, IFNG, PRF1, TBX2, and TNF (immune-activity-relevant signatures) displayed high expression in cluster C than other subtype suggesting cluster C to have high levels of immune-reactivity. We also used the WGCNA analysis to confirm the cluster C correlated red module with 111 genes. The LASSO and SVMRFE algorithms confirmed that the key gene, KLRC3, is correlated with the CR/PR status with a better prognosis.
Based on our findings in this study, the cell infiltration landscapes and specific alterations in the mutational aspects, transcriptome profile, and biological functions can have a huge impact on improving immunotherapy. The KLRC3 can be considered as a suitable gene for prognosis and correlated with the SD/PD and CR/PR patients.

Author Contributions
ZL conceived and designed the study, obtained funding, and drafted the manuscript. ZL and BD acquired the data and drafted the manuscript. KM critically revised the manuscript. BD and KM performed the statistical analysis and technical support. All authors contributed to the article and approved the submitted version.      Table S1: The differential gene expressions between high and low immune infiltrations.