A 15-DNA repair gene prognostic signature for immunotherapy in gastric cancer

Gastric cancer is a heterogeneous group of diseases with only a fraction of patients responds to immunotherapy. The relationships between tumor DNA damage response, the immune system and immunotherapy have recently attracted attention. Accumulating evidence indicate that DNA repair landscape is a significant factor in driving response to immune checkpoint blockade (ICB) therapy. In this study, to explore new prognostic and predictive biomarkers for gastric cancer patients who are sensitive and responsible to immunotherapy, we developed a novel 15-DNA repair gene signature (DRGS) and its related scoring system and evaluated the efficiency of DRGS in discriminating different molecular and immune characteristics and therapeutic outcomes of gastric adenocarcinoma. The results showed that DRGS high score patients showed significantly better therapeutic outcomes compared to DRGS low score patients ( P < 0.001). Integrated analysis of multi-omics data demonstrated that the patients with high DRGS score were characteristic of high levels of anti-tumor lymphocytes infiltration, tumor mutation burden (TMB) and PD-L1 expression, and these patients exhibited a longer overall survival and may benefit more from ICB therapy, as compared to the low-score patients. Therefore, the DRGS and its scoring system may have implications in tailoring immunotherapy in gastric cancers.


Introduction
Gastric cancer is one of the most common cancer worldwide, with more than one million new cases in 2020, ranking fifth in incidence and third in mortality worldwide [1]. Up to now the prognosis of this disease remains dismal. Gastric cancer is a heterogeneous group of diseases with variable responsiveness to treatment such as chemotherapy and immunotherapy, and new biomarkers are needed to identify patients with gastric cancer for sensitivity toward such therapies.
Chemotherapy is used as standard treatment, either as preoperative or postoperative therapy, at all pathological stages of gastric cancer. Some regimens showed better 5-year survival rate (OS) or relapse-free survival (RFS) in certain patients [2,3]. However, some regimens demonstrated considerable toxicity and mortality in gastric patients.
In recent years, the approval of multiple immune checkpoint inhibitors (ICIs) and the promising results from tumor immune clinical trials have led to the development of tumor immunotherapy [2,3]. In ICI therapies, anti-PD-1 and anti-PD-L1 antibodies and anti-CTLA4 antibodies are highly effective in patients with microsatellite instability-high (MSI-H) subtype or high expression of PD-L1 [4,5]. Most recently, the effect of immunotherapy combined with chemotherapy or targeted therapy on gastric cancer has been studied. For example, in CheckMate-649, the largest global randomized phase III clinical study in gastric cancer [6], nivolumab plus chemotherapy as first-line treatment for previously untreated and advanced gastric cancer demonstrated superior overall survival (OS) and relapse-free survival (RFS) compared to chemotherapy alone, reducing the risk of death by 20%. Some conflicting results have also been obtained which may be due to different chemotherapy regimens and heterogeneity of patients. In summary, even though its impact on the outcome of gastric cancer has not been clearly defined, immunotherapy plays an important role in the treatment of gastric cancer, and combinational therapy is becoming a new trend.
The association between genetic instability or DNA repair defects with tumor susceptibility to immunotherapy has been observed in certain types of tumors. DNA damage and genomic instability have been found to affect the anti-tumor immune response. There are several major repair pathways for the removal of different types of exogenous and endogenous DNA lesions, including direct reversal, base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR) and doublestrand break (DSB) repair that includes homologous recombination (HR), nonhomologous end joining (NHEJ) and the Fanconi anemia pathway [7][8][9]. Deficiency in such repair is associated with reduced DNA repair capacity and increased genetic instability, thus promoting cancer development, or providing an opportunity/benefit for cancer therapy, as the efficacy of certain anticancer drugs or therapies is highly influenced by cellular DNA repair capacity, for example, small-molecule inhibitors of DNA repair have been combined with conventional chemotherapy drugs [10]. As mentioned above, the noticeable effectivity of ICIs in cancers with MMR-deficient and its characteristic genetic signature MSI has recently been discovered, and a recent large-scale analysis showed that mutations of mismatch repair genes and DNA polymerases account for 13.5% of high TMB tumors [11,12]. There is also evidence of potential implications for immunotherapy in cancers with homologous recombination deficiency (HRD) and somatic changes in the NER pathway [13].
Taken together, accumulating evidence suggest that a systematic understanding of DNA repair landscape in tumor may help assess the tumor susceptibility to immunotherapy.
There are still many challenges in tumor immunotherapy, such as only small portion of tumors exhibiting an effect, the overall clinical response rate is low, and the recurrence after treatment. These prompted searching for biomarker strategies for immunotherapy, such as those based on tumor-infiltrating lymphocytes, TMB, deficient MMR, immune gene signatures, and multiplex immunohistochemistry [14].
More recently, a role for DNA repair in selection of patients for immunotherapy has been emerged. However, no repair gene signature(s) has been developed so far. In this study, we aimed to explore the prognostic role of DNA repair genes expression in relation to predict response and better select patients for immunotherapy in gastric cancer. We identified a 15-DNA repair gene signature (DRGS) that is associated with the prognosis of gastric cancer and developed a scoring system that predicts response and better select patients for immunotherapy. In addition, the association of the DRGS score with the molecular and immune profiles were further investigated. The results suggest that our DRGS is a promising biomarker for tailoring immunotherapy.

Datasets used in this study
The next-generation sequencing data and clinicopathological information for gastric cancer were downloaded from The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection (https://portal.gdc.cancer.gov). The microarray data (GSE30727) were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE30727). We also employed a dataset with immunotherapy information (R package "IMvigor210") to evaluate the prediction efficiency of our scoring system [15] Before the analysis, we applied the Z-score normalization to the expression level of 15 genes from each sample. The principal component analysis (PCA) is a proven technique, which can reduce dimension, improve interpretability and minimize information loss of large datasets at the same time. We employed the PCA as a dimensionality reduction method to obtain the 1st dimension correlation coefficients of the 15 genes ( Figure S1).
The formula for the DRGS scoring system was as follow:

Comparison of the immunocyte infiltration
To determine the relationship between this DRGS scoring system and immunocyte infiltration in the gastric cancer tissue, we performed the immunocyte infiltration analysis through the online platform ImmuCellAI (http://bioinfo.life.hust.edu.cn/ImmuCellAI#!/) using TCGA-STAD samples [17].
Then we calculated DRGS scores from the same TCGA-STAD samples and used the median score as the cut-point to divide these samples into high-score subtype and low-score subtype. Mann Whitney U test was employed to compare the multiple immunocyte infiltration results between these two subtype groups. We applied the hierarchical clustering to the immunocyte whose infiltration score were significantly different between two groups. Mann Whitney U test was used to compare the infiltration score between high-score subtype and low-score subtype. We also used ImmuCellAI to predict the patient response to ICB therapy.

Comprehensive analysis of histologic and molecular characteristic
Samples from the two groups mentioned above were also used to explore the relationship between the DRGS score and various tumor characteristics. We compared the Lauren classification, molecular subtype and MSI status using chisquare test. Tumor mutation levels were calculated through R package "Maftools" [18]. We used Mann Whitney U test to compare TMB and PD-L1 expression between the two groups. In light that the TMB and PD-L1 expression are strongly related to the ICI therapy, we performed the spearman correlation to analyze the relationship between the DRGS score and the above two immunotherapeutic biomarkers.

Relationship between the DRGS score and immunotherapy outcomes
The public dataset IMvigor210 contains a total of 348 advanced urothelial bladder cancer patients who received atezolizumab treatment. We applied DRGS to the dataset to investigate its predictive ability of immunotherapy. We performed normalization and PCA analysis with the same methodology as mentioned above, and all the 348 patients were divided into two equal groups according to their DRGS score. We then compared their response to immunotherapy and the immunotypes of tumor by the Chi-square test. Also, the Kaplan-Meier analysis of OS within the two groups was performed, and log-rank test was used to compare survival curves.

Prognostic ability of DNA repair score
We also analyzed the prognostic ability of the DRGS scoring system in TCGA-STAD samples, through Kaplan-Meier analysis of OS and the log-rank test for comparison. The independence of DRGS score system was verified by univariate and multivariate Cox regression. Independent gastric cancer datasets (GSE26901, GSE15459 and GSE26899) were served as validation sets and analyzed with the same methodology described above.

Association of the DRGS score with drug sensitivity
We downloaded the drugs sensitivity data combined with the RNA-seq data from 23 types of GC cells. We performed the Pearson correlation analysis to calculate the correlation between the drug sensitivity level of different chemotherapeutic agents and DNA repair score. The criteria for significant correlation are p value < 0.05 and correlation > 0.4.

Statistical analysis
Statistical methods were described in different sections above. The significant level of these statistic tests was two-sided p-value < 0.05. The software used for analysis is R (version 4.0.5).

Identification of a 15-DNA repair gene signature (DRGS) and construction of the scoring system
The 219 DNA repair genes were acquired from the website Human DNA Repair Genes. Next, we used DEseq2 package of R to process the raw data from TCGA-STAD and employed the online platform GEO2R to find those differentially expressed genes (adjusted p-value < 0.05, |log2FC| > 1) between tumor and normal tissues. Finally, we identified 15 common genes among the 219 DNA repair gene, which were differentially expressed in both TCGA-STAD and GSE30727 ( Figure 1A, Table 1). We then employed the PCA analysis as a dimensionality reduction method to obtain the first dimension correlation coefficients of 15 genes ( Figure 1B). Based on these coefficients, we created a scoring system for the 15-DNA repair gene signature (DRGS) (details refer to the method). Then patients were divided into two group using the DRGS score (high and low score groups).

Immune characteristics of different DRGS score groups
We identified immune characteristics of 375 TCGA-STAD samples by  (Figure 2A, Figure 3A). In our study, infiltration score of samples in DRGS low score group was significantly higher than those in high score group (P < 0.001, Mann-Whitney U test, Figure 3B). Furthermore, we used ImmuCellAI to predict the response of ICB therapy and found that there were 53.5% of patients in DRGS high score group and fewer patients (37.8%) in DRGS low score group, in response to ICB therapy (P = 0.003, chi-square test, Figure 3C).

Relationship between DRGS score groups and other histologic and molecular classifications
To examine the relationship between score grouping and other histologic and  [19]. In our study, DRGS high score group had a higher percentage of EBV (11.2%) and MSI (27.8%) subtype samples. However, DRGS low score group had a higher percentage of GS (23.1%) and CIN (63.9%) subtype samples (P < 0.001, chi-square test, Figure 4B).

Figure 3.
Immune characteristics of the DRGS score subgroups. A. The hierarchical clustering of various types of immunocytes whose infiltration scores were significantly different between the two score groups; B. The overall infiltration score of low-score group was significantly higher than the high-score group; C. ImmuCellAI was used to predict the response of immune checkpoint blockade (ICB) therapy. R: response; NR: non response. (*p < 0.05, ** p < 0.01, *** p < 0.001).
On the basis of the frequency of mutations within microsatellite markers, gastric cancer can be classified as MSS, MSI-H and MSI-L [20,21]. In our study, the percentage of MSI-H subtype patients (31.7%) was higher in DRGS high score group.

Prognostic ability of the DRGS score
To investigate whether the prognostic impact of the DRGS scoring system is independent of clinical factors that could be associated with clinical outcomes, we first performed univariate Cox regression analysis on all available clinical parameters in the TCGA-STAD dataset (n = 372). Clinical parameters that demonstrated significant prognostic impact (P < 0.05 by Wald test) were selected for multivariate Cox regression analysis along with the DRGS. The result indicated that DRGS is an independent prognostic factor (P = 0.038, Table 2). Taking the median score as the cut-off value, patients with DRGS high score had a better OS than these with DRGS low score (p = 0.007, log-rank test, Figure 5A). Then, the prognostic value of scoring system was validated using the GSE26901 (n = 109), GSE15459 (n = 192) and GSE26899(n = 93) GC dataset. As showed in Figure 5, the patients of GSE26901 in the DRGS high score group had a significantly better prognosis than those in the DRGS low score group (p = 0.011, log-rank test, Figure 5B), consistent with the result of the TCGA dataset. However, there were no significant OS difference in GSE15459 (P = 0.77, Figure 5C) or GSE26899 (P = 0.25, Figure 5D) between the two groups.

Molecular characteristics of different DRGS score groups
To identify the relationship between TMB and the DRGS score, we analyzed the somatic mutation data of 365 stomach adenocarcinoma samples from TCGA-STAD.
The result indicated that the DRGS score was positively associated with TMB (rho = 0.50, P < 0.001, Figure 6A). TMB of samples in DRGS high score group was significantly higher than that in DRGS low score group (P < 0.001, Mann-Whitney U test, Figure 6B). In summary, all mutation events were divided into different categories, among which missense mutations accounted for the largest proportion in the classification of variation ( Figure S2A, B), SNPs occurred much more frequently than insertion or deletion, and C>T was the most general of single nucleotide variants (SNV) in all STAD samples. Notably, median variant per sample of DRGS high score group (121) was higher than that of DRGS low score group (59.5). Furthermore, mutation events of the top 10 mutated genes in each sample were shown in the waterfall plot. Among DRGS high score group, the mutation frequency of TTN was the highest (59%), followed by TP53 (54%) and MUC16 (39%, Figure S2B). Among DRGS low score group, the highest mutation frequency gene was TP53 (34%), followed by TTN (33%) and MUC16 (21%, Figure S2A).

Figure 5.
Prognostic ability of the DRGS score system using various publicly available gastric cancer datasets. A. Kaplan-Meier analysis and the log-rank test of TCGA-STAD samples to compare overall survival of 2 score subtypes; B. An independent gastric cancer dataset GSE26901, served as a validation set, was analyzed with the same methodology described above. C. the dataset GSE15459 was used as a validation set; GSE26899 was used as a validation set.
PD-L1 expression is another important biomarker of susceptibility to PD-1 blockade. We found that PD-L1 expression in DRGS high score group was significantly higher than in DRGS low score group (P = 0.002, Figure 6D). The DRGS score is positively correlated to PD-L1 expression (rho = 0.17, P < 0.001, Figure 6C).

The benefit of immunotherapy with PD-L1 blocker in different DRGS score groups
In view of the association between our score system and the immune microenvironment of the tumor, as described above, we tested the ability of the DRGS score system to predict the response of patients to immunotherapy. This analysis was based on IMvigor210 cohort, a large phase 2 trial investigating the clinical activity of PD-L1 blockade with atezolizumab in metastatic urothelial carcinoma (mUC). We found that DRGS high score subtype patients exhibited prominent prolonged overall survival (P = 0.021, Figure 7A). Also, the 348 patients in the IMvigor210 cohort exhibited different degrees of response to anti-PD-L1 blocker.

Figure 7.
The benefit of immunotherapy with PD-L1 blocker in the two DRGS score subgroups. A. 348 patients in IMvigor210 were divided into two groups equally according to their DRGS scores. Kaplan-Meier analysis of overall survival within the two groups was performed and the survival curves were compared using the log-rank test; B. Comparison of the clinical response to immunotherapy; C. Comparison of the three immunotypes between two DRGS subtypes of patients using the Chi-square test.
We also analyzed the three immune subtypes of IMvigor210 (immune inflamed, desert. We found that the DRGS high score patients had a significantly higher percentage of "immune inflamed" tumors and a lower percentage of "immune desert" and "immune excluded" tumors (P = 0.012, chi-square test, Figure 7C).

Potential therapeutic value of the DRGS score
To explore the effect of the DRGS score system on drug response, we evaluated the association between the DRGS score and the response to drugs in gastric cancer cell lines. We identified 5 significantly correlated pairs between the score and the  kinase ULK1_4989 (Rs= -0.47, p = 0.043) and AKT inhibitor uprosertib (Rs= -0.50, p = 0.021)( Figure 8A). In addition, we found that drugs were mostly targeting PI3K/MTOR, IGF1R and Apoptosis regulation signaling pathways ( Figure 8B).

Discussion
In the emerging immunotherapy strategies such as the ICB therapy in various types of cancer [22][23][24], the interaction of tumor DNA damage and repair landscape with the immune system and related therapy has recently been revealed as a complex biological process [2,13,25]. Nevertheless, many prognostic and predictive biomarkers are being evaluated clinically to identify criteria for establishing customized therapeutic approaches. This has been verified in cancers with MMR deficiency which leads to microsatellite instability (MSI) phenotype, in which ICIs were found to be higher effective [12].
In this study, taking into the known association of DNA damage and repair with immunotherapy response, we identified a 15-DNA repair gene signature (DRGS) using the gene expression data in TCGA-STAD database and developed the score system with the median score as the cut-point to divide gastric cancer patients into high and low score subtypes. We assessed the capacity of the DRGS score to predict the patient response to ICB therapy using the IMvigor210 cohort. The results demonstrated that DRGS high score patients showed significantly better therapeutic outcomes compared to DRGS low score patients (P < 0.001), suggesting that the DRGS is a promising predictor of patient response to immune therapies.
Of the 15 genes in DRGS, six genes are associated with tolerance and repair of  [25,26]. Accumulating evidence suggests more immunogenic in HR-deficient (HRD) tumors, and more susceptible to checkpoint inhibitors [27]. It should also be noted that in our DRGS high score group, the high expression level of the repair genes does not mean an increased cellular repair capacity, rather, it could well be a damage response following the increased levels of DNA damage inside tumor cells. Therefore, a better understanding of the DNA damage in the tumor will help to know the mechanism of DRGS as a biomarker. In addition, change in gene expression levels and protein levels do not correlate well in most of the cases, mainly due to the regulation control at many different levels.
It is known that the tumor immune microenvironment has great implications in gastric cancer progression and susceptibility to immunotherapy [28]. Given that a multifaceted condition may affect gastric cancer primed for response to immunotherapy, we analyzed several important factors or potential immunotherapy response biomarkers in relation to the two DRGS score groups using TCGA-STAD, including tumor-infiltrating lymphocytes, immune subtypes, TMB, PD-L1 expression and other histologic and molecular classifications (detailed see results). In TCGA-STAD samples analyzed by ImmuCellAI, we found that anti-tumor cells including γδ T cells, neutrophils and Th1 cells were more abundant in the DRGS high score samples, while several types of immunosuppressive or tumor enhancement cells were more abundant in the low-score patients. Next, we analyzed the somatic mutation and PD-L1 expression data in TCGA-STAD and found that both were positively associated with the score, suggesting that DRGS high score patients are more likely to respond to immunotherapy. We also analyzed the three immune subtypes of IMvigor210 and found that the DRGS high score patients were associated with a significantly higher percentage of "immune inflamed" subtype, which has been shown to be infiltrated by a number of subtypes of immune cells [29]. Finally, DRGS high score group had a higher percentage of MSI-H phenotype based on TCGA subtype classifications. MSI-H is characterized by better survival [30] and a high number of microsatellite mutations [21]. Recent clinical trials have confirmed that MSI-positive gastric tumors are sensitive to ICB therapy [31]. In summary, our high-score patients are characterized by "immune inflamed", "intestinal", and "MSI-H" phenotypes, which indicate an abundant immune cell infiltration, high expression of PD-L1, and high number of microsatellite mutations. All of which are associated with higher sensitivity and better response to immunotherapy.
There were also limitations in this work. In the study we used the data from a different type of tumor (IMvigor210 cohort on locally advanced or metastatic urothelial bladder cancer) to evaluate the ability of the DRGS score to predict PD-L1 blockade-based immunotherapy response, as a lack of such dataset in gastric cancer.
Future studies will use data from the gastric cancer patients, including those from our own hospital.

Conclusion
The DRGS is a promising predictor of patient response to immune-directed therapies. Further studies are needed to evaluate the clinical utility of DRGS for tailoring immunotherapy. Figure S1. PCA analysis was used as a dimensionality reduction method to obtain the 1st dimension correlation coefficients of 15 genes in IMvigor210, GSE26901, GSE15459, GSE26899 and GC cell lines (from A to E).