RAD51 may regulate the expression of genes involved in autophagy through interaction with E-box binding proteins in cancer cell lines

RAD51 is a recombinase that plays a pivotal role in homologous recombination. Although this role has been studied extensively, it is unclear whether RAD51 can also participate in gene regulation as a co-factor. In this study, we found in silico evidence that RAD51 may contribute to the regulation of genes involved in the autophagy pathway through interaction with E-box binding proteins such as USF1, USF2, and/or MITF in the GM12878, HepG2, K562, and MCF-7 cell lines. The canonical USF binding motif (CACGTG) was significantly enriched at RAD51 binding sites in all four cell lines. In addition, genome-wide USF1, USF2, and/or MITF binding regions significantly overlapped with RAD51 binding sites in the same cell line. Notably, the promoters of genes associated with the autophagy pathway were significantly occupied by RAD51 in all four cell lines. Taken together, these results predict a previously unaddressed role of RAD51 and provide evidence that RAD51 could be involved in regulating autophagy-related genes through interaction with E-box binding proteins.


Introduction
RAD51 plays a crucial role in homologous recombination (HR) during DNA double-strand break (DSB) repair [1][2][3][4]. RAD51 catalyzes the homology search and the ATP-dependent exchange of the bound single DNA strand with the complementary strand within the duplex. In addition to its well-known role as a recombinase in DSB repair, RAD51 has also been studied for its nonenzymatic roles in multiple processes that respond to replication stress [5]: it promotes the reversal of replication forks and protects nascent DNA at regressed forks from degradation. Recently, a new role for RAD51 in the immune response has been suggested [6]. Inhibition of RAD51 results in the accumulation of self-DNA in the cytosol, which in turn enhances the STING-dependent innate immune response. Because these diverse roles of RAD51 in the DNA damage response are closely linked to genome stability, normal cell-cycle progression, and immunity, RAD51 has been considered a promising therapeutic target for various cancers, including lung cancer, breast cancer, and squamous cell carcinoma [7].
RAD51 is highly expressed in various cancers, including breast, lung, and pancreatic cancers [8][9][10]. It is also known to interact directly with tumor suppressor proteins, such as BRCA1 and BRCA2, which play important roles in DNA repair in response to damage [11]. Indeed, BRCA2 is known to control the DNA-binding ability and intracellular localization of RAD51 [11]. Overexpressed RAD51 increases the survival of tumor cells and confers resistance to DNA-damaging treatments such as radiotherapy and chemotherapy [12]. The hyperactivation of RAD51 contributes to increased homologous recombination associated with tumor progression and metastasis [7]. For example, in ER-positive breast cancer, elevated RAD51 expression is associated with resistance to neoadjuvant endocrine therapy, including aromatase inhibitors, and with poor patient survival [13]. Intriguingly, this role of RAD51 is also related to the function of autophagy in cancers. Although autophagy appears to have a tumor-suppressive role in normal cells, it confers resistance to cancer therapies that target DNA repair by suppressing genomic instability [14,15]. Mechanistically, loss of autophagy leads to elevated levels of p62, a ubiquitin-binding protein. Accumulated nuclear p62 suppresses HR-associated DSB repair by promoting the proteasomal degradation of RAD51 and FLNA [16,17]. At the level of protein-protein interactions, one study in an esophageal cancer model in which autophagy was inhibited suggested that the relationship between RAD51 and autophagy involves checkpoint kinase 1 (CHK1) [18]. Thus, in several cancers, the inhibition of autophagy promotes DNA damage and susceptibility to cancer treatment through the downregulation of RAD51 [19][20][21]. In contrast, one study reported that inducing autophagy reduces RAD51 expression in non-small cell lung cancer (NSCLC) cell models [22].
Taken together, these case studies suggest that the RAD51-autophagy interaction reflects the two-faced role of autophagy in cancer [23], and a different point of view may be required to understand the RAD51-autophagy axis. In short, autophagy has been shown to regulate the RAD51 protein, but the precise molecular mechanisms by which RAD51 regulates autophagy in cancers remain elusive.
Two recent investigations have extended the study of the molecular mechanism of RAD51 to the transcriptional level through a genome-wide approach, using chromatin immunoprecipitation followed by sequencing (ChIP-seq). First, RAD51 was shown to be recruited to active chromatin regions, which are preferentially repaired by HR [24]. These sites carry the transcription elongation-associated histone mark histone H3 lysine 36 trimethylation (H3K36me3). Second, genome-wide mapping of DSBs showed that co-occupancy of RAD51 and the transcription factor TEAD4 at oncogenic super-enhancers is associated with the overexpression of oncogenes [25]. Despite these studies, the exact mechanisms by which RAD51 regulates gene expression are not fully understood.
In this study, we investigated a potential role of RAD51 in regulating genes at the transcriptional level by reanalyzing RAD51 ChIP-seq data from four cell lines: GM12878, HepG2, K562, and MCF-7. We found that genome-wide RAD51 binding was strongly enriched at active promoters harboring motifs for the upstream stimulating factors USF1 and USF2. Deeper analysis revealed that RAD51 co-occupied binding sites with E-box proteins such as USF1, USF2, and/or MITF, especially at genes associated with autophagy and the lysosome. Our findings suggest a previously unrecognized function of RAD51 in the transcriptional regulation of autophagy-related genes in cancers.

ChIP-seq data analysis
ChIP-seq data were downloaded from the ENCODE database (https://www.encodeproject.org/). The accession numbers for all samples are listed in Table S1. Raw reads (FASTQ) were quality-trimmed using Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/, version 0.6.4) with default parameters. Trimmed reads were aligned to the human genome (hg38 assembly) using Bowtie2 (version 2.3.4.1) with default parameters [26]. Duplicate reads were removed using Sambamba (version 0.6.7) with default parameters [27]. Peaks were identified using HOMER (version 4.11.1; findPeaks) [28] against the corresponding control sample (Table S1), with a false discovery rate (FDR)-adjusted p-value cutoff of 0.001. To retain only reliable peaks, peaks that did not overlap by at least 50% between biological replicates were discarded using Bedtools (version 2.26.0) [29].
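The replicate filter can be sketched in plain Python. This is a minimal illustration, not the authors' Bedtools command (which is not given): it assumes peaks are `(chrom, start, end)` tuples in 0-based, half-open BED coordinates and interprets "at least 50%" as 50% of the replicate-1 peak length, analogous to `bedtools intersect -u -f 0.5`. The function names are hypothetical.

```python
from collections import defaultdict


def overlap_len(a_start, a_end, b_start, b_end):
    """Length (bp) of the overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def reproducible_peaks(rep1, rep2, min_frac=0.5):
    """Keep replicate-1 peaks covered over >= min_frac of their length by a replicate-2 peak.

    Hypothetical helper: the "fraction of the replicate-1 peak" interpretation
    is an assumption. A linear scan per chromosome is used for clarity;
    Bedtools or an interval tree would be used at genome scale.
    """
    rep2_by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        rep2_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in rep1:
        length = end - start
        if length <= 0:
            continue  # skip malformed (empty) intervals
        for s2, e2 in rep2_by_chrom.get(chrom, []):
            if overlap_len(start, end, s2, e2) >= min_frac * length:
                kept.append((chrom, start, end))
                break  # one qualifying partner peak is enough
    return kept


if __name__ == "__main__":
    rep1 = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
    rep2 = [("chr1", 150, 400), ("chr1", 1180, 1300), ("chr3", 50, 250)]
    print(reproducible_peaks(rep1, rep2))  # [('chr1', 100, 300)]
```

In the toy example, the second peak overlaps its partner by only 20 bp (10% of its length) and the third has no partner on its chromosome, so only the first peak survives.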

DNA binding motif analysis
Motifs significantly enriched at the binding sites were identified using HOMER (findMotifsGenome.pl) in 150-bp regions centered on each peak, with the parameters -size 150 -mask.
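To illustrate what `-size 150` implies, the hypothetical helper below converts peaks into fixed 150-bp windows centered on each peak midpoint and formats them as six-column BED lines. It assumes the midpoint is used as the center (HOMER can also center on a reported peak position) and clips windows at the chromosome start; it does not reproduce HOMER's background selection or `-mask` handling.

```python
def centered_window(start, end, size=150):
    """Return (start, end) of a `size`-bp window centered on the midpoint of [start, end)."""
    center = (start + end) // 2
    win_start = max(0, center - size // 2)  # clip at the chromosome start
    return win_start, win_start + size


def peaks_to_window_bed(peaks, size=150):
    """Format (chrom, start, end, name) peaks as BED6 lines of centered windows."""
    lines = []
    for chrom, start, end, name in peaks:
        w_start, w_end = centered_window(start, end, size)
        lines.append(f"{chrom}\t{w_start}\t{w_end}\t{name}\t.\t+")
    return lines


if __name__ == "__main__":
    peaks = [("chr1", 1000, 1400, "peak1"), ("chr2", 0, 20, "peak2")]
    for line in peaks_to_window_bed(peaks):
        print(line)
```

Fixing the window size keeps motif frequencies comparable across peaks of different widths and focuses the search on the peak center, where the bound factor is expected to sit.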

Gene ontology analysis
Gene ontology (GO) analysis was conducted using Metascape (http://metascape.org/) [30] with the genes nearest to the top 500 peaks in each cell line, ranked in descending order of RPKM (Table S2).
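The peak ranking and gene assignment can be sketched as follows. The RPKM function is the standard definition (reads per kilobase of region per million mapped reads). The nearest-gene rule (the TSS closest to the peak midpoint) and all function names are assumptions for illustration, since the exact assignment procedure is not specified. TSS coordinates are assumed to be strand-corrected already (gene start on the + strand, gene end on the - strand).

```python
import bisect
from collections import defaultdict


def rpkm(read_count, peak_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (peak_length_bp * total_mapped_reads)


def build_tss_index(tss_records):
    """Index (chrom, tss, gene) records as chrom -> (sorted TSS positions, genes)."""
    by_chrom = defaultdict(list)
    for chrom, tss, gene in tss_records:
        by_chrom[chrom].append((tss, gene))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([t for t, _ in items], [g for _, g in items])
    return index


def nearest_gene(index, chrom, pos):
    """Gene whose TSS is closest to `pos` on `chrom` (None if the chromosome has no TSS)."""
    if chrom not in index:
        return None
    positions, genes = index[chrom]
    i = bisect.bisect_left(positions, pos)
    best = None
    for j in (i - 1, i):  # the closest TSS neighbours the insertion point
        if 0 <= j < len(positions):
            dist = abs(positions[j] - pos)
            if best is None or dist < best[0]:
                best = (dist, genes[j])
    return best[1]


def genes_for_top_peaks(peaks, index, n=500):
    """Unique nearest genes of the top-n peaks, given as (chrom, start, end, rpkm) tuples."""
    ranked = sorted(peaks, key=lambda p: p[3], reverse=True)[:n]
    genes, seen = [], set()
    for chrom, start, end, _score in ranked:
        gene = nearest_gene(index, chrom, (start + end) // 2)
        if gene is not None and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes
```

Because several strong peaks can map to the same gene, the resulting list for 500 peaks may contain fewer than 500 unique genes.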

Analysis of genome-wide RAD51 binding sites in GM12878, HepG2, K562, and MCF-7 cell lines
Although RAD51 plays a central role in HR, recent ChIP-seq studies revealed that RAD51 binds to transcriptionally active sites [24,25], suggesting a potential role for RAD51 in gene regulation. To investigate this possibility, we reanalyzed RAD51 ChIP-seq data deposited in the Encyclopedia of DNA Elements (ENCODE) database (https://www.encodeproject.org/) [35]. The accession numbers of all samples used in this study are listed in Table S1. Two biological replicates of RAD51 ChIP-seq, together with the corresponding controls, performed on GM12878, HepG2, K562, and MCF-7 cells were analyzed as described previously [36]. RAD51 binding sites (peaks) in each cell line were identified using HOMER with an FDR-adjusted p-value cutoff of 0.001. To filter out unreliable RAD51 binding sites, we discarded peaks that did not overlap between biological replicates. The analysis revealed that the characteristics of genome-wide RAD51 binding were surprisingly similar to those of typical transcription factors. In total, 5137, 2611, 7192, and 3498 RAD51 binding sites were identified in GM12878, HepG2, K562, and MCF-7 cells, respectively (Table S2), and up to 44% and 40% of RAD51 binding sites were located in intergenic and promoter regions, respectively (Figure 1a). This result shows that a large fraction of genome-wide RAD51 binding sites coincided with promoters, the key regulatory elements for gene expression, suggesting that RAD51 may be involved in gene regulatory mechanisms. In addition, the average RAD51 enrichment across its binding sites showed a sharp peak at the peak centers (Figure 1b), a profile typically observed for transcription factors such as YY1 (Figure 1c).
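The "sharp peak-like shape" of the average enrichment can be illustrated with a simple aggregate (meta-)profile: read positions are binned by their offset from each peak center and the counts are averaged over peaks. This is a generic sketch, not the tool used to draw Figure 1b (which is not stated). The input layout (sorted read positions per chromosome, for example fragment midpoints) and the function name are assumptions.

```python
import bisect


def average_profile(centers, reads_by_chrom, flank=1000, bin_size=50):
    """Mean reads per peak in bins spanning [center - flank, center + flank).

    centers: list of (chrom, center) tuples.
    reads_by_chrom: dict chrom -> sorted list of read positions.
    Returns a list of (bin_offset_from_center, mean_reads_per_peak).
    """
    if (2 * flank) % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    n_bins = 2 * flank // bin_size
    counts = [0] * n_bins
    for chrom, center in centers:
        reads = reads_by_chrom.get(chrom, [])
        lo = bisect.bisect_left(reads, center - flank)
        hi = bisect.bisect_left(reads, center + flank)
        for pos in reads[lo:hi]:
            counts[(pos - (center - flank)) // bin_size] += 1
    n_peaks = len(centers) or 1  # avoid division by zero for an empty peak list
    return [(-flank + i * bin_size, c / n_peaks) for i, c in enumerate(counts)]
```

A transcription-factor-like profile concentrates counts in the bins adjacent to offset 0, whereas a broadly distributed signal gives a flatter curve.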
Manual inspection of RAD51 binding sites also indicated that RAD51 could participate in regulating nearby genes. For instance, the promoter regions of the GET4 and CLN3 genes were significantly occupied by RAD51 in all four cell lines (Figure 1d). Consistently, active histone modifications, such as H3K4me3 and H3K27ac, were markedly enriched around these binding sites. To gain further insight into the role of RAD51 in gene regulation, RAD51 binding sites were divided into two groups (promoter-associated and enhancer-associated) according to their distance to the transcription start sites (TSSs) of known genes. The promoter-associated group was defined as RAD51 binding sites within 2 kb of a TSS, while the enhancer-associated group contained all remaining RAD51 binding sites. Intriguingly, promoter-associated RAD51 peaks showed stronger RAD51 enrichment than enhancer-associated RAD51 binding sites, and the expression levels of genes with promoter-associated RAD51 binding sites were significantly higher than those of genes with enhancer-associated RAD51 peaks (Figure S1). Collectively, these results provide evidence that RAD51 may be involved in gene regulation, possibly through a mechanism similar to that of transcription factors. This unexpected role of RAD51 has not been addressed previously, likely because research has focused on its well-known enzymatic role in homologous recombination. To discover DNA binding motifs preferentially recognized by RAD51, motif analysis was performed on the RAD51 peaks identified in GM12878, HepG2, K562, and MCF-7 cells. HOMER was used to assess over-represented motifs in 150-bp regions centered on the RAD51 peaks.
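The promoter/enhancer split is a simple distance rule. A minimal sketch, assuming that "within 2 kb" means the peak midpoint lies at most 2,000 bp from the nearest TSS on either side; the input formats and function names are hypothetical.

```python
import bisect


def distance_to_nearest_tss(tss_by_chrom, chrom, pos):
    """Absolute distance (bp) from pos to the nearest TSS, or None if there is none on chrom."""
    tss = tss_by_chrom.get(chrom, [])
    if not tss:
        return None
    i = bisect.bisect_left(tss, pos)
    return min(abs(tss[j] - pos) for j in (i - 1, i) if 0 <= j < len(tss))


def split_promoter_enhancer(peaks, tss_by_chrom, window=2000):
    """Split (chrom, start, end) peaks into promoter-associated and enhancer-associated lists.

    tss_by_chrom maps each chromosome to a sorted list of TSS positions.
    """
    promoter, enhancer = [], []
    for chrom, start, end in peaks:
        dist = distance_to_nearest_tss(tss_by_chrom, chrom, (start + end) // 2)
        if dist is not None and dist <= window:
            promoter.append((chrom, start, end))
        else:
            enhancer.append((chrom, start, end))  # all remaining sites, as in the text
    return promoter, enhancer
```

Under this rule, the "enhancer-associated" group is defined only by exclusion, so it may include distal sites that lack enhancer marks.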
Surprisingly, the known binding motifs of the upstream stimulating factors (USF1 and USF2), followed by those of CLOCK, MITF, bHLHE40, and TFE3, were significantly over-represented at RAD51 binding sites in all four cell lines (Figure 2a). For example, the USF2 motif was present in 73.5% of RAD51 binding sites in GM12878 cells (p = 1.0 × 10^-1214). Notably, all of these transcription factors are known to bind a consensus DNA element called the enhancer box (E-box), which harbors the palindromic sequence "CACGTG". These E-box motifs were highly specific to RAD51 binding, because the motifs found at H3K4me3-enriched regions (an active histone modification) in the same cell line were much less significant than the E-box motifs in terms of p-value (Figure S2). In addition, these motifs did not differ substantially between promoter and enhancer regions in any of the four cell lines (Figure 2a). These results led us to investigate a potential role for RAD51 in gene regulation beyond its well-known enzymatic role in homologous recombination. Based on the above results (Figures 1 and 2), we hypothesized that RAD51 could participate in gene regulation by binding to target sites through interaction with E-box binding proteins. To test this hypothesis, we first reanalyzed the available USF1, USF2, CLOCK, MITF, bHLHE40, and/or TFE3 ChIP-seq data in the ENCODE database to determine which E-box binding transcription factors predominantly coincided with RAD51 binding sites. Because ChIP-seq data were not available for every E-box binding protein, only the available data sets were reanalyzed. The results showed that a large fraction of RAD51 binding sites coincided with USF1 and/or USF2 binding sites in GM12878, HepG2, and K562 cells (Figure 2b). For instance, 52%, 69%, and 31% of RAD51 binding sites in GM12878, HepG2, and K562 cells, respectively, overlapped with USF2 peaks.
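Because "CACGTG" is its own reverse complement, a single-strand string search finds E-boxes on both strands. The sketch below uses that property to compute the fraction of sequences carrying an exact E-box and a simple enrichment ratio against background sequences. It is a deliberately simplified stand-in for HOMER, which scores position weight matrices against a GC-matched background with a binomial (or hypergeometric) test; the function names and toy sequences are illustrative only.

```python
EBOX = "CACGTG"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def motif_fraction(seqs, motif=EBOX):
    """Fraction of sequences containing an exact `motif` match on either strand."""
    if not seqs:
        return 0.0
    rc = reverse_complement(motif)  # identical to motif for a palindrome such as CACGTG
    hits = sum(1 for s in seqs if motif in s.upper() or rc in s.upper())
    return hits / len(seqs)


def enrichment_ratio(target_seqs, background_seqs, motif=EBOX):
    """Motif frequency in targets divided by its frequency in background sequences."""
    fg = motif_fraction(target_seqs, motif)
    bg = motif_fraction(background_seqs, motif)
    if bg == 0:
        return float("nan") if fg == 0 else float("inf")
    return fg / bg


if __name__ == "__main__":
    target = ["AACACGTGTT", "GGGGGG", "cacgtgaa", "ACGTACGT"]
    background = ["AAAA", "CACGTG", "TTTT", "GGGG"]
    print(motif_fraction(target), enrichment_ratio(target, background))  # 0.5 2.0
```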
Reciprocally, 48%, 29%, and 83% of USF2 binding sites in GM12878, HepG2, and K562 cells, respectively, coincided with RAD51 peaks. Similar results were obtained for RAD51 and USF1. Hierarchical clustering based on the percentage of overlap between proteins indicated that RAD51, USF1, and USF2 clustered together among all E-box proteins reanalyzed in this study. This result was unexpected because, to our knowledge, co-localization of RAD51 and USF factors has not been reported in previous RAD51 studies. To confirm this finding, enrichment heatmaps were drawn based on the binding sites of each protein across all available samples (Figure S3). The results confirmed that RAD51 binding sites mostly coincided with USF1 and USF2 binding sites, followed by MITF and TFE3 peaks. In contrast, the binding sites of proteins previously reported to associate with RAD51, such as BRCA1, TEAD3, and TEAD4, rarely overlapped with RAD51 binding sites. Overall, these results indicate that RAD51 could be involved in gene regulation by interacting with E-box binding proteins such as USF1, USF2, and/or MITF, which has not been addressed previously.

Figure 2. (b) Heatmaps show the percentage of co-occupancy between the given proteins. The percentage was calculated by dividing the number of co-occupied sites by the total number of binding sites of the given protein (row). Hierarchical clustering based on the co-occupancy percentage was conducted using the average-linkage method with the one-minus-Pearson-correlation metric.
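The co-occupancy percentages and the clustering described above can be outlined with the sketch below. Row-wise normalization explains why the two directions differ (52% of RAD51 sites overlap USF2 in GM12878, whereas 48% of USF2 sites overlap RAD51). Assumptions: an overlap of at least 1 bp counts as co-occupancy, and the average-linkage step is a naive pure-Python implementation on one-minus-Pearson distances between rows; in practice, `scipy.cluster.hierarchy.linkage(rows, method="average", metric="correlation")` performs the same kind of clustering. The protein peaks used in testing are toy data.

```python
import math
from itertools import combinations


def overlaps_any(peak, others):
    """True if a (chrom, start, end) peak shares at least 1 bp with any peak in `others`."""
    chrom, start, end = peak
    return any(c == chrom and min(end, e) > max(start, s) for c, s, e in others)


def co_occupancy_matrix(peaks_by_protein):
    """Percent of each row protein's peaks that overlap the column protein's peaks."""
    names = list(peaks_by_protein)
    matrix = {}
    for row in names:
        row_peaks = peaks_by_protein[row]
        matrix[row] = [
            100.0 * sum(overlaps_any(p, peaks_by_protein[col]) for p in row_peaks) / len(row_peaks)
            if row_peaks else 0.0
            for col in names
        ]
    return names, matrix


def one_minus_pearson(x, y):
    """1 - Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # correlation undefined for constant rows; treated as uncorrelated (assumption)
    return 1.0 - cov / (sx * sy)


def average_linkage(names, matrix):
    """Naive agglomerative clustering with average linkage; returns merged clusters in order."""
    pair_dist = {frozenset((a, b)): one_minus_pearson(matrix[a], matrix[b])
                 for a, b in combinations(names, 2)}
    clusters = {i: [name] for i, name in enumerate(names)}
    next_id = len(names)
    merges = []

    def cluster_dist(ids):
        a, b = clusters[ids[0]], clusters[ids[1]]
        # average linkage: mean distance over all cross-cluster member pairs
        return sum(pair_dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(clusters, 2), key=cluster_dist)
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[next_id] = merged
        next_id += 1
        merges.append(merged)
    return merges
```

Clustering on correlation rather than raw percentages groups proteins by the shape of their overlap profiles, so two factors with similar partners cluster together even when their absolute overlap levels differ.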

RAD51 and autophagy
Our comprehensive analysis revealed that genome-wide RAD51 binding sites in GM12878, HepG2, and K562 cells significantly coincided with those of E-box binding proteins such as USF1 and USF2 (Figure 2b). The motif analysis (Figure 2a) also indicated that USF1 and USF2 motifs occurred significantly at these RAD51 binding sites. Collectively, these results indicate that RAD51 might be involved in gene regulation by binding indirectly to regulatory elements in the genome through interaction with E-box binding proteins. To identify the biological pathways to which RAD51-mediated gene regulation could contribute, gene ontology (GO) analysis was performed with the genes nearest to the top 500 RAD51 peaks, ranked in descending order of normalized mapped reads (Table S2). This in silico strategy previously recovered known FOXM1-mediated pathways, such as the cell cycle, in various cancer cell lines [36]. The results showed that autophagy and lysosome were the two main pathways significantly associated with the top 500 RAD51 binding sites in all four cell lines (Figure 3a). This unexpected finding predicts a gene regulatory role for RAD51 in the autophagy pathway, as most RAD51 studies, to the best of our knowledge, have focused on its enzymatic role in homologous recombination. The four cell lines are derived from different cell types; nevertheless, autophagy-related genes were identified as RAD51 target genes in all of them. This finding therefore suggests that the role of RAD51 in regulating the autophagy pathway may be shared across at least some cancer types. To identify a core gene set, the genes associated with the top 500 RAD51 binding sites in the four cell lines were intersected (Figure 3b). Approximately three-fifths of the RAD51 target genes (292 genes) were shared among GM12878, HepG2, K562, and MCF-7 cells (Table S3).
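The core gene set is a plain set intersection. The sketch below, with hypothetical placeholder gene lists, also reports the fraction of each cell line's target genes that falls into the core set. Assuming close to 500 target genes per cell line (the exact per-cell-line counts are in Figure 3b and are not restated here), 292/500 ≈ 0.58, or roughly three-fifths.

```python
from functools import reduce


def core_gene_set(genes_by_cell_line):
    """Genes present in every cell line's target list."""
    sets = [set(genes) for genes in genes_by_cell_line.values()]
    return reduce(set.intersection, sets) if sets else set()


def shared_fraction(genes_by_cell_line):
    """Per cell line, the fraction of its unique target genes that belong to the core set."""
    core = core_gene_set(genes_by_cell_line)
    return {
        line: len(core & set(genes)) / len(set(genes)) if genes else 0.0
        for line, genes in genes_by_cell_line.items()
    }


if __name__ == "__main__":
    targets = {  # toy placeholder gene lists, not the study's data
        "GM12878": ["A", "B", "C"],
        "HepG2": ["B", "C", "D"],
        "K562": ["C", "B", "E"],
        "MCF-7": ["B", "C"],
    }
    print(sorted(core_gene_set(targets)))  # ['B', 'C']
    print(f"{292 / 500:.3f}")  # 0.584
```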
Gene ontology analysis of the core gene set confirmed that autophagy was the pathway most significantly associated with these common genes (Figure 3c). Next, we examined whether RAD51 expression was associated with clinical features in various cancers. Intriguingly, RAD51 expression was significantly upregulated in most cancers compared with the corresponding normal controls: of 31 cancer types, RAD51 was significantly over-expressed in 21. In contrast, its expression was significantly reduced only in acute myeloid leukemia (LAML) (Figure 4a). RAD51 expression levels were also significantly associated with patient prognosis in some cancers. In breast invasive carcinoma (BRCA) and liver hepatocellular carcinoma (LIHC), patients with higher RAD51 expression had significantly poorer prognoses than those with lower expression (Figure 4b). Collectively, these results suggest that RAD51 may contribute to the regulation of the autophagy pathway in GM12878, HepG2, K562, and MCF-7 cells, and that RAD51 expression levels could potentially be used to predict outcomes for breast and liver cancer patients. Further experimental and clinical evaluation is needed to verify the predicted RAD51-mediated gene regulation and its associated clinical characteristics.

Figure 3. Predicted biological pathways associated with the top 500 RAD51 binding sites. (a) Gene ontology analysis was performed using Metascape [30] with the nearest genes of the top 500 RAD51 binding sites. (b) The Venn diagram shows the number of overlapping RAD51 target genes between cell lines. (c) Biological pathways predicted to be related to the common RAD51 target genes.

Discussion
Since RAD51 plays a pivotal role in homologous recombination, almost all studies of RAD51 have focused primarily on this role. RAD51 is a recombinase that forms filaments on double- and single-stranded DNA [37] to mediate strand exchange during homologous recombination [38,39]. When DNA double-strand breaks (DSBs) occur, either nonhomologous end-joining (NHEJ) or homologous recombination (HR) is activated to repair them [40]. RAD51 plays a key role in HR but is not involved in NHEJ. Interestingly, several recent studies have reported that the disruption of autophagy impairs DNA repair [41][42][43]. In addition, loss of autophagy has little impact on NHEJ, whereas HR is greatly diminished in cells lacking Atg7 [43]. However, the mechanistic link between homologous recombination and autophagy remains unclear. In this study, we provide evidence that RAD51 may be involved in regulating autophagy-related genes in cooperation with E-box binding proteins such as USF1 and USF2. First, RAD51 was found to bind up to several thousand target sites in the genomes of GM12878, HepG2, K562, and MCF-7 cells (Figure 1). Motif analysis of these RAD51 binding sites indicated that RAD51 did not bind random genomic sites but bound significantly to regulatory elements containing the cognate E-box motif (CACGTG) in all four cell lines (Figures 2 and S2). This prediction was supported by the integrative analysis of ChIP-seq data for E-box binding proteins in the same cell lines (Figures 2b and S3). Of the E-box binding proteins, USF1 and USF2 were the most prominent transcription factors predicted to interact with RAD51. However, experimental validation is needed to support this prediction.
Second, we found that the promoters of a specific gene set were significantly occupied by RAD51 in all four cell lines (Figures 1, 2, S2, and S3). This suggests that there may be a conserved gene regulatory circuit involving RAD51 and E-box binding proteins, and that activating this circuit may benefit cancer cells. This concept is partially supported by the observation that RAD51 was over-expressed in most cancer types relative to the corresponding normal tissues (Figure 4). Intriguingly, one of the predicted RAD51-mediated pathways was the autophagy pathway (Figure 3). To our knowledge, this is the first report proposing a role for RAD51 as a transcriptional co-factor capable of controlling genes that contribute to the autophagy pathway in cancer cells. The exact mechanism by which RAD51 contributes is still unclear, but this finding is consistent with recent studies suggesting a mechanistic link between homologous recombination and autophagy, despite their occurrence in spatially distinct cellular compartments [17,44].
The paradoxical role of autophagy in cancer cells has been documented. For example, autophagy is antitumoral in cancer initiation and epithelial-mesenchymal transition (EMT), but protumoral in the growth of the primary tumor and in anoikis resistance [45]. Thus, the role of autophagy in cancer cells is context-dependent. Based on the RAD51-mediated molecular mechanism predicted in this study, which may link homologous recombination and autophagy, we propose the following hypothesis, which remains to be validated. Cancer cells generally proliferate abnormally and survive longer than the normal cellular life span [46]. To survive the stressful conditions caused by rapid proliferation, cancer cells must activate the autophagy pathway [47,48]. In addition, chromosome breaks that occur spontaneously in proliferating cells must be repaired through RAD51 [49]. Based on our results, we propose that RAD51 is involved in both processes. In particular, our results suggest that RAD51 may modulate the expression of genes important for autophagy by interacting with E-box binding proteins such as USF1, USF2, and/or MITF, while acting in homologous recombination by forming filaments on DNA during repair and replication [50,51]. This hypothesis should be tested experimentally. In addition, the present study has limitations that need to be addressed experimentally. Although the identification of genome-wide RAD51 binding sites was based on quality-controlled RAD51 ChIP-seq experiments conducted by the ENCODE consortium, the motif and co-occupancy analyses were performed in silico. Moreover, the significant overlap of genomic binding sites between RAD51 and E-box binding proteins only provides evidence of co-localization, not of direct interactions between these proteins.
Nevertheless, these results provide novel insights into a potential role of RAD51 as a co-factor in regulating autophagy-related genes, whereas most, if not all, previous RAD51 studies have focused on homologous recombination. Further work is required to establish the multiple roles of RAD51 in homologous recombination and autophagy and to clarify whether they operate separately or cooperatively along the RAD51-autophagy axis.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1: Figure S1: Characteristics of RAD51 binding sites in enhancer and promoter regions; Figure S2: Motif analysis of H3K4me3-enriched regions; Figure S3: Comparative analysis of binding sites of RAD51- and E-box-related proteins; Table S1: ChIP-seq data reanalyzed in this study; Table S2: RAD51 binding sites identified in GM12878, HepG2, K562, or MCF-7 cells; Table S3: Common genes predicted to be regulated by RAD51 in all four cell lines.

Conflicts of Interest: Y.C. and B.B. are employed by Deargen Inc. K.K1. is one of the co-founders of, and a shareholder in, Deargen Inc.