Molecular Profiles of Prostate Cancer in Men of Middle Eastern Ancestry Identifies Key Differences with Western Populations: A Multiethnic SNP Array Study

Background: Our knowledge of prostate cancer (PCa) genomics mainly reflects European (EUR) and Asian (ASN) populations. Our understanding of the influence of Middle Eastern (ME) and African (AFR) ancestry on the mutational profiles of prostate cancer is limited.
Design, setting and participants: To characterize genomic differences between ME, EUR, ASN and AFR ancestry, fluorescence in situ hybridization (FISH) studies for NKX3-1 deletions and MYC amplification were carried out on 42 tumors arising in individuals of ME ancestry. These were supplemented by analysis of copy number profiles of 401 tumors of all ancestries.
Outcome measurement and statistical analysis: FISH results for NKX3-1 and MYC were assessed in the ME cohort and compared to other ancestries. Gene-level copy number aberrations (CNAs) for each sample were statistically compared between ancestry groups.
Results and Limitation: NKX3-1 deletions by FISH were observed in 17/42 (17.5%) prostate tumors arising in men of ME ancestry, while MYC amplifications were observed in only 1/42 (2.4%). Using CNAs called from arrays, the incidence of NKX3-1 deletions was significantly lower in ME vs. other ancestries (20% vs. 52%; p = 2.3 × 10⁻³). No difference in MYC amplification was noted between the two cohorts using array-based CNAs. Across the genome, tumors arising in men of ME ancestry had fewer CNAs than those of other ancestries (p = 0.014). Additionally, the somatic amplification of 21 specific genes was more frequent in tumors arising in men of ME vs. EUR ancestry (Proportions test; Q < 0.05). These included amplifications in the glutathione S-transferase family on chromosome 1 (GSTM1, GSTM2, GSTM5) and the IQ Motif Containing family on chromosome 3 (IQCF1, IQCF2, IQCF13, IQCF4, IQCF5, IQCF6).
Conclusion: Significant differences in genetic background exist between different ancestries.


Introduction
Prostate cancer (PCa) is the most commonly diagnosed non-skin malignancy in men. Risk factors include age [1], family history [2], diet, obesity and ancestry [3]. Genome-wide association studies have identified common genetic polymorphisms on chromosome 8p as a significant influence on PCa initiation and progression in populations of European (EUR) ancestry (3a). Similarly, specific germline polymorphisms have been associated with prostate cancer risk in men of various ancestries [4,5].
Several somatic genomic alterations also differ in frequency between ancestries, including TMPRSS2-ERG genomic rearrangements, SPINK1 overexpression and SPOP single nucleotide variants (SNVs), supporting differences in prostate cancer risk and pathway progression [6,7]. The rate of PTEN loss, as assessed by IHC, is lower in men of ME ancestry, and tumors with PTEN loss are not enriched for ERG loss, as documented in EUR cohorts (7a).
Copy number aberrations (CNAs) are the main mutational drivers of prostate cancer: they occur early in tumor evolution [8], are highly prognostic [9], drive subtype [10] and shape the tumor transcriptome and proteome [11]. The large majority of prostate cancer genomic data come from men with European or Asian ancestry [12], leaving other ancestry groups understudied. To begin to fill this gap, we assembled a cohort of publicly available CNA data from 376 patients of diverse ancestries, together with new data from 25 men of Middle Eastern ancestry.

Fluorescence in situ Hybridization Patient Cohort
A retrospective cohort of 42 radical prostatectomy and transurethral resection of the prostate specimens from Tawam Hospital (Al Ain, United Arab Emirates), reflecting ME men, was investigated by FISH for NKX3-1 deletion and MYC amplification. All selected cases had a histopathologic diagnosis of prostate adenocarcinoma. The original histopathologic classification was confirmed by one of the study pathologists (A.A, SA or TAB). Formalin-fixed, paraffin-embedded tissue sections were used for interphase fluorescence in situ hybridization (FISH). Deparaffinized tissue was treated with 0.2 mol/L HCl for 10 minutes, then with 2× SSC for 10 minutes at 80°C, and digested with Proteinase K (Invitrogen) for 8 minutes. The tissues and BAC probes were co-denatured for 5 minutes at 94°C and hybridized overnight at 37°C. Post-hybridization washing was done in 2× SSC with 0.1% Tween 20 for 5 minutes, and fluorescent detection was done using anti-digoxigenin conjugated to fluorescein (Roche Applied Science, Indianapolis, IN) and streptavidin conjugated to Alexa Fluor 594 (Invitrogen). Slides were counterstained and mounted in ProLong Gold Antifade Reagent with 4′,6-diamidino-2-phenylindole (Invitrogen). Slides were examined using a Leica DMRA fluorescence microscope (Leica, Deerfield, IL) and imaged with a CCD camera using the CytoVision software system (Applied Imaging, Santa Clara, CA). FISH signals were scored manually (100× oil immersion) by pathologists in morphologically intact, non-overlapping nuclei, and a minimum of 100 cancer cells from each site were recorded. Cancer sites with very weak or no signals were recorded as insufficiently hybridized. All BACs were obtained from BACPAC Genomics (Emeryville, CA), and probe locations were verified by hybridization to metaphase spreads of normal peripheral lymphocytes. For detection of gene deletion or amplification, the following probes were used: for NKX3-1, RP11-325C22 (green), and for MYC, RP11-1136L8 (red).
BAC DNA was isolated using a QIAFilter Maxi Prep kit (Qiagen, Valencia, CA), and probes were synthesized using digoxigenin- or biotin-nick translation mixes (Roche Applied Science).

OncoScan SNP Array Patient Cohort
We assessed 22 of the 42 samples used for FISH analysis, and three additional samples, using the OncoScan SNP microarray. Representative slide(s) from each case, corresponding to the formalin-fixed paraffin-embedded block(s), were identified to obtain tissue cores for OncoScan SNP microarray profiling. For molecular analysis, approximately 8-10 punches of 1.5 mm were collected in Eppendorf tubes from regions with the highest percentage of invasive carcinoma, from non-tumoral tissue, and from involved regional lymph nodes, if any. Sample processing for the remaining 376 non-ME samples was performed as described in a previous publication [10].

SNP Microarray Data Generation and CNA Calling
SNP microarrays were performed with 200 ng of DNA on Affymetrix OncoScan FFPE Express 3.0 arrays as previously described [10]. The genotypes of 217,439 SNPs were extracted from the OncoScan OSCHP files and converted to VCF format. Gene-level CNAs for each patient were identified by overlapping copy number segments with the RefGene (2014-07-15) annotation using BEDTools (v2.17.0) [13]. Percent genome altered (PGA) was calculated for each sample by dividing the number of base pairs involved in all copy number segments by the total length of the genome.
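The PGA definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the function name `percent_genome_altered`, the half-open `(chrom, start, end)` segment format, the merging of overlapping segments and the toy genome length are all assumptions made for the example.

```python
from collections import defaultdict


def percent_genome_altered(cna_segments, genome_length_bp):
    """Return PGA (%) for one sample.

    cna_segments: iterable of (chrom, start, end) tuples for segments called
    as gained or lost (assumed half-open coordinates, start < end).
    genome_length_bp: total genome length used as the denominator.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cna_segments:
        by_chrom[chrom].append((start, end))

    altered_bp = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent segment: extend the current run so
                # that no base pair is counted twice (an assumption here).
                cur_end = max(cur_end, end)
            else:
                altered_bp += cur_end - cur_start
                cur_start, cur_end = start, end
        altered_bp += cur_end - cur_start

    return 100.0 * altered_bp / genome_length_bp


# Hypothetical toy sample on a 100 Mb "genome" (not study data).
toy_segments = [
    ("chr8", 0, 1_000_000),
    ("chr8", 500_000, 2_000_000),
    ("chr10", 0, 500_000),
]
toy_pga = percent_genome_altered(toy_segments, genome_length_bp=100_000_000)
print(f"PGA = {toy_pga:.2f}%")
```

Merging overlapping segments before summing is a conservative choice; if the segmentation never produces overlapping calls, the merge step has no effect.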

Inferring Ancestry
The HGDP-CEPH dataset (n = 1,042) was downloaded [14], subsetted to SNPs that overlap with the OncoScan SNP array (n = 63,320), and converted to VCF format. HGDP-CEPH VCFs were merged with the OncoScan cohort using VCFtools [15]. SNPs in linkage disequilibrium with each other were pruned using the --indep command in PLINK [16]. Principal Component Analysis (PCA) was implemented using PLINK 1.9 (--pca) in the entire dataset, as well as within the European and Middle Eastern populations separately.
The ADMIXTURE algorithm (v1.3.0) [17] was used to infer the ancestry of individuals based on their ancestry proportions given k ancestral populations, with the PLINK BED file as input.
Unsupervised ADMIXTURE analysis was run with cross-validation, varying the number of ancestral populations k from 2 to 8. The resulting ancestry assignment was used to compare CNA profiles between ancestry populations.
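As a rough illustration of how per-individual ancestry fractions can be turned into a group label, the sketch below assigns each sample to its largest ADMIXTURE component. The rule actually used in this study is not stated in the text, so the arg-max rule, the function names and the component labels are all hypothetical; the only format assumed is the whitespace-delimited table of fractions (one row per individual, one column per component) that ADMIXTURE writes to its `.Q` output.

```python
def parse_q_text(text):
    """Parse whitespace-delimited ancestry fractions, one row per individual."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def assign_ancestry(q_rows, component_labels):
    """Label each individual by its largest ancestry fraction (assumed rule)."""
    assignments = []
    for fractions in q_rows:
        best = max(range(len(fractions)), key=lambda i: fractions[i])
        assignments.append(component_labels[best])
    return assignments


# Hypothetical fractions for two individuals and k = 3 components.
toy_q = parse_q_text("0.70 0.20 0.10\n0.05 0.15 0.80\n")
toy_labels = ["component_1", "component_2", "component_3"]  # placeholder names
toy_assignment = assign_ancestry(toy_q, toy_labels)
print(toy_assignment)
```

Because ADMIXTURE components are unlabeled, mapping a component to a named population requires reference samples of known origin, such as the HGDP-CEPH individuals merged into the dataset.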

Statistical Analysis
Specific statistical tests used are indicated in the figure legends or appropriate methods section and were performed within the R statistical environment (v3.3.1). Visualization in R was performed with the BPG package (v5.9.2) [18].

Results
By FISH, 16% (7/42) of prostate tumors arising in men of ME ancestry harbored an NKX3-1 deletion. These results were confirmed by SNP microarray, where only 20% (5/25) of tumors from men with Middle Eastern ancestry had a deletion of NKX3-1. By comparison, fully 52% (186/346) of tumors arising in men of European ancestry harbored a deletion of NKX3-1 (p = 2.3 × 10⁻³).
Similarly, by FISH only ~2.5% (1/42) of tumors arising in men of ME ancestry had MYC amplification. This rate was higher by SNP microarray, at 16% (4/25), and comparable to the 18% (65/346) rate in men of European ancestry (p = 0.94). Fig. S1 shows an example of NKX3-1 deletion detected by FISH.
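The array-based comparisons above can be reproduced approximately with a two-sample test of proportions. The sketch below assumes a chi-squared test on the 2×2 table with Yates' continuity correction, which is the default behaviour of a standard proportions test but is not explicitly stated for these p-values in the text; the function name is hypothetical. With the counts quoted above, it should give p-values close to those reported.

```python
import math


def two_proportion_test(x1, n1, x2, n2, yates=True):
    """Chi-squared test (1 df) that two proportions x1/n1 and x2/n2 are equal.

    Returns (chi2, p_value). Yates' continuity correction is applied by
    default; this is an assumption about the test used in the study.
    """
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            chi2 += diff * diff / expected

    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the text: ME vs. European ancestry on SNP arrays.
nkx_chi2, nkx_p = two_proportion_test(5, 25, 186, 346)   # NKX3-1 deletion
myc_chi2, myc_p = two_proportion_test(4, 25, 65, 346)    # MYC amplification
print(f"NKX3-1: p = {nkx_p:.2g}; MYC: p = {myc_p:.2g}")
```

With only 25 ME samples, an exact test such as Fisher's would be a reasonable alternative; the chi-squared form is shown here only because the abstract refers to a proportions test.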
To expand these intriguing results, we collected a dataset of 401 patients with sporadic, localized, treatment-naive disease. Each patient in this cohort had whole-genome copy-number profiling of the index lesion of their tumor. This cohort included 25 newly profiled tumors from men of Middle Eastern ancestry (Table 1; Table S1). Patients underwent either image-guided radiotherapy (IGRT) or surgery (radical prostatectomy), and the histologically most representative region was molecularly profiled. There was no difference in cellularity or other metrics as a function of processing batch (two-way ANOVA; P = 0.15; Fig. S2), as measured by ASCAT [19].
First, we inferred the genetic ancestry of this joint dataset. We integrated our data with the Human Genome Diversity Panel (HGDP-CEPH), which contains samples from 51 different global populations [14]. To determine the ancestry fraction for each individual in the present cohort, we used the software ADMIXTURE [17] (Fig. 1A; Fig. S3). HGDP-CEPH samples previously assigned to an ancestry group were not re-assigned, regardless of the ADMIXTURE results. Four primary admixture populations were apparent (Africa, red; America, yellow; East Asia, green; Oceania, aquamarine). The other populations were a combination of these primary populations, including samples from men with Middle Eastern ancestry, who showed a combination of African, Central South Asian and European genotypic features.
Three Middle Eastern samples in our dataset contained a larger than average proportion of the African primary admixture component (Fig. S4). We performed principal component analysis (PCA) using 63,320 SNPs to investigate genetic diversity (Fig. 1B). PC1 and PC2 explained 30% and 27% of genetic variance, respectively, with PC1 distinguishing East from West populations while PC2 divided African/Middle Eastern populations from non-African populations. PCA of specific geographical regions showed sub-structures within the larger populations, including within specific regions of Europe and the Middle East (Fig. 1C-D). As previously reported [20,21], […] (Fig. 2B). Tumors from men with Middle Eastern ancestry had lower PGA (median = 1.93%) than those arising in men of European ancestry (median = 6.45%; p = 1.07 × 10⁻²; Mann-Whitney test). No such difference in PGA was seen for tumors from men with African ancestry, despite their larger number of aberrations (median = 9.72%), or for tumors from men with East Asian ancestry (median = 6.83%).
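A comparison of PGA between two ancestry groups of the kind described above can be sketched with a Mann-Whitney U test. The version below uses the normal approximation with tie and continuity corrections, which is an assumption; statistical packages may use an exact computation for small samples. The function name and the PGA values are hypothetical and are not study data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, p_value). Ties receive average ranks and the
    variance is tie-corrected; a 0.5 continuity correction is applied.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 share a value; their ranks i+1..j are averaged.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = max(0.0, abs(u_x - mean_u) - 0.5) / math.sqrt(var_u)
    p_value = math.erfc(z / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical PGA values (%) for two groups, for illustration only.
group_a_pga = [1.2, 1.9, 2.5, 0.8, 3.1]
group_b_pga = [6.4, 5.9, 7.2, 4.8, 9.7, 6.8]
u_stat, p_val = mann_whitney_u(group_a_pga, group_b_pga)
print(f"U = {u_stat}, p = {p_val:.3g}")
```

A rank-based test suits PGA because its distribution is strongly right-skewed, with many tumors showing quiet profiles and a few showing extensive alteration.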
We assigned each tumor to a prostate cancer subtype based on its gene-level CNA profile (Fig. 2A & C) [9]. These subtypes (S1-S4) were defined by specific genomic aberrations (S1: chromosome 7 amplification; S2: 8p deletion, 8q amplification; S3: 8p deletion, 16p deletion; S4: quiet profile). All genetic ancestry groups had a large proportion of tumors in S4, indicative of quiet CNA profiles (Fig. 2A). Despite the differences in global genomic instability, the proportion of tumor subtypes did not differ between the genetic ancestry groups, at least at the statistical power afforded by existing cohorts (Pearson's χ² test; p = 0.16; Fig. 2C).
Understanding the interplay between genetic ancestry and CNAs is important for population-specific biomarker identification. To this end, we compared recurrent gene-level CNAs in tumors from men with Middle Eastern, African and East Asian ancestry to those in tumors from men with European ancestry (Table S2) […] (Fig. 2D). This includes a region on chromosome 7 spanning 26 genes that is amplified in 88% (9/11) of tumors of East Asian ancestry in our cohort. These data provide evidence that genes are mutated at different frequencies in individuals of different ancestry, further supporting the idea that germline variation is essential for understanding the emergence of somatic phenotypes [22].
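Testing every gene for a difference in CNA frequency between two ancestry groups produces many p-values, and the abstract reports the resulting hits at Q < 0.05. The sketch below shows one common way of converting per-gene p-values to Q values, the Benjamini-Hochberg procedure. The text does not state which multiple-testing correction was used, so this choice is an assumption, and the gene names and p-values are placeholders.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (Q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each Q value is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene p-values from proportions tests (placeholder names).
toy_genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
toy_p = [0.001, 0.04, 0.03, 0.5]
toy_q = benjamini_hochberg(toy_p)
toy_hits = [g for g, q in zip(toy_genes, toy_q) if q < 0.05]
print(toy_hits)
```

In this toy example only the first gene passes the threshold, even though three raw p-values fall below 0.05, which illustrates why the correction matters when thousands of genes are tested.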

Discussion
Increasing evidence shows that germline genetic variation strongly shapes the somatic profiles and evolutionary history of prostate cancer. Both rare deleterious variants in DNA damage response (DDR) genes and common polymorphisms have been shown to do so in Caucasian populations [2,22]. Recent sequencing studies have shown that tumors arising in men of Asian, African or African-American ancestry show distinct somatic mutational features [23][24][25][26]. This study expands on these observations by examining a multi-ancestry cohort, and provides the first analysis of CNAs in men of Middle Eastern (ME) descent.
Tumors in men of ME descent showed less genomic instability than those of other groups, but harbored amplifications in the glutathione S-transferase family of genes on chromosome 1 (GSTM1, GSTM2, GSTM5), the IQ Motif Containing family of genes on chromosome 3 (IQCF1, IQCF2, IQCF13, IQCF4, IQCF5, IQCF6) and PARP3, which encodes an enzyme required for DNA repair and is not commonly mutated in tumors arising in men of other ancestries. GSTM1 amplifications have been reported to be marginally associated with prostate cancer risk in a Caribbean population of African descent [27], warranting further investigation into the function of this gene in non-Caucasian cohorts. All of these variants are putatively somatic, but it is possible that our genomic analysis included contaminating germline structural variants, which would be of significant interest. Interestingly, all three samples from men of Middle Eastern ancestry that had a high proportion of African admixture components had an amplification in GSTM1.
Although we did not observe significant differences between the two ethnicities in the proportion of amplifications and deletions in known genes associated with prostate cancer, we identified a significant difference in NKX3-1 genomic deletions. This is in line with previous work confirming differences in ERG, PTEN and SPINK1 genomic aberrations between the two ancestries (7a). These data confirm that the genomics of prostate cancer in the ME population differ from those in ASN, AFR and EUR populations, likely reflecting factors that lead to the higher incidence, rate of disease progression and cause-specific mortality observed in ASN, AFR and EUR compared to ME patients.
In conclusion, ancestry should be considered when investigating and characterizing biomarkers and molecular signatures related to disease progression, prognosis and, potentially, therapeutic targeting. Despite our very limited sample size, our data suggest that germline-somatic interactions may be at play in tumors arising in men of different ancestries. Further multi-omic studies of tumors arising in men of different ancestry are clearly warranted.

Funding
This study was conducted with support to PCB from the NIH/NCI under award number

b) Percent genome altered (PGA) differed across tumors from men with different ancestries. P value is from a Mann-Whitney test.
c) The proportion of tumors assigned to the CNA subtypes [9] did not differ between tumors from men with different ancestry. P value is from a chi-squared test. The colours of the stacked bars correspond to the CNA subtypes in Fig. 1A.

Figure S3 | ADMIXTURE cluster plots varying inferred ancestries from 2 to 8
Each sample is represented by a vertical line partitioned into coloured segments whose length is proportional to the ancestry coefficient in up to eight inferred populations.

Figure S4 | ADMIXTURE cluster plots for men with Middle Eastern ancestry
Three samples with a larger than average proportion of the Africa primary admixture component are labelled in red.

Figure S5 | CNA profile of prostate cancer associated genes
Each column is a sample and each row is a gene. The color represents whether that sample had a deletion (blue), amplification (red), or is copy number neutral (white). The covariate bar on top indicates the ethnicity of the patient.

Table S1 | Patient clinical information
For each patient, this table provides clinical information including ancestry, age at treatment,