Identification of RPS 28 as a Promising Therapeutic Target for 2 Osteosarcoma Patients with Poor Prognoses Stratified by a 3 Seven-gene Signature

1 Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Jiao 7  Tong University, Shanghai 200233, PR China 8  2 Department of Pathology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Jiao 9  Tong University, Shanghai 200233, PR China 10  3 Department of Computer Science & Engineering, & Center for Cognitive Machines & Computational Health, 11  SEIEE School, Shanghai Jiao Tong University, Shanghai 200240, PR China 12  4 Department of Orthopaedic Surgery, Shanghai Tenth People's Hospital Affiliated to Tongji University, 13  Shanghai 200072, PR China 14  * Correspondence: tushikui@sjtu.edu.cn (S.T); zhiyanliu@shsmu.edu.cn (Z.L.); dongyang6405@163.com (Y.D.) 15  † These authors contributed equally to this work 16


Introduction
Osteosarcoma (OSA), the most common primary malignant solid tumor of bone, pre-49 dominantly occurs in adolescents with a second peak in incidence among adults aged >65 50 years [1]. The standard therapy for OSA, comprising neoadjuvant chemotherapy, surgical 51 removal of the primary lesion along with clinically evident metastatic disease followed by 52 adjuvant chemotherapy, was established in the 1980s and developed gradually, resulting 53 in long-term survival in >60% of patients with localized disease [2][3][4]. Unfortunately, lim- 54 ited progress has been made in improving the survival of patients with OSA [5,6]. As a 55 result, more than 40% of OSA patients' prognoses are still poor, especially for those who 56 have high-risk factors, including Enneking stage III [7] and Huvos grade I/II [8,9]. How-57 ever, such qualitative assessments are relatively subjective, time-consuming, and some-58 times inaccurate [10]. Additionally, more intensified adjuvant chemotherapy for the pa-59 tients in high risk separated by Huvos grade was found only increasing toxicity without 60 improving event-free survival [11]. Therefore, it is imperative to discover actionable bi-61 omarkers for patient stratification and effective therapeutic targets for high-risk patients 62 with this rare but devastating malignancy. 63 With the rapid development of high-throughput gene detection technology, such as 64 microarrays and next-generation sequencing, multi-omics data of tumors were available 65 to identify novel diagnostic or prognostic biomarkers as well as therapeutic targets for 66 various types of cancers by applying deep learning to analyze large and disparate datasets 67 from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) [12]. As 68 for OSA, because of its rareness, only a few small datasets are accessed in GEO, and the 69 database named Therapeutically Applicable Research To Generate Effective Treatments 70 (TARGET), which have been used to construct multitudinous signatures to predict the 71 prognosis of patients with OSA, including a four-methylated LncRNA signature [13], hy-72 poxia-associated signature [14], glycolysis-related signature [15], immune-related signa-73 ture [16][17][18]. But few of them aimed to identify novel therapeutic targets for further exper-74 imental validation. Besides, most of these studies just focused on one kind of molecule or 75 pathway selected according to currently popular areas, with more or less bias, which may 76 compromise the prediction power and reliability. 77 Weighted Gene Co-expression Network Analysis (WGCNA) is a systems biology 78 method widely used to reveal the relevance of genes and sample traits [19], which was 79 frequently used in OSA datasets to identify the hub genes associated with survival time, 80 metastasis, and alive status [20][21][22]. However, there are many factors, such as the different 81 follow-up times of each patient, influencing these sample traits. Recently, it was reported 82 to be a reliable method using WGCNA to reveal the hub genes correlated to a linear pa-83 rameter with prognostic significance, such as the normalized enrichment score (NES) of 84 gene sets [16,23]. 85 In the Molecular Signatures Database (MSigDB), there are fifty hallmark gene sets, 86 which convey specific biological states or processes correlated with cancers, such as P53 87 PATHWAY, HYPOXIA, and WNT BETA CATENIN SIGNALING [24,25], in which HY-88 POXIA was identified as a dominant risk factor for overall survival in lung adenocarci-89 noma patients [23], and seven gene sets were found to be associated with the metastasis 90 status of OSA patients [15]. But few studies reported the relationship between these fifty 91 hallmarks gene sets with the overall survival of OSA patients. 92 In this study, we attempted to discover the hallmarks with prognostic significance 93 for OSA patients, based on which a robust gene signature for prognosis prediction and 94 patient stratification was constructed. Besides, many efforts were taken to discover and 95 validate the therapeutic targets preliminarily for high-risk OSA patients across three co-96 horts. Firstly, the NES of the fifty hallmarks in the TARGET-OS cohort were calculated by 97 Single Sample Gene Set Enrichment Analysis (ssGSEA), of which three hallmarks, UV RE-98 SPONSE UP, CHOLESTEROL HOMEOSTASIS, and XENOBIOTIC METABOLISM, were 99 found to be independent prognostic factors for overall survival via univariate and multi-100 variate Cox regression, which were used to construct a multivariate Cox model, by which 101 a linear parameter of each case, was calculated and named as RiskScoreByGS. Subse-102 quently, WGCNA was used to find the most correlated gene modules to the RiskScore- 103 ByGS and the NES of the three hallmarks. Afterward, multiple methods were combined 104 to screen for robust biomarkers and construct a seven-gene signature to separate OSA 105 patients into high-and low-risk groups, which was further validated using microarray 106 datasets GSE21257 and GSE42352. Furthermore, Differential Expression Analysis, Robust 107 Rank Aggregation (RRA) [26], Metascape [27], univariate and multivariate Cox regression 108 analyses were combined to discover novel therapeutic targets for patients in the high-risk 109 group. Finally, multiple platforms and methods were used to preliminarily validate that 110 RPS28 functions as a tumor promoter by promoting proliferation and metastasis of OSA 111 cells and simultaneously facilitating immune escape in the high-risk patients stratified by 112 the signature, making it a potential therapeutic target worthy of further investigation. 113  The fifty hallmark gene sets were downloaded from the Molecular Signatures Data-144 base (MSigDB) [25]. The normalized enrichment score (NES) of each hallmark of the 88 145 cases in the TARGET-OS cohort were quantified by the ssGSEA algorithm via R package 146 "gsva" (version 1.38.2 ) based on transcriptome profiling data presented as log2(TPM+1). 147 The significance of different hallmarks in the TARGET-OS cohort was evaluated via uni-148 variate Cox regression for overall survival using the R package "survival" (version 3.1-12). 149 With a threshold of p-value < 0.05, the significant hallmarks were selected for the follow-150 ing analysis. 151

152
The significant hallmarks mentioned above were screened according to the Akaike 153 information criterion (AIC) [28] by using the R package "my. stepwise" (version 0.1.0). 154 Finally, using the R package "survival", selected hallmarks were used to construct a mul-155 tivariate Cox proportional-hazard model based on 85 cases with overall survival time > 0. 156 Then, the RiskScoreByGS of all patients was calculated using the function: predict (model, 157 data). The patients with appropriate prognostic information were divided into two groups 158 using the median of RiskScoreByGS as the cut-off value. The Kaplan-Meier (KM) survival 159 analysis was used to assess the clinical predictive capacity of the RiskScoreByGS via "sur-160 vminer" packages (version 0.4.9). 161 163 In the TARGET-OS dataset, 88 cases with RiskScoreByGS and the NESs of the three 164 hallmarks were used for performing WGCNA to identify the modules most correlated 165 with the RiskScoreByGS and each hallmark. The TPM matrix of 19690 mRNAs in 88 cases 166 was constructed, in which 6726 mRNAs, whose TPM values were 0 in more than 44 cases, 167 were removed. Finally, 12964 mRNAs with median absolute deviation (MAD) > 0.01 were 168 selected as input data for subsequent WGCNA using the R package "wgcna" (version 1.70-169 3). No outlier samples were detected and excluded with sample hierarchically clustering 170 Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 22 November 2021 method before six was selected as an appropriate soft threshold power for achieving 171 standard scale-free networks. Furthermore, via the construction of adjacency and topo-172 logical overlap matrix (TOM) and the calculation of corresponding dissimilarity (1-TOM), 173 gene dendrogram and module identification were achieved with dynamic tree cut, and 174 the minimum module size was 30. Additionally, clustering of module eigengenes was 175 conducted to merge highly similar modules with dissimilarity < 0.25. The correlation be-176 tween module eigengene and sample traits, including RiskScoreByGS, as well as the NESs 177 of UV RESPONSE UP, CHOLESTEROL HOMEOSTASIS, and XENOBIOTIC METABO-178 LISM, was calculated. Finally, two modules were selected for the following analyses. The genes of two selected modules were conducted a large-scale survival analysis 182 using the R package "survival". With a threshold of the p-value of univariate Cox regres-183 sion < 0.01 as well as the p-value of log-rank test < 0.01, 43 genes were selected and used 184 as candidates to be further filtered according to Akaike information criterion (AIC) [28]. 185 Finally, selected genes were used to construct a multivariate Cox proportional-hazard 186 model to predict overall survival (OS) using the R package "survival". The Risk Score was 187 calculated by using a function: predict (model, data). OSA patients were divided into two 188 groups by using the median Risk Score as the cut-off value. The Kaplan-Meier survival 189 analysis was used to assess the clinical predictive capacity of the Risk Score in the whole 190 cohort and subgroups via "survminer" packages (version 0.4.9). Time-dependent receiver 191 operating characteristic curves (tROC) was performed to measure the predictive power, 192 and the areas under the curve at different time points [AUC(t)] of our signature and other 193 clinicopathological prognostic factors were compared. Furthermore, the value of patients 194 stratification of this signature was validated in GSE21257 and GSE42352, in which, R pack-195 age "ggstatsplot" (version 0.8.0) was used to conduct Chi-square test and visualization. 196 Finally, based on this multivariate Cox model, a nomogram and a calibration curve were 197 plotted using the R package "regplot" (version 1.1) and "rms" (version 6.2-0), respectively. 198

199
The patients in TARGET-OS, GSE21257, and GSE42352 were separated into low-and 200 high-risk groups according to the respective median of Risk Score predicted based on the 201 multivariate Cox model mentioned above. Differential Expression Analysis was con-202 ducted between high-risk cases and low-risk cases by using the R package "limma" (ver-203 sion 3.46.0) and "edgeR" (version 3.32.1). To integrate the three datasets, we used Robust 204 Rank Aggregation (RRA) method to determine the robust differential expressed genes 205 (DEGs) by using the R package "RobustRankAggreg" (version 1.1), which was reported as 206 a standard method to minimize the bias and errors among several datasets [29]. Genes 207 with the mean of |log2(fold change)| > 0.69 and p-value < 0.05 were considered as the 208 significant robust DEGs. Finally, the Metascape database (https://metascape.org/) [27] was 209 used for functional and pathway enrichment analyses. A min overlap ≥ 3 & p-value ≤ 210 0.01 was considered statistically significant. Protein-protein interaction (PPI) enrichment 211 analyses were carried out and visualized, and Molecular Complex Detection (MCODE) 212 algorithm was applied to identify densely connected network components. 213

214
R package "estimate" (version 1.0.13) [30] was used to calculate ImmuneScore, Stro-215 malScore, EstimateScore, and TumorPurityScore, whose correlation with Risk Score as 216 well as RPS28 was calculated and visualized using R package "ggpubr" (version 0.4.0). 217 The relative enrichment scores of 24 immune cell types in the tumor microenvironment 218 Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 22 November 2021 were estimated using the Immune Cell Abundance Identifier (ImmuneCellAI) [31], whose 219 correlation with down-regulated genes was further analyzed and visualized by using the 220 R package "pheatmap" (version 1.0.12) with a little modified. 221

222
OSA tissues along with para-tumor soft tissues were obtained from five OSA patients 223 who have been conducted limb-sparing or amputation surgery from January to June in 224 2021, whose corresponding normal bone tissues were harvested from the broken-end of 225 tumor segments. This study followed standard guidelines and was approved by the Ethics 226 Committee of Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 227 PR China. Written informed consent was obtained from all participants, whose demo-228 graphic information and clinicopathological characteristics were displayed in Table 2. 229 OSA tissues were fixed in 4% paraformaldehyde. Bone tissues were decalcified in 10% 230 ethylenediaminetetraacetic acid decalcification solution. All tissues were dehydrated, em-231 bedded in paraffin. Subsequently, four μm-thick tissue sections were stained with pri-232 mary antibodies against RPS28 (Sigma-Aldrich Cat# HPA047132, dilution 1:100, Saint 233 Louis, State of Missouri, USA) and a horseradish peroxidase-conjugated IgG (Servicebio, 234 Wuhan, China). The staining results were shown by diaminobenzidine (DAB), scanned 235 using the NDP NanoZoomer S210 (Hamamatsu, ShizuokaPref, Japan), and analyzed with 236 the NDP view 2.0 software. 237 As described before [32], all of the human OSA cell lines were purchased from the 240 Chinese Academy of Sciences (Shanghai, China) and cultured in Dulbecco's modified Ea-241 gle's medium (DMEM) (Hyclone, Tauranga, New Zealand) with 10% fetal bovine serum 242 (FBS) (Gibco, NY, USA), as well as penicillin (100 U/ml) and streptomycin (100 μg/ml) 243 (Invitrogen, CA, USA) at 37℃ in a humidified 5% CO2 atmosphere. 244

274
Cell counting kit-8 (CCK-8) and colony formation assays were conducted to assess 275 the proliferation of OSA cells. For the CCK-8 assay, RPS28-knockdown and control MG-276 63 cells were seeded in 96-well plates in triplicate at a density of 3 x 10 3 cells/well. The cell 277 viability was evaluated quantitatively at the indicated timepoints by adding 100 μL fresh 278 culture medium with 10% CCK-8 reagent (Dojindo Laboratories Co., Ltd., Kumamoto, 279 Japan) to each well after removing the old medium, followed by incubation at 37℃ for 1 280 hour and detecting the optical density (O.D.) at 450 nm by using a microplate reader (BI-281 OTEK, Vermont, USA). Finally, each O.D. value was normalized to the control group's 282 average to obtain the ratio of viable cells, as published previously [32]. 283 For the colony formation assay, RPS28-knockdown and control MG-63 cells were 284 seeded in 6 cm dishes in triplicate at a density of 1 x 10 3 cells/dish, then cultured with the 285 medium replaced every three days. After one week, colonies were fixed with 4% para-286 formaldehyde, stained with 1% crystal violet, washed with phosphate-buffered saline 287 (PBS) solutions, and photographed. The number of clones with more than 50 cells was 288 counted. 289

290
For the wound healing assay, RPS28-knockdown and control MG-63 cells were 291 seeded in 6-well plates in triplicate. Cells were scratched with a 200 μL sterile pipette tip 292 to make a gap when the cell density reached approximately 90%, which were further 293 cultured in serum-free medium for 24 hours. The gaps were observed and photographed 294 at 0 and 24 hours by CKX41 inverted microscopy (Olympus, Tokyo, Japan) and analyzed 295 with Fiji Image J Software. 296 For transwell migration assay, cells were pretreated with 10ug/ml mitomycin C 297 (Selleck Chemicals, TX, USA) for two hours. Then, 5 x 10 4 cells with 100 μL serum-free 298 DMEM were seeded into the upper transwell® chamber (Corning #3422) (Corning, NY, 299 USA), whose lower chamber was added 600 μL DMEM with 10% FBS. After incubation 300 for 48 h, cells in the bottom chamber were fixed with 4% paraformaldehyde, stained with 301 1% crystal violet, washed with PBS solutions, and photographed. Three random fields of 302 vision were captured and counted the fixed cells. 303 Matrigel invasion assay was carried out with Corning® BioCoat™ Matrigel Invasion 304 Chambers (#354480)( Corning, NY, USA). Briefly, 5 x 10 4 cells with 100 μL serum-free 305 DMEM were seeded into the upper chamber, whose lower chamber was added 600 μL 306 DMEM with 10% FBS. After incubation for 72 h, cells in the bottom chamber were stained, 307 imaged, and counted as the same methods in the migration assay. 308

309
Unless stated otherwise, statistical analyses were conducted based on the R language 310 (version 4.0.2) (https://cran.r-project.org/) and Graphpad Prism Version 9.0 software. All 311 experiments were performed at least three times independently, and results are expressed 312 as means ± standard deviation (S.D.). The student's t-test was used to analyze differences 313 between the two groups. The differences were considered significant if p-value <0.05. Firstly, six hallmark gene sets are identified as significant risk or protective factors 317 for overall survival in OSA patients. Three were selected to construct a multivariate Cox 318 model based on gene sets (Figure 1a). Then, WGCNA, univariate Cox regression analysis, 319 Kaplan-Meier (KM) survival analysis, and multivariate Cox regression were combined to 320 filter promising candidates and establish a gene signature to predict overall survival (Fig-321  ure 1b). Subsequently, the prognostic value of the gene signature was assessed and vali-322 dated in the training and two independent validation cohorts ( Figure 1c). Differential Ex-323 pression Analysis was conducted between high-and low-risk patients, separated accord-324 ing to the median Risk Score in three independent cohorts, respectively. The Robust Rank 325 Aggregation method was used to determine the robust DEGs. The upregulated and down-326 regulated DEGs were respectively imported into Metascape to perform pathway and pro-327 cess enrichment analyses as well as construct protein-protein interaction (PPI) networks, 328 in which the Molecular Complex Detection (MCODE) algorithm has been applied to iden-329 tify densely connected network components, which may be the critical therapeutic targets, 330 in which RPS28 was identified as an independent prognostic factor ( Figure 1d). Finally, 331 RPS28 was preliminarily validated as a promising therapeutic target using multiple meth-332 ods (Figure 1e).

341
Based on the NES of cancer hallmarks and overall survival information in the TAR-342 GET-OS cohort, the Cox coefficient of each hallmark was calculated. With the threshold 343 of the p-value of univariate Cox regression < 0.05, six significant hallmarks were discov-344 ered and displayed (Figure 2a), in which three were selected further to construct a multi-345 variate Cox proportional-hazard model, including UV RESPONSE UP, CHOLESTEROL 346 HOMEOSTASIS, and XENOBIOTIC METABOLISM (Figure 2b). The RiskScoresByGS of 347 85 samples in this cohort were calculated and ranked, of which the median was used as 348 the cut-off value to divide 85 patients into low-risk and high-risk groups (Figure 2c). After 349 that, the overall survival status of the two groups of patients was displayed and compared 350 (Figure 2d, Figure2f). A heatmap was drawn to illustrate the profiles of the three hall-351 marks in all samples (Figure 2e). Kaplan-Meier (KM) curves demonstrated that low-risk 352 patients had better survival than high-risk ones (p = 0.0017), showing that RiskScoreByGS 353 has a practical value for predicting overall survival (Figure 2f). The same analyses were 354 conducted in 81 patients whose event-free survival data could be obtained, further vali-355 dating the RiskScoreByGS had a remarkable prognostic value (Figure 2g-2j).

Establishment of Multivariate Cox Proportional-Hazard Model Based on Gene Signature 364
Related to the Significant Hallmark Gene Sets 365 WGCNA was conducted using transcriptome profiling data of 88 samples in the 366 TARGET-OS cohort to find the modules related to the RiskScoreByGS and the NESs of the 367 three significant hallmark gene sets ( Figure S1a). β = 6 and R2 = 0.85 were recognized as 368 the soft-threshold for scale-free network ( Figure S1b). The minimum module size was set 369 at 30, and modules with a correlation coefficient greater than 0.75 were merged. Subse-370 quently, 58 modules were set up ( Figure S1c). Module-trait relationships' heatmaps (Fig-371  ure 3a)

Assessment and Validation of the Prognostic Value of Multivariate Cox Proportional-Hazard 395
Model Based on Gene Signature 396 According to the multivariate Cox proportional-hazard model based on the seven-397 gene signature, Risk Scores of 85 samples in the TARGET-OS cohort were calculated and 398 ranked. The median was used as the cut-off value to separate 85 patients into low-risk and 399 high-risk groups (Figure 4a). After that, the overall survival status of the two group pa-400 tients was displayed and compared (Figure 4b, Figure 4d). A heatmap was drawn to illus-401 trate the expression profiles of the seven genes in the corresponding samples (Figure 4c). 402 Kaplan-Meier (KM) curves demonstrated low-risk patients had remarkably better sur-403 vival than high-risk ones (p < 0.0001) (Figure 4d). Time-dependent ROC curves indicated 404 that this multivariate Cox model had good robustness with a 1-year AUC of 0.75, 3-years 405 AUC of 0.89, and 5-years AUC of 0.9 (Figure 4e). When compared with clinicopathological 406 characteristics, including gender, age, stage, and response to neoadjuvant chemotherapy, 407 the prognostic value of this gene signature exhibited the most potent and stable ability for 408 predicting overall survival most of the time (Figure 4f). 409 To further evaluate the independence and applicability of this seven-gene signature, 410 patients were separated into different subgroups according to clinicopathological charac-411 teristics and performed KM survival analysis. The KM curves showed that regardless of 412 Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 22 November 2021 gender, age, stage, and response to neoadjuvant chemotherapy, significant differences be-413 tween the overall survival of low-risk and high-risk patients were observed (Figure 5a-414 5h). The same analyses were conducted in 81 patients whose event-free survival data 415 could be obtained, further validating this gene signature had a remarkable prognostic 416 value (Figure 4g-4i and Figure 5i-5p). 417 GEO datasets, GSE21257 and GSE42352, were used as independent validation co-418 horts to ensure the reliability of this seven-gene signature. GSE21257 comprised 53 cases, 419 in which 27 were separated into high-risk groups and 26 were in the low-risk group. KM 420 curve showed that the low-risk group had a better overall survival (p = 0.028) ( Figure S2a). 421 Time-dependent ROC curves demonstrated that this signature also had good robustness 422 in an independent cohort with a 1-year AUC of 0.82, 3-years AUC of 0.72, and 5-years 423 AUC of 0.77 ( Figure S2b). In GSE21257, only 39 cases whose event-free survival time was 424 more than 0 months were used to validate the prognostic value for predicting event-free 425 survival ( Figure S2c-S2d). In GSE42352, the patients who relapsed within five years had a 426 higher risk score than those non-relapsed (p = 0.0019) ( Figure S2e). Additionally, 85% of 427 patients relapsed within five years in the high-risk group, significantly higher than that in 428 the low-risk group (Chi-square test p = 0.002) ( Figure S2f). These results demonstrated 429 that this seven-gene signature was an excellent prognostic instrument, including overall 430 and event-free survival in validation cohorts. 431 Enneking staging system is a known prognostic factor for OSA patients, which was 432 combined with Risk Score to conduct multivariate Cox regression to verify the independ-433 ence of Risk Score in estimating prognosis (Figure 6a and 6b). Nomogram was a powerful 434 method to integrate different risk factors to predict the prognosis of patients with tumors. 435 Therefore, The multivariate Cox model using seven-gene signature was visualized by us-436 ing nomogram (Figure 6c), whose calibration curve was plotted in Figure 6d, illustrating 437 that the nomogram predicted 5-year overall survival better than 1-and 3-year overall 438 survival for their relative more minor divergence between actual and predicted survival 439 status (Figure 6d).   458 We used the Robust Rank Aggregation (RRA) method to integrate three cohorts with 459 a minimal bias to determine the DEGs between low-and high-risk groups stratified by 460 the median Risk Score in each cohort, respectively. A total of 202 robust DEGs were se-461 lected, including 139 down-regulated and 63 upregulated genes (Supplementary Table 462  S1). According to the robust DEGs' log2 (fold change), we displayed the top 30 upregu-463 lated and downregulated ones via heatmap (Figure 7a). Via pathway and process enrich-464 ment analyses, the down-regulated genes were found mainly involved with humoral im-465 mune response, and the upregulated ones were found involved with extracellular matrix 466 and endochondral ossification (Figure 7b). The downregulated genes were separated into 467 eight modules via protein-protein interaction enrichment analysis and the MCODE algo-468 rithm. The top 3 with their corresponding function description are displayed in Figure 7c. 469 In Figure 7d, only two modules and interrelated functions of upregulated genes are shown, 470 illustrating that the genes in MCODE_1 were involved with collagen trimer and protein 471 digestion and absorption. The genes in MCODE_2 were involved with the translation and 472 peptide biosynthetic process. The results above indicated the widespread relevance be-473 tween this Risk Score and immune status and the possibility of the down-regulated gene 474 perhaps mainly expressed in the immune cells infiltrated in OSA. To testify these two 475 hypotheses, firstly, the correlation analysis between Risk Score and TumorPurity, Im-476 muneScore, and StromalScore was conducted and displayed in Figure 7e. Secondly, a 477 heatmap (Figure 7f) was drawn to show the expression profiler of the gene within the top 478 5 modules in different samples, including OSA biopsy specimens (Biopsy), mesenchymal 479 stem cell (MSC), osteoblast (OB), and OSA cell lines (OS), demonstrating many down-480 regulated genes, such as HLA-DRB1, CD33, IFI30, VAMP8 and so on, highly expressed in 481 biopsy specimens, but nearly did not express in OS. Most of the down-regulated genes 482 did not influence the overall survival in the TARGET-OS cohort. Therefore, we reckon 483 that the downregulated genes perhaps represent the down-infiltrated immune cells. 484 Therefore, ImmuneCellAI was used to estimate the composition of 24 immune cells in 485 OSA biopsy specimens of the TARGET-OS cohort. Figure 7g showed the down-regulated 486 genes indeed positively correlated with many immune cells including, nature regulatory 487 T cell (nTreg), macrophage, natural killer cell (N.K.), Cytotoxic Tcell (Cytotoxic), Type I 488 regulatory T cell (Tr1), Dendritic cells (D.C.), and Gamma delta T cell. Six upregulated 489 genes were identified as risk factors via univariate Cox regression (Figure 7h). Finally, 490 only RPS28 was identified as an independent risk factor via multivariate Cox regression 491 (Figure 7i). variate Cox regression analysis of the ten clustered upregulated genes; (c) Forest plot of the multivariate Cox regression 503 analysis showed that RPS28 is an independent risk factor for overall survival in the TARGET-OS cohort. (*p < 0.05, ** p < 504 0.01, *** p < 0.001). 505 506 KM curve showed that the patients with higher RPS28 expression were with much 507 worse OS and EFS in the three cohorts, including TARGET-OS (Figure 8a), GSE21257 (Fig-508 ure 8b), and GSE42352 (analyzed in R2) (Figure 8c). Figure 8d showed that RPS28 was 509 significantly highly expressed in OSA cell lines compared with the putative progenitor 510 cells of OSA, including MSC and OB. Figure 8e showed RPS28 was significantly highly 511 expressed in OSA specimens when compared with normal bone tissues. Therefore, we 512 reckon that RPS28 may be a promising therapeutic target in OSA cells, not the microenvi-513 ronment. Nevertheless, it was reported that RPS28 controls MHC class I peptide genera-514 tion and influences CD8 + T cell cancer immunosurveillance [33]. Indeed, in The Human 515 Protein Atlas, RPS28 protein was highly expressed in one of the OSA cell lines, U2-OS, but 516 HAL-A was no staining, supporting the expression of RPS28 and HAL-A may be nega-517 tively correlated (Figure 8f). Additionally, across the three cohorts, using the median of 518 RPS28 expression value, patients were separated into the low-RPS28 group and high-519 RPS28 group, and ImmuneScores were found significantly lower in the latter (Figure 8g). 520 By using the Query Gene Function on TIDE (http://tide.dfci.harvard.edu/query/) [34], the 521 expression of RPS28 was found negatively correlated with the cytotoxic T lymphocyte 522 (CTL) level in GSE21257 (Figure 8h). 523 To further validate the clinical significance of RPS28 in OSA, immunohistochemical 524 staining of RPS28 protein in and around the tumor was performed in the five patients of 525 our hospital, whose tumor necrosis rates were all less than 90% (Table s1), showing that 526 RPS28 protein was intensively expressed in OSA tissues, but rarely expressed in either 527 para-tumor soft tissue or normal bone tissue (Figure 9a and Figure S3a-c). It was reported 528 that decreased RPS28 protein concentrations impaired Hela cell viability [35]. Therefore, 529 we guessed that RPS28 might promote tumor immune escaping and involve OSA 530 development directly, which was further validated in vitro. 531 In multiple OSA cell lines, MG-63 showed the highest expression of RPS28 mRNA 532 (Figure 9b), which was selected for further RNA interference experiments, in which si-533 RPS28-3 showed the best interference effect (Figure 9c), whose knockdown efficiency was 534 verified by western blotting analysis ( Figure 9d). As shown in Figure 9e-g, the CCK-8 and 535 colony formation assays demonstrated that RPS28 knockdown significantly inhibited the 536 proliferation of MG-63. As mentioned above, the patients with higher RPS28 expression 537 had much worse metastasis-free survival (Figure 8c), indicating that RPS28 may be closely 538 associated with OSA metastasis, which was also validated by wound healing assay, 539 transwell migration, and matrigel invasion assays (Figure 9h-k).

559
With the establishment of standard therapy for OSA in the 1980s, 60% of patients 560 with localized diseases achieved long-term survival [4]. Nevertheless, limited therapeutic 561 progress has been made since that time. We reckon it is essential to recognize the high-562 risk patients who might not respond to standard therapy and develop supplementary 563 treatments for these patients. 564 Despite being the most common primary bone cancer in children and young adults, 565 OSA remains a rare but genomically complex malignancy [36]. As a result, it is challenging 566 to get enough OSA specimens in a short time when we want to start research. Therefore, 567 it is crucial to conduct big-data analysis by integrating multiple public datasets for dis-568 covering universal and robust therapeutic targets. 569 In the present research, we firstly identified six hallmark gene sets that were involved 570 with the overall survival of OSA patients, in which UV RESPONSE UP containing the 571 genes upregulated in response to ultraviolet (U.V.) radiation, and CHOLESTEROL HO-572 MEOSTASIS containing the genes involved in cholesterol homeostasis were found as risk 573 factors. In contrast, INTERFERON GAMMA RESPONSE containing the genes upregu-574 lated in response to IFNG, ALLOGRAFT REJECTION containing the genes upregulated 575 during transplant rejection, IL6 JAK STAT3 SIGNALING containing the genes upregu-576 lated by IL6 via STAT3, and XENOBIOTIC METABOLISM containing the genes encoding 577 proteins involved in the processing of drugs and other xenobiotics were found as protec-578 tive factors. Subsequently, a multivariate Cox model was conducted based on hallmark 579 gene sets, including UV RESPONSE UP, CHOLESTEROL HOMEOSTASIS, and 580 XENOBIOTIC METABOLISM. However, because the microarray datasets lack a lot of 581 genes, the results of ssGSEA across the three cohorts were not robust enough. As a result, 582 the predictive value of this model could not be validated in independent cohorts (the data 583 were not displayed). But we still reckon it is an important discovery. Because it was re-584 ported that disturbed cholesterol balance underlies not only cardiovascular disease but 585 also an increasing number of other diseases such as neurodegenerative diseases and can-586 cers [37], and we revealed that cholesterol homeostasis might also influence the overall 587 survival of OSA patients, which is worth studying intensely. For constructing a universal 588 model across the three cohorts, a seven-gene signature composed of CORT, MMD, CD180, 589 COL5A2, CDC42EP3, FUCA1, and SLC8A3 was established. 590 Monocyte to macrophage differentiation-associated (MMD) protein is a member of 591 the progestin and AdipoQ receptor (PAQR) family and was found in mature macrophages 592 and reported to be associated with macrophage activation in vitro [38]. CD180, also 593 known as RP105, is a cell surface molecule consisting of extracellular leucine-rich repeats 594 (LLR) and a short cytoplasmic tail. The extracellular LRR is associated with a molecule 595 called MD-1 and forms the cell surface receptor complex, RP105/MD-1. It belongs to the 596 family of pathogen receptors, Toll-like receptors (TLR). RP105/MD1 controls B cell recog-597 nition and lipopolysaccharide (LPS) signaling by working with TLR4 [39]. Solute carrier 598 family 8 member A3 (SLC8A3) encodes a member of the sodium/calcium exchanger inte-599 gral membrane protein family. The downregulation of SLC8A3 is positively correlated 600 with the prognosis of stage II-IV colon cancer [40]. Interestingly, human macrophages and 601 monocytes express functional SLC8A3 [41]. Cortistatin (CORT) encodes a neuropeptide 602 that shares high homology with somatostatin (somatotropin release-inhibiting factor, 603 SRIF) and can bind with high affinity to all somatostatin (SST) receptor subtypes [42]. Alt-604 hough a report has shown that CORT could inhibit the proliferation in cancer cells [43], it 605 is a risk factor in OSA, according to the results of this study. Because the transcriptome 606 data were from bulk RNA-sequencing, the mRNA detected came from OSA cells and in-607 filtrating immune and stromal cells. As a result, the contradiction is understandable. 608 Precisely, CORT could be detected in cells and tissues from the human immune system 609 and played an essential role in monocyte differentiation and activation [44]. Collagen type 610 V alpha 2 chain (COL5A2) encodes an alpha chain for one of the low abundance fibrillar 611 collagens, which is the component of the extracellular matrix and is an unfavorable prog-612 nostic factor in gastric cancer [45] and bladder cancer [46]. CDC42 effector protein 3 613 (CDC42EP3) is a member of the cell division cycle 42 (CDC42) effector protein family. 614 Previous studies showed that CDC42EP3 plays a role in the function of cancer-associated 615 fibroblasts and DNA damage response [47-49] and is a promoter in gastric cancer and 616 colorectal cancer [50,51]. The protein encoded by Alpha-L-fucosidase 1 (FUCA1) is a lyso-617 somal enzyme involved in the degradation of fucose-containing glycoproteins and glyco-618 lipids. It was reported that FUCA1 as a target of p53 functions as a tumor suppressor in 619 colon cancer [52], luminal B LN+ breast cancer [53], and thyroid cancer [54], but as a tumor 620 promoter in glioma instead [55]. Tumor suppressor protein p53 encoded by Tumor pro-621 tein p53 (TP53) is a tumor suppressor associated with OSA [56]. It is believable that 622 FUCA1, one of the targets of p53, functions as a protective factor in OSA. As mentioned 623 above, this seven-gene signature is reasonable, which was further validated in both inter-624 nal and external OSA cohorts. 625 Using this seven-gene signature, the Risk Score of each case was calculated, whose 626 median was used to separate patients into high-and low-risk groups. KM survival analy-627 sis demonstrated that patients in the high-risk group have much poor overall and event-628 free survival in the training and validation cohorts. To further estimate the predictive 629 value of this signature, KM survival analysis was conducted in several clinical sub-groups 630 in the TARGET-OS cohort. Consistently, in all clinical subgroups, patients in the high-risk 631 group had poorer overall and event-free survival except for in the huvos III/IV, which is 632 because the cases in this sub-group were too little to get a statistical significance. Time-633 dependent ROC curve analysis confirmed the satisfactory accuracy of this predictive 634 model in 1-, 3-and 5-years. Compared with clinicopathological characteristics, the pre-635 dictive value of this signature exhibited the most potent and stable ability for predicting 636 overall and evet-free survival most of the time. Additionally, multivariate Cox regression 637 analysis showed that both Risk Score and Enneking stage were independent risk factors 638 in both the TARGET-OS cohort and GSE21257 cohort. Taken together, the results men-639 tioned above proved this seven-gene signature had a substantial predictive value in OSA 640 patients. A nomogram composed of the signature was designed to facilitate stratifying 641 patients based on the TPM value of seven genes obtained from the bulk RNA sequencing 642 using their biopsy specimens. 643 To discover effective therapeutic targets for high-risk OSA patients, we firstly 644 identified the differentially expressed gene between high-and low-risk patients in the 645 TARGET-OS cohort, GSE21257 cohort, and GSE42352 cohort by using Differential 646 Expression Analysis and RRA algorithms. Then, by constructing the PPI network and 647 identifying the highly interconnected genes using the MCODE algorithm, the top 3 648 functional subnet modules in down-regulated genes and the only two functional subnet 649 modules in upregulated genes were identified used for further analyses. According to the 650 results of G.O. enrichment analyses, most down-regulated genes were enriched in 651 immune-related biological processes and cellular components, including antigen 652 processing and presentation of peptide antigen via MHC class II, tertiary granule, and 653 primary lysosome. Further results showed that Risk Score was negatively correlated with 654 ImmuneScore, and most of the down-regulated genes almost did not express in OSA cell 655 lines, indicating they might represent several immune cells infiltrating in OSA specimens, 656 which could be a potential therapeutic target in the tumor microenvironment. The only 657 two MCODES in the upregulated genes were enriched in collagen trimer and translation, 658 in which prolyl 4-hydroxylase subunit alpha 1 (P4HA1) was validated as a risk factor, 659 promoting the metastasis of OSA cells in vitro and in vivo [15]. However, RPS28 was 660 finally found as an independent risk factor, which was validated further by multiple 661 methods, including KM survival analyses, comparing of RPS28 expression between OSA 662 tissues and normal tissues in mRNA level and protein level, RNA interference experiment, 663 and so on. 664 Ribosomes, consisting of a small 40S subunit and a large 60S subunit, are responsible 665 for translating information contained in mRNA into functional proteins, the ultimate step 666 in the genetic program [57]. Notably, the hyperactivation of ribosome biogenesis, which 667 can be initiated by oncogenes or the loss of tumor suppressor genes, had a critical role in 668 cancer initiation and progression [58]. Ribosomal protein S28 (RPS28) gene, located at 669 19p13.2, encodes a ribosomal protein that is a component of the small 40S subunit. A 670 report showed that RPS28 controls MHC class I peptide generation and influences CD8 + T 671 cell cancer immunosurveillance in melanoma [33]. We reckon that RPS28 might play the 672 same role in OSA as a risk factor, based on the pieces of evidence including U2OS cell 673 lines with an intensive expression of RPS28 protein but no expression of HLA-A, the OSA 674 patients in the high-RPS28 group with lower ImmuneScores, and the negative correlation 675 between the expression of RPS28 mRNA with CTL level in OSA cohort GSE21257. 676 Additionally, in vitro experiments showed that RPS28 knockdown impaired the 677 proliferation, migration, and invasion of MG-63 cell lines, consistent with its influence 678 exerted in Hela cell lines [35] and three OSA cohorts mentioned above. Our results suggest 679 that RPS28 functions as a tumor promoter in OSA by promoting the abilities of 680 proliferation and metastasis of tumor cells, simultaneously facilitating immune escape, 681 making it a potential therapeutic target worthy of further investigation. 682 Some limitations still existed. Firstly, the size of used cohorts was relatively small, 683 and some cases had missing clinical information. Consequently, we are constructing a 684 more extensive multi-omics OSA database using our cohort. Secondly, the detailed 685 mechanisms underlying how RPS28 exerts its effects mentioned above and the exact 686 efficacy of RPS28 depletion in vivo require further exploration. 687

688
In conclusion, by multiple methods, we identified a seven-gene prognostic signature 689 for OSA patients and a calculated Risk Score that was a robust independent prognostic 690 biomarker. Additionally, according to the signature, a nomogram was conducted to pre-691 dict the prognosis and stratification for OSA patients based on the TPM value of the seven 692 genes obtained from the bulk RNA sequencing using their biopsy specimens. Finally, We 693 found that RPS28 was a promising therapeutic target for the patients in the high-risk 694 group separated by the signature.

695
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1. 696 WGCNA analysis in TARGET-OS cohort, Figure S2. External validation of the predictive value of 697 the gene signature, Figure S3. Immunochemistry staining of RPS28 in our cohort, Table S1. Differ-698 entially expressed genes (DEGs) between high-and low-risk groups with |log2 (fold change)| > 0.69 699 and p-Value < 0.05 by RRA analysis.

Conflicts of Interest:
The authors declare no competing financial and non-financial conflicts of in-716 terest. 717