Predictors of Survival in Children with Ependymoma from a Single Center: Using Random Survival Forests
Running Title: Random Survival Forests of Children with Ependymoma

Ependymoma accounts for 8–10% of all pediatric brain tumors and is the third most common brain tumor in children. No robust molecular markers are yet in routine clinical use. Surgical resection and adjuvant radiotherapy cure approximately 40-70% of pediatric patients with ependymoma. In our centre, we have been using prophylactic valproic acid treatment for brain tumor patients. Initial observations indicated that valproate could have a beneficial effect on patient survival. Recent observations by other authors have shown that patients with glioblastoma benefited from treatment with valproic acid, a histone deacetylase inhibitor. We used random survival forests, a novel ensemble survival modelling method, to study a small single-center cohort of pediatric patients with ependymoma. This analysis confirmed extent of surgical resection and treatment with radiotherapy as independent predictors of overall survival. Treatment with valproic acid was also a predictor of higher survival in this cohort. These results highlight the potential usefulness of the random survival forest model in gathering information from retrospective data. More data are needed on the possible influence of histone deacetylase inhibition by valproic acid on the survival of patients with ependymoma.

supratentorial brain, posterior fossa, or spinal cord. Extent of surgical resection remains the most important factor affecting long-term disease control [3]. Metastasis incidence at diagnosis is highly variable in different series. It seems that metastatic disease is present in 9-20% of patients at diagnosis, and it is associated with a dismal outcome [4]. In recent years, genetic studies of ependymoma have begun to improve the understanding of its biology and to suggest approaches to defining disease risk [5][6][7]. However, no robust molecular markers are yet in routine clinical use. Although earlier studies reported the influence of tumor biology on disease outcome, a consensus on this question has not been possible until now. Nonetheless, a comprehensive stratification system has been proposed that groups patients into low-risk or high-risk [8]. Surgical resection and adjuvant radiotherapy cure approximately 40-70% of pediatric patients with ependymoma [3].
In our centre, we have chosen prophylactic valproic acid treatment for brain tumor patients [9]. Initial observations of the survival of our patients led us to suspect that valproate could have a beneficial effect [10]. Recently, Weller et al. reappraised the EORTC/NCIC glioblastoma temozolomide clinical trial data, showing that patients treated with valproic acid in addition to temozolomide and radiotherapy had a significant survival advantage [11]. A retrospective single-centre study and meta-analysis of published data also confirmed a statistically significant survival advantage in patients with newly diagnosed glioblastoma treated with valproate [12]. In contrast, we have published evidence showing a lack of statistically significant survival advantage in children with malignant brain tumors treated with prophylactic valproic acid [9]. To study the possible influence of valproate on the survival of pediatric brain tumor patients in our centre, we carried out a retrospective cohort study.
Survival models are often based on multivariable Cox proportional hazards regression. These methods have a potential for creating bias and are prone to variation in results. In addition, regression modeling and variable selection are neither straightforward nor easy to understand for clinicians without a statistics background. One of the main disadvantages of these methods is that the covariates must be specified in advance.
Classification and regression tree (CART) models may be an intuitive alternative for clinicians, because they illustrate the importance and relationship of variables at a glance [13]. However, single classification and regression trees suffer from high variance and poor performance, which leads to instability. Random survival forests (RSF) modeling is a new statistical method that grows numerous mature trees with many branches, reducing variance and bias by using all variables collected and by automatically assessing nonlinear effects and complex interactions [14,15].

MATERIAL AND METHODS:
These results are part of a retrospective study approved by the institutional review board of our institution. We reviewed the charts of patients aged 0-17 years who were referred to our institution and diagnosed with ependymoma between January 2000 and December 2010.
Since January 2007, valproate sodium at doses of 10-15 mg/kg/day, given every 8-12 h, was routinely prescribed for all pediatric brain tumor patients in our institution as a prophylactic anticonvulsant.
The primary study endpoint was time to death from any cause, measured from diagnosis, from which the overall survival (OS) percentage was computed. The primary objective of the statistical analysis was to determine the predictors of OS. The following variables were assessed for prognostic value: age, sex, metastasis at diagnosis (metastasis), anaplasia, tumor site, treatment with chemotherapy, treatment with radiotherapy, extent of surgery, and prophylactic treatment with valproate. The random survival forest analysis used all-cause mortality as the outcome. A survival forest of 1000 survival trees was constructed.
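The overall survival percentage defined above is reported in the results as Kaplan-Meier estimates. As an illustration only, the product-limit calculation can be sketched in Python as follows; the function name `kaplan_meier` and the toy data are assumptions made for this sketch and are not the study's code (the analysis itself was run in R) or its cohort.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  : follow-up time from diagnosis (e.g. in months)
    events : 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Product-limit step: multiply by the fraction surviving time t.
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data, not the study cohort.
toy_times = [5, 8, 12, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at death times.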
Importance of a variable was assessed by its minimal depth from the tree trunk. Variables were selected with the minimal depth method and the model was rerun, yielding another forest with only the selected variables. Prediction accuracy for RSF was assessed by the Harrell C-index (C) using out-of-bag (OOB) data. The error rate was computed as 1 − C. One hundred replications were run (1000 trees in each replicated forest) and the mean and standard deviation of the concordance error rate were recorded. Informativeness of each selected predictor was assessed graphically by plotting importance values and partial predicted survival time for a given predictor, after adjusting for all other predictors. Next, a nested analysis was done by sorting predictors by their importance values and considering the nested sequence of models starting with the top variable, followed by the model with the

RESULTS:
Between 2000 and 2010, 27 patients were diagnosed with ependymoma. There were 17 males and 10 females. Mean and median ages were 7.6 and 8.3 years, respectively. Seventeen had posterior fossa tumors, 6 had supratentorial lesions and 4 had spinal tumors. Eighteen underwent complete surgical resection, whereas 9 had partial resection or biopsy.
Nineteen received radiotherapy. Seventeen received adjuvant chemotherapy. Simple Kaplan-Meier estimates were calculated (figure 1). Univariate analysis with the log-rank test indicated that surgical extent (complete resection vs. not complete; 4-year survival 60% vs. 22%, p<0.01), radiotherapy (treated vs. not treated; 4-year survival 63% vs. 12%, p<0.01) and valproic acid therapy (treated vs. not treated; 4-year survival 67% vs. 12%, p=0.01) modified survival, but tumor site did not (spinal vs. supratentorial vs. posterior fossa; 4-year survival 75% vs. 56% vs. 40%, p=0.5). The RSF model indicated 6 variables as predictors of OS: surgery extent, valproic acid treatment, radiotherapy treatment, anaplasia, topography and chemotherapy. After variable selection, 3 variables were left, in order of importance: valproic acid, surgery, and radiotherapy. The others were discarded. Mean concordance error rate for the final RSF model was 0.25 (± 0.016). Drop in concordance error rate was 0.2365 for surgery extent, and 0.0166 for valproic acid treatment. The concordance error rate for the bootstrapped Cox model was 0.28, close to but higher than the 0.22 average value for RSF.
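For readers unfamiliar with the univariate comparison reported above, a two-group log-rank test can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `logrank_test` and the toy data are hypothetical, only two groups are handled (the three-way tumor site comparison would need the multi-group form), and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # deaths expected in A if groups do not differ
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data: group A dies early, group B dies late.
chi2_toy, p_toy = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

The statistic compares the deaths observed in one group with the number expected if both groups shared the same hazard at every death time.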
Survival trees generated in RSF are similar to those built by CART. We plotted CART trees from the predictors in our analysis as an example (figure 2, A through D). The first tree was modelled with all predictors, yielding a one-split tree with two terminal nodes (2A). In this model, only surgical resection extent predicted survival (OR=0.59 for complete resection, encompassing 67% of patients). The next tree was modelled without surgical extent as a predictor, and revealed radiotherapy treatment as a predictor (OR=0.65, 70% of patients, figure 2B). Modelling without both surgical extent and radiotherapy produced a tree with valproic acid treatment as a predictor (OR=0.62, 63% of patients, figure 2C). Further excluding valproic acid treatment showed topography as a predictor of survival (OR=0.66 for combined supratentorial and spinal, 37% of patients, figure 2D).
In RSF modelling, 1000 such trees were generated, randomly varying both the predictors and the patients selected (see figure 3 for trees sampled from the forest). Then, the aggregate variable importance (calculated from the C-index) was computed.
This yielded a set of variables ranked by their importance as predictors of survival (figure 4, upper panel). The first 5 variables were predictive of survival. Using the minimal depth method, we selected only the strongest variables (figure 4, lower panel). Figure 5 shows the ensemble survival predicted by the final model, weighted by the three chosen variables: surgical resection, radiotherapy and prophylactic valproic acid treatment.

DISCUSSION:
The predictors of overall survival in this single-center cohort of pediatric patients with ependymoma were surgery extent (complete vs. incomplete resection), radiotherapy treatment and prophylactic seizure treatment with valproic acid. In RSF, the most important variables are identified as those that most frequently split the branches near the tree trunks.
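The idea of ranking variables by how close to the trunk they split can be sketched as follows. This is an illustrative Python sketch, not the implementation used in the study: the nested-dictionary tree representation, the function names and the toy forest are all hypothetical, and assigning the full tree depth to a variable that never splits in a tree is an assumption of this sketch.

```python
def tree_depth(node, depth=0):
    """Depth of the deepest terminal node (the root split sits at depth 0)."""
    if node is None:  # None marks a terminal node in this toy representation
        return depth
    return max(tree_depth(node["left"], depth + 1),
               tree_depth(node["right"], depth + 1))


def minimal_depths(node, depth=0, found=None):
    """Map each variable to the depth of its shallowest split in one tree."""
    if found is None:
        found = {}
    if node is None:
        return found
    var = node["var"]
    found[var] = min(depth, found.get(var, depth))
    minimal_depths(node["left"], depth + 1, found)
    minimal_depths(node["right"], depth + 1, found)
    return found


def forest_minimal_depth(forest, variables):
    """Average minimal depth of each variable over all trees in the forest."""
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        found = minimal_depths(tree)
        fallback = tree_depth(tree)  # assumption: used when a variable never splits
        for v in variables:
            totals[v] += found.get(v, fallback)
    return {v: totals[v] / len(forest) for v in variables}


# Hypothetical two-tree forest with variable labels borrowed from the text.
leaf = None
toy_forest = [
    {"var": "surgery",
     "left": {"var": "vpa", "left": leaf, "right": leaf},
     "right": {"var": "rt", "left": leaf, "right": leaf}},
    {"var": "vpa",
     "left": leaf,
     "right": {"var": "surgery", "left": leaf, "right": leaf}},
]
toy_depths = forest_minimal_depth(toy_forest, ["surgery", "vpa", "rt"])
```

A smaller averaged minimal depth means the variable tends to split near the root, where it partitions the largest share of patients.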
There are no prespecified assumptions regarding variables, and randomization is introduced into this model by both random bootstrap sampling of patients from the original cohort and random sampling of variables for each tree branch. In RSF, the most predictive variables for the cohort are defined as those whose minimal depth (averaged over the forest) is smaller than the mean depth determined under the null hypothesis of no effect.
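The two sources of randomization described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the variable labels are shorthand for the predictors listed in the methods, and the default of trying roughly the square root of the number of variables at each split is a common convention assumed here, not a setting reported in the paper.

```python
import math
import random


def bootstrap_sample(n_patients, rng):
    """Draw patients with replacement; those never drawn are out-of-bag (OOB)."""
    in_bag = [rng.randrange(n_patients) for _ in range(n_patients)]
    out_of_bag = sorted(set(range(n_patients)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_variables(variables, rng, mtry=None):
    """Randomly pick the variables that may be tried at one tree branch."""
    if mtry is None:
        # Assumption: square-root rule for the number of candidates.
        mtry = max(1, math.ceil(math.sqrt(len(variables))))
    return rng.sample(variables, mtry)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
toy_variables = ["age", "sex", "metastasis", "anaplasia", "tumor_site",
                 "chemotherapy", "radiotherapy", "surgery_extent", "valproate"]
in_bag, out_of_bag = bootstrap_sample(27, rng)  # 27 patients, as in the cohort
candidates = candidate_variables(toy_variables, rng)
```

Each tree is grown on its in-bag patients, and the out-of-bag patients of that tree serve as an internal test set for the error estimate.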
Concordance error is a suitable way to perform a model diagnosis. An error of 0.5 corresponds to random guessing, whereas an error of zero indicates perfect accuracy.
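To make the two reference values concrete, the concordance error (1 − C) can be sketched in Python as follows. This is a minimal illustration and not the study's code: the function name and toy data are hypothetical, pairs with tied survival times are skipped as a simplification, and a higher risk score is assumed to mean a worse predicted outcome.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of usable patient pairs in which the patient who died
    earlier was assigned the higher predicted risk."""
    concordant = 0.0
    permissible = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # simplification: tied times are skipped
            first, second = (i, j) if times[i] < times[j] else (j, i)
            if not events[first]:
                continue  # shorter follow-up was censored: order unknown
            permissible += 1
            if risk_scores[first] > risk_scores[second]:
                concordant += 1.0
            elif risk_scores[first] == risk_scores[second]:
                concordant += 0.5  # tied predictions count as half
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return concordant / permissible


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
error_perfect = 1.0 - harrell_c_index(toy_times, toy_events, [4, 3, 2, 1])
error_uninformative = 1.0 - harrell_c_index(toy_times, toy_events, [1, 1, 1, 1])
```

A model that ranks every usable pair correctly has zero error, while a model that assigns the same risk to everyone has an error of 0.5.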
Preferably, the error should be about 0.25 or less. The robustness of RSF derives from its nonparametric nature and randomization. Even though it is unclear whether RSF constitutes a better model than the standard Cox model, it is a suitable alternative and could be used to acquire complementary information from censored data. In particular, mining for predictive parameters through RSF seems very promising [14,15].
Ensemble learning methods are statistical algorithms that search through a hypothesis space generated by a single base learner method to build a suitable hypothesis that will make good predictions for a particular problem. Ensembles can be viewed as methods that combine weak learners into a strong predictive model. Random forests are ensembles of decision trees. One of their main applications is the classification of biomedical datasets, which often have more covariates than samples, a problem referred to as the dimensionality problem. Random forest algorithms combine the randomization of decision trees with bootstrap aggregating (bagging) in order to maximize classification accuracy. Bagging uses randomly sampled subsets of the main dataset to train the ensemble model. The most important risk of ensemble models is overfitting, in which a model describes random error or noise instead of the underlying relationships and thus achieves poor predictive performance. In this report, the approach of Ishwaran et al. was followed strictly. They proposed a method to calculate an ensemble mortality, meaning the expected total number of deaths derived from the cumulative hazard function of the model [14]. In their method, the C-index is used as a surrogate for prediction error, allowing the model to quantify misclassification and avoid overfitting.
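As a rough illustration of the ensemble mortality idea, the sketch below averages Nelson-Aalen cumulative hazard estimates from the terminal nodes a patient falls into and sums the averaged curve over a grid of event times. It is written in Python under several assumptions: the function names, the toy terminal nodes and the time grid are hypothetical, and the sketch simplifies the method described in [14] rather than reproducing it.

```python
def nelson_aalen(times, events, grid):
    """Cumulative hazard H(t) of one terminal node, evaluated on a time grid."""
    death_times = sorted({u for u, e in zip(times, events) if e == 1})
    chf = []
    for t in grid:
        hazard = 0.0
        for s in death_times:
            if s > t:
                break
            deaths = sum(1 for u, e in zip(times, events) if u == s and e == 1)
            at_risk = sum(1 for u in times if u >= s)
            hazard += deaths / at_risk
        chf.append(hazard)
    return chf


def ensemble_mortality(terminal_nodes, grid):
    """Average the per-tree cumulative hazards, then sum over the grid."""
    per_tree = [nelson_aalen(t, e, grid) for t, e in terminal_nodes]
    ensemble_chf = [sum(column) / len(per_tree) for column in zip(*per_tree)]
    return sum(ensemble_chf)


# Hypothetical terminal nodes reached by one patient in two trees:
# each entry is (follow-up times, death indicators) of the node's members.
toy_nodes = [
    ([2, 5, 7], [1, 1, 0]),
    ([5, 9], [1, 0]),
]
toy_grid = [2, 5]  # distinct death times used as the evaluation grid
toy_mortality = ensemble_mortality(toy_nodes, toy_grid)
```

Patients with a larger value are predicted to fare worse, which is what allows the value to be used as the risk score when computing the C-index.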
Valproic acid (valproate sodium) can be used for treating seizures in brain tumor patients and has recently demonstrated possible antitumor effects [16]. It is an antiepileptic drug that does not induce hepatic enzymes and hence has few interaction problems with chemotherapy, although it can induce significant side effects [16,17]. Seizure prophylaxis in brain tumor patients is controversial and not routinely recommended [18]. However, proper evidence is actually lacking and the decision to start an antiepileptic drug for seizure prophylaxis or treatment in children with brain tumors is ultimately guided by assessment of individual risk factors and careful discussion [16,19]. Valproic acid has demonstrated antiproliferative effects in glioma cell lines. However, these properties have been non-uniform and considerable differences exist between the effects and molecular mechanisms of action of valproic acid in diverse cell lines [20]. Additionally, valproic acid has induced sensitization of glioma cells to temozolomide and to gamma-radiation-induced toxicity [21].
The broad mechanism of action of valproic acid is gene expression modulation by the inhibition of histone deacetylases [22]. Histone deacetylases (HDAC) are enzymes that remove acetyl residues from the amino acids of histone proteins, modifying their three-dimensional structure. This renders DNA strands more tightly wrapped by histones, impairing the transcription of a great number of cell genes. Ordinarily fine-tuned, this mode of epigenetic gene expression control is substantially disordered in cancer cells. Inhibitors of HDAC thus open a new opportunity for cancer treatment [22]. Preclinical and clinical data have shown that valproate inhibits tumor growth and has activity against a variety of animal tumor models and human cancers [23]. Antiangiogenic properties secondary to HDAC inhibition could partially explain the in vivo antiproliferative action of valproate on animal tumor models [24].
Valproic acid administration to children with high-grade glioma heavily treated with chemotherapy was safe in a trial cohort [25]. We used a low prophylactic dose of valproate, and trials that used higher doses have successfully demonstrated surrogate markers of epigenetic inhibition in human patients [23]. Nevertheless, chronic oral administration of 10-20 mg/kg/day of valproate in children can achieve sustained plasma concentrations in the therapeutic range [26]. Our patients did not have surrogate markers of epigenetic inhibition or valproate plasma levels measured. Our previous study, which included patients with ependymoma, high-grade glioma and medulloblastoma, showed a difference in survival in the valproate-treated group, even though this difference was not statistically significant, and we attributed it mainly to chance [9]. The non-specific action of valproate on gene expression could explain why it seems to induce different effects in different cell types. In our present report, we selected from that cohort only the group of patients with ependymoma and used an alternative survival analysis approach that can overcome the dimensionality problem, allowing us to detect some purported interactions despite a low number of patients and an underpowered retrospective design. Because of the potential for overfitting, these results must be confirmed by other retrospective evaluations, coupled with preclinical data and possibly carefully planned prospective clinical trials.

CONCLUSIONS:
The analysis of this single-center cohort of pediatric patients with ependymoma has confirmed extent of surgical resection and treatment with radiotherapy as independent predictors of overall survival in this group of patients. These factors were also shown to be predictive in other studies and cooperative trials [3,8,27]. While this concordance highlights the potential usefulness of the RSF model in gathering information from retrospective data, it does not lend any particular certainty to our other finding of a relation between valproic acid use and outcome. However, we believe that our results merit consideration in light of the recent findings regarding the effect of HDAC inhibitors on the survival of brain tumor patients. We suggest that a multicenter retrospective cohort evaluation of children with ependymoma who received valproate could test this hypothesis. Additionally, more preclinical data about HDAC inhibition and ependymomas are needed. We believe that this is an exciting new line of research.
top 2 variables, until all selected predictors were included. The variation in error rate was checked for each nested model. Finally, a bootstrapped Cox proportional hazards model was constructed and compared with the RSF model for predictive accuracy and for selection of important risk factors for all-cause mortality. Prediction accuracy for the Cox proportional hazards model was assessed by the Harrell C-index using OOB data. The OOB estimate of the C-index for the Cox model was based on 100 bootstraps [14,15]. Descriptive statistics and statistical calculations were performed in R 2.12 for Mac OS X (R Foundation for Statistical Computing, 2010).

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 November 2016 doi:10.20944/preprints201611.0028.v1
impact of each of the predictor variables on the outcome (survival time), weighted by all other variables, was plotted in figure 2. Boxplots represent the mean and 2 standard deviations from it (whiskers) and loess estimates of partial values (boxes). They represent effect size.

Figure 1: Simple Kaplan-Meier estimates were calculated (time in months). Univariate analysis with log-rank test indicated that surgical extent (blue: complete, red: incomplete or biopsy only), radiotherapy (blue: treated, red: not treated) and valproic acid therapy (same) modified survival, but not tumor site (blue: spinal, green: supratentorial, red: posterior fossa).

Figure 3: Trees sampled from the modelled survival forest (all the predictors).Variables are color-coded and each vertex has a number indicating its level in the tree.Name of the variable and splitting rule are depicted (e.g.: a single value in the case of age, a continuous variable; the others are dichotomous variables).RT = radiotherapy treatment; Surg = surgical treatment; ana = anaplasia; MX = metastasis; VPA = valproic acid treatment; chemo = chemotherapy.