A Real Data-Driven, Robust Survival Analysis on Patients who Underwent Deep Brain Stimulation for Parkinson’s Disease by Utilizing Parametric, Non-Parametric, Semi-Parametric Approaches

Malinda Iluppangama; Dilmi Abeywardana; Chris Tsokos

doi:10.20944/preprints202502.0072.v1

Submitted:

31 January 2025

Posted:

03 February 2025

You are already at the latest version

Abstract

Parkinson’s Disease (PD) is a devastating neurodegenerative disorder that affects millions of people around the globe. Many researchers are continuously working to understand PD and develop treatments to improve the condition of PD patients that affects their day-to-day lives. Since the last decades, the treatment, Deep Brain Stimulation (DBS) has given promising results for motor symptoms by improving the quality of daily living of PD patients. In the methodology of the present study, we have utilized sophisticated statistical approaches such as Nonparametric, Semiparametric, and robust Parametric survival analysis to extract useful and important information about the long-term survival outcomes of the patients who underwent DBS for PD. Finally, we were able to conclude that the probabilistic behavior of the survival time of female patients is statistically different from that of male patients. Furthermore, we have identified that the probabilistic behavior of the survival times of Female patients is characterized by the 3-parameter Lognormal distribution while that of Male patients is characterized by the 3-parameter Weibull distribution. More importantly, we have found that the Female patients have higher survival compared to the Male patients after conducting a robust parametric survival analysis. Using the semi-parametric COX-PH, we found that the initial implant of the right side leads to a high frequency of events occurring for the female patients with a bad prognostic factor while for the male patients, a low events occurring frequency with a good prognostic factor. Furthermore, we have found an interaction term between the number of revisions and the initial size of the implant which increases the frequency of events occurring for the Male patients with a bad prognostic factor.

Keywords:

deep brain stimulation

;

movement disorder

;

Parkinson’s Disease

;

survival analysis

;

COX-PH

;

parametric

;

semi-parametric

Subject:

Biology and Life Sciences - Life Sciences

1. Introduction

Parkinson’s Disease (PD) is a devastating neurodegenerative disorder that affects the brain neuron system and medical professionals and research scientists have been studying PD for a long period of time. However, the cause of PD still remains undetected. If the cause of PD is unknown it is called idiopathic PD and is the most common type of PD. Also, patients who get PD before 50 years of age are known as young-onset Parkinson’s disease. Unintended/Uncontrollable movements such as shaking stiffness, and loss of balance are some of the major symptoms of PD, among others. Symptoms of PD start slowly, progress over time, and finally, get worse over time. Patients in the later stage of PD are unable to walk or talk, [1,2]. Moreover, PD is a well-known symptom of the devastating dementia disease.

Over 10 million people around the globe suffer from PD and more than 50,000 people are diagnosed with PD each year only in the United States. However, these numbers do not represent thousands of undetected cases of PD around the world, [3]. In our previous studies, we have utilized a real data-driven machine learning modeling approach to detect PD patients with a very high degree of accuracy, [4] and invented a real data-driven analytical model to identify the causes of PD through Unified Parkinson’s Disease Rating Scale (UPDRS), [5].

Research scientists have been working through the last few decades on therapies and drugs to slow down or cure this devastating disease. Lepadopa and Dopamine agonist are the most common medications among others to slow down this devastating disease, [6]. Also, Deep Brain Stimulation (DBS) is a surgical therapy that has been used as a treatment to improve the movement functions and the quality of life of patients with movement disorders. Since movement disorders are one of the major symptoms of PD, DBS is widely used to treat PD. DBS is one of the most powerful innovative treatments that showed promising results for the movement symptoms of advanced PD, such as bradykinesia, essential tremor, and other motor complications of PD. Also, it reduces other medication usages of an individual patient with PD, [7,8,9].

In the literature we found that several studies have confirmed that the survival times of PD patients increase with DBS therapy compared to the PD patients who are taking other treatments, [10]. For instance, we found several articles that have been trying to analyze the long-term survival outcome of PD patients who underwent DBS, and all the articles uniformly conclude the significant overall improvement of the patient’s motor symptoms. Also, medication like levodopa equivalent daily dose was reduced over 55% for 5 years of follow-up after DBS, [11,12]. Furthermore, we found that a greater number of publications shows over 98% survival rate after 5 years of DBS treatment and also it shows a very high survival rate even after 3 years of DBS treatment, [13,14,15]. In another study the estimated long-term (10 years) survival probability was identified as 51%, [2]. However, in the literature, most researchers utilized the most widely used nonparametric Kaplan-Meier approach to perform the analysis and made conclusions about the survival of the patients who underwent DBS for PD, but it is neither a powerful nor a robust method. Furthermore, researchers utilized the semiparametric Cox-Proportional Hazards approach without addressing its underlying important assumptions and made conclusions about the survival of the patients and these results may be misleading because of the inappropriateness/invalidness of the developed models with unsatisfied underlining assumptions.

In this study, we analyze the survival of the patients who underwent DBS for PD more sophistically and investigate the sustainability of the long-term survival of PD by utilizing well-defined robust statistical methodologies such as Nonparametric, Semiparametric, and Parametric. Thus, the rest of this paper is organized as follows. Section 2 and Section 3 consist of a descriptive analysis of the data and a brief introduction to sophisticated statistical methodologies, respectively. Finally, Section 4 and Section 5 discuss the results and the conclusions of the study highlighting the usefulness of this study and concluding remarks.

2. Materials and Methods

2.1. The Real Data

The real data consists of information on individuals who underwent the DBS treatment for PD starting from 1999 to 2006 with at least 10 years of follow-up at Pennsylvania Hospital. We had the opportunity to get access to this data through the contact of Dr.Ashwin Ramayya (MD), [2]. The data consists of a total of 320 individuals who underwent DBS for PD. However, we considered only complete observations (deceased patients) for our study of survival analysis and it consists of only 149 individual patients after removing the censored observations and the missing observations without using any data imputation method to replace those missing survival times of the individual patients.

Age, Age at Implant, Gender, Survival time (in months), Initial implant side, and Number of revisions are the six individual covariates that we have used to develop the present study. The final data frame consists of individual patients in the range of age from 40 years to 93 years old and the mean age of the patients who underwent DBS surgery is 64.41 years (± 8.59 standard deviation). Also, 73% out of the total individual patients are Males. Furthermore, 69% of the patients underwent bilateral DBS for PD. The following data diagram in Figure 1 and Table 1 illustrate the exploratory data analysis for some continuous covariates.

2.2. Methodology

In this study we utilize the well-known non-parametric Kaplan-Meier (KM) approach, robust Parametric approach, and widely used semi-parametric multivariate Cox-PH method to analyze the long-term survival outcomes of the individuals who underwent DBS for PD. In the following section, each of the proposed models and their usefulness is briefly discussed.

2.2.1. Kaplan-Meier Survival Analysis

Kaplan-Meier (KM) estimation of the survival time is a fully non-parametric approach for analyzing the time of occurring the event of interest (time until death) and it was first introduced in 1958, [16]. KM is a well-known approach in the field of survival analysis and reliability analysis. However, KM is not as powerful nor robust as the Parametric or Bayesian survival analysis since it does not make any distributional assumption about underlying survival time. KM involves estimating the survival probabilities of patients at a certain point of time as follows,

S_{t} = \frac{N u m b e r o f p a t i e n t s l i v i n g a t t h e s t a r t (t_{0}) - N u m b e r o f p a t i e n t s d i e s a t t i m e t}{N u m b e r o f p a t i e n t s l i v i n g a t t h e s t a r t (t_{0})} .

Suppose that there are n individual patients at the start of the time (

t_{0}

) and their respective observed event times (died at time

t_{i}

) are

t_{1} < t_{2} < t_{3} < . . . < t_{k}

with

d_{i}

units fail at time

t_{i}

. Then the KM estimator of the survival function

S_{K M} (t)

is given by following equation,

S_{K M} (t) = \prod_{i : t_{i} < t} \frac{(n_{i} - d_{i})}{n_{i}} = \prod_{i : t_{i} < t} (1 - \frac{d_{i}}{n_{i}}),

(1)

where

t > 0

,

n_{i}

is the number of patients at risk at time

t_{i}

of the study,

d_{i}

is the number of patients who died at time

t_{i}

. We proceed and calculate survival probabilities using KM estimators given by the Equation (1) under the following assumptions, [17,18,19],

Patients who have been censored have the same survival prospects as those who continue the follow-up.
Survival probabilities are the same for individuals who were recruited early and late in the study.
Event happens at the time specified.

2.2.2. Parametric Survival Analysis

In this section, we briefly discuss the more powerful and robust parametric survival analysis and its usefulness.

Let T be a non-negative random variable, which measures the survival time of an individual patient, and its probabilistic behavior is characterized by the well-defined probability density function

f (t)

. Then the probability that an event occurs prior to time t can be defined as the cumulative distribution function and is given below by Equation (3.1.2),

F (t) = P (T \leq t) = \int_{0}^{t} f (t) d t, t > 0,

(2)

where

f (t)

is the probability density function of the survival time.

The survival of an individual patient after time t is given by the Survival Function and it can be derived from the above CDF. The generalized analytical form of the survival function can be derived as follows in Equation (3).

S (t) = 1 - F (t) = P (T > t) = \int_{t}^{\infty} f (t) d t, t > 0 .

(3)

One can utilize Equation (3.1.2) and Equation (3) to derive the hazard function to measure the chances of instantaneous death of a given patient at any time t as follows in Equation (4),

h (t) = lim_{δ t \to 0} \frac{P (t < T \leq t + δ t | T > t)}{δ t} = \frac{f (t)}{S (t)} .

(4)

2.2.3. Multivariate Cox-Proportional Hazards Model

In the context of survival/reliability analysis, the hazard function plays a major role. It explains the instantaneous risk of an event at time t. That is, the probability of the desired event occurring any time after time t is the hazard function under the assumption that individual observation has not experienced the desired event until time t.

Let T be a random variable that represents the survival time of a given patient. Then the probability that an event occurs prior to or at time t is defined by the cumulative distribution function (CDF) and is given below,

F (t) = P (T \leq t) = \int_{0}^{t} f (t) d t, t > 0,

(5)

where the probability density function can be derived as follows,

f (t) = \frac{d F (t)}{d t} .

(6)

The probability of a given patient surviving after time t (Survival function) can be defined by the following Equation (3.1.2),

S (t) = 1 - F (t) = P (T > t) = \int_{t}^{\infty} f (t) d t, t > 0 .

(7)

Utilizing Equation (6) and Equation (3.1.2), the hazard function for an individual patient at time t can be derived as follows,

h (t) = lim_{δ t \to 0} \frac{P (t < T \leq t + δ t | T > t)}{δ t} = \frac{f (t)}{S (t)} .

(8)

To estimate the hazard function the Cox Proportional Hazards Model (Cox-PH) was introduced by D. R Cox in 1972, [20], which specifies the hazard function directly rather than deriving it as the ratio of probability density function and survival function. The analytical form of the Cox-PH model is given by Equation (9),

h_{i} (t) = h_{0} (t) e x p (\sum_{j = 1}^{p} β_{j} X_{j} + \sum_{j = 1}^{p} α_{j K} X_{j} * X_{K}),

(9)

where

h_{0} (t)

is the baseline hazard function and it depends on only the survival time.

β_{j}

is the coefficient parameter for

j^{t h}

covariate and

α_{j k}

is the coefficient parameter for the interaction between

j^{t h}

and

k^{t h}

covariates. In the Cox-PH model, the baseline hazard function

h_{0} (t)

is unspecified and it is treated as a non-parametric function of time t. However, the Cox-PH model assumes the time independence of the covariates, linearity in the covariates, additivity, and proportional hazards. Thus, the Cox-PH model is considered as a semi-parametric survival model, [21]. To estimate the parameters of the Cox-PH model we cannot use the full likelihood approach since the baseline hazard function is unspecified. Thus, the partial likelihood approach proposed by Cox in 1979 is utilized to estimate the model coefficients of the Cox-PH model, [22].

In many situations, researchers are interested in the factor exp(coef) of the model rather than the estimating entire

h_{i} (t)

which determines how an individual patient varies as a proportion to the baseline hazard function. Equation (9) can be rearranged to find the hazard ratio for a given individual patient as given below,

H R = \frac{h_{i} (t)}{h_{0} (t)} = e x p (\sum_{j = 1}^{p} β_{j} X_{i j}) .

(10)

Suppose that we have two types of treatment groups, namely treatment group (T) and placebo group (P). Then we can extract the following important and useful information about the two treatment groups using the HR,

HR $> 1 (β_{i} > 0$ ). The treatment group has a higher hazard over the placebo group. (i.e., the placebo group is favored.)
HR $\approx 1$ . The treatment group and placebo group have no significant difference.
HR $< 1 (β_{i} < 0$ ). The treatment group has a lower hazard over the placebo group. (i.e., the treatment group is favored.)

3. Results

Prior to model the survival times, we assess whether the median survival time of the Male patients who underwent DBS for PD is statistically significantly different from that of female patients by utilizing hypothesis testing such as Wilcoxon rank sum test and log-rank test, [23,24,25]. In the following section, we have defined our primary null hypothesis and alternative hypothesis,

H_{0}

: Median Survival Time of Male patients is not different from that of Female patients.

Vs

H_{1}

: Median Survival Time of Male patients is different from that of Female patients.

Figure 2. Visualization of Log-Rank Test Results

The Wilcoxon rank sum test gave a p-value of 0.029 and for the log-rank test p-value is 0.013. Both Wilcoxon and log-rank tests uniformly confined that p-values are less than the 5% significant level. Thus, we could reject the null hypothesis and concluded that the median survival time of male patients is different from that of females. Since the median survival time of males is significantly different from that of females we consider males and females as two different entities and continue our analysis separately for male patients and female patients.

3.1. Univariate Survival Analysis

In this subsection, we summarize the results of parametric and nonparametric univariate survival analysis for PD patients who had DBS.

3.1.1. Kaplan Meier (KM)

KM survival probabilities have been estimated for both male patients and female patients separately because initially, we identified that their survival times are statistically significantly different. The KM estimation results for the Male patients and Female patients are summarized in Table 2 and graphically illustrated by Figure 3 and Figure 4 for Females and Males, respectively.

When carefully investigating the above KM plots for the two genders it is clear that the median survival time of female patients is higher than the median survival time of the Male patients after performing DBS for PD.

3.1.2. Parametric Survival Analysis

In the case of Parametric survival analysis, it is essential to identify the probability distribution functions that characterize the probabilistic behavior of the survival time of the Male patients and Female patients separately since their median survival times are statistically significantly different. Thus, we identified the best-fitted probability distribution functions for both male and female patients that characterize the probabilistic behavior of their survival times and utilized well-defined goodness of fit tests, Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared to validate the resulted probability distribution functions. The goodness of fit test results are summarized in Table 3, given below.

The goodness of fit test results given by Table 3 is uniformly confirmed that the probabilistic behavior of the survival time of the Male patients are characterized by the 3-parameter Weibull distribution and its analytical form is given by the mathematical function 11, given below. Furthermore, parameters of the Weibull distribution were estimated by using the most commonly applied Maximum Likelihood Estimation (MLE) method, [26].

f_{M} (x | θ, η, τ) = \frac{η}{θ} {(x - τ)}^{η - 1} e x p {- \frac{1}{θ} {(x - τ)}^{η}}, η > 0, θ > 0, x > τ,

(11)

where

θ

,

τ

and

η

are Shape, Scale, and Location parameters, respectively. MLE estimates for the model parameters are

θ

= 4.9593,

η

= 186.21,

τ

= -76.262.

Once the parameters are estimated the shape of the Weibull distribution that characterizes the probabilistic behavior of the survival times of the male patients is graphically illustrated by Figure 5, given below.

From Figure 5 one can compute the probabilities of the survival times of the patients who underwent DBS for PD between time

t_{i}

and

t_{i + 1}

. As an illustration, we can compute the probability of survival between 25 months to 50 months for a PD patient who underwent DBS as

P (25 \leq t \leq 50)

\approx 0.00475 - 0.002 \approx 0.00275

. In Figure 5 we can clearly see the probability of a patient who will survive between 25 months to 50 months is approximately

0.3 %

.

Once we have the analytical form that characterizes the probabilistic behavior of the survival time of the male patients, the Cumulative Distribution Function, (CDF) for male patients (

F_{M} (t)

) can be derived as follows in Equation (12) and its graphical illustration for male patients is by Figure 6.

F_{M} (t | θ, η, τ) = P (T \leq t) = \int_{0}^{t} f_{M} (x | θ, η, τ) d x, t > 0,

F_{M} (t | θ, η, τ) = 1 - e^{- {(\frac{t - τ}{θ})}^{η}} .

(12)

Identifying the CDF for a given random variable is a crucial achievement in probabilistic analysis because the CDF determines the probability or likelihood that a random variable will take at a specific value or less than that value. Thus, one can estimate the probability that an individual patient who underwent DBS for PD will survive until time t. Suppose a PD patient who underwent DBS wants to know what is the probability that he live until 140 months. That is

P (t \leq 140) \approx 0.88

as sketched in Figure 6.

Furthermore, the patient can estimate his/her survival probability beyond 140 months by utilizing the survival function. Since we have already estimated the CDF the derivation of the Survival Function for the male patients and its analytical form is given by Equation (13) and its behavior by Figure 7.

S_{M} (t | θ, η, τ) = 1 - F_{M} (t | θ, η, τ) = P (T > t) = \int_{t}^{\infty} f_{M} (x | θ, η, τ) d x, t > 0,

S_{M} (t | θ, η, τ) = e^{- {(\frac{t - τ}{θ})}^{η}} .

(13)

Thus, as shown in Figure 7 an individual patient surviving beyond 140 months can be estimated using Equation (13) and that is approximately

0.12

.

According to the summary of the goodness of fit test results given in Table 3, the survival times of the Female patients are characterized by the Lognormal (3P) distribution and its analytical form is given by the following Equation (14).

f_{F} (x | μ, σ, γ) = \frac{1}{(x - γ) σ \sqrt{2 π}} e x p {- \frac{{[l n (x - γ) - μ]}^{2}}{2 σ^{2}}}, γ < x, - \infty < μ < \infty, σ > 0,

(14)

where

μ

,

γ

, and

σ

are Scale, Location, and Shape parameters of the 3P-Lognormal distribution, respectively. We have utilized the Maximum Likelihood Estimation (MLE) approach to estimate those three parameters of the Lognormal distribution as illustrated in the literature, [27,28]. Thus, the estimated parameters are

μ

= 7.138,

σ

= 0.03655,

γ

= -1149.2. The graphical visualization of the shape of the Lognormal distribution is given by the following Figure 14. For example, one can interpret the estimated probability that a PD patient will survive between 50 months and 100 months by using the probability distribution of the survival time of the female patients as shown in Figure 14. That is,

P (50 \leq t \leq 100) = 0.0085 - 0.0032 = 0.0053

.

Once we derived the analytical form that characterizes the probabilistic behavior of the survival times of the female patients, the Cumulative Distribution Function (CDF) for female patients (

F_{F} (t)

) can be defined as the following Equation (15), given below, and its graphical illustration for female patients is given by Figure 9. We can use this CDF to estimate the probability of survival of the female patients until time t

P (t \leq T)

. For example, suppose that we want to estimate the probability of surviving until 140 months (

P (t \leq 140)

) using the CDF of the female patients given in Equation (15). Then the estimated probability is

P (t \leq 140) \approx 0.75

. The graphical visualization of the estimated value is shown in Figure 9.

F_{F} (t | μ, σ, γ) = ϕ (\frac{l n (t - γ) - μ}{σ}) .

(15)

Finally, the estimated CDF can be used to derive the Survival Function for the female patients and its analytical form is given by Equation (16) and its graphical visualization of the behavior by Figure 10.

S_{F} (t | μ, σ, γ) = 1 - F_{F} (t | μ, σ, γ) = 1 - ϕ (\frac{l n (t - γ) - μ}{σ}) .

(16)

Concerning the graphical illustration of the survival functions of the male and female patients, it is noticeable that the female patients have significantly higher median survival time than the median survival time of the male patients. This outcome is confirmed by the univariate results obtained from the KM survival analysis. However, the univariate survival analysis was performed under the assumption that all the patients died under the same cause regardless of their other information.

3.2. Multivariate Survival Analysis

3.2.1. Cox-PH Model

The Multivariate Cox-Ph model developing procedure was started with four individual risk factors that are identified by the medical professionals and believed to significantly contribute to the survival of a patient who underwent DBS, [Reference for Risk factors]. Thus, we initialize the model development procedure with four individual risk factors for both genders, but the performance of the models was not impressive. Thus, we utilized all individual risk factors and their two-way interactions along with the step-wise elimination method. Equation (17) expresses the analytical form of the final cox-proportional hazard model for male patients and Equation (18) gives the final analytical form for the female patients.

\hat{h_{M} (t)} = h_{0} (t) e x p (0.03074 X_{1} - 2.90540 X_{2 (1)} - 3.09802 \sqrt{X_{3}} + 2.63631 X_{2 (1)} : \sqrt{X_{3}}),

(17)

\hat{h_{F} (t)} = h_{0} (t) e x p (0.17810 X_{1} + 2.45162 X_{2 (1)} - 2.07761 X_{3}),

(18)

where

h_{M} (t)

is the hazard function for male patients at time t,

h_{F} (t)

is the hazard function for female patients at time t,

h_{0} (t)

is the arbitrary baseline hazard function,

X_{i (l)}

is the

i^{t h}

covariate with level l (for a categorical risk factor) as described in Section 2.

The above analytical models can be rearranged into a linear relationship between risk factors and logarithmic of the relative hazard ratios as follows for males and females, respectively.

l o g (\frac{\hat{h_{M} (t)}}{h_{0} (t)}) = 0.03074 X_{1} - 2.90540 X_{2 (1)} - 3.09802 \sqrt{X_{3}} + 2.63631 X_{2 (1)} : \sqrt{X_{3}}

(19)

l o g (\frac{\hat{h_{F} (t)}}{h_{0} (t)}) = 0.17810 X_{1} + 2.45162 X_{2 (1)} - 2.07761 X_{3}

(20)

The Table 4, given below, consists of the information about the developed Cox-PH models for male patients and female patients including the estimates of model coefficients (parameters), hazard ratios (exp(coef)), standard errors of coefficients and 95% lower and upper confidence intervals for the hazard ratios. Also, Table 5, given below, provides the global statistical significance values for the overall significance of the proposed Cox-PH models by using three hypothesis tests: Likelihood ratio test, Wald test, and Score test and all the p-values (

< < 0.05

) uniformly confirm that the proposed Cox-PH models are statistically significant at 5% level of significance. These three tests are asymptotically equivalent. However, for small samples, the likelihood ratio test slightly outperforms the Wald test and both outperform the Score test.

Since the hazard ratio (exp(coef)) of the Cox-PH model measures the relative risk it is possible to interpret the model coefficients in a meaningful way. If the model coefficient is positive (

\hat{β} > 0

), i.e., hazard ratio, (exp(coef)) > 1 then the event of occurrence is high with a bad prognostic factor. However, if the coefficient is negative (

\hat{β} <

0), i.e., hazard ratio, (exp(coef)) < 1 then the event of occurrence is low with a good prognostic factor. In the case of proposed models, both male and female patients consist of the coefficient of

X_{1}

(

{\hat{β}}_{M X 1} = 0.050 > 0

and

{\hat{β}}_{F X 1} = 0.114 > 0

) and are positive. It means the age at implant of both male and female patients has a higher risk for death with a bad prognostic. In the case of coefficients of

X_{2}

, the behavior of the two models is different from each other. For the female patients

X_{2}

(

{\hat{β}}_{F X 2} = 1.014 > 0

) has positive coefficient and has higher risk for death with a bad prognostic while male patients

X_{2}

(

{\hat{β}}_{M X 2} = - 2.74 < 0

) has negative coefficient and the event of occurrence is low risk with a good prognostic factor.

3.2.2. Validation of the Assumptions

In this section, we assess the five major assumptions of a Cox-PH model that need to be satisfied to perform a high-quality and robust survival analysis.

(1): Proportional hazards assumption

We have performed the statistical test and also the graphical visualization based on Scaled Schoenfeld Residuals to validate the proportional hazards assumption on the Cox-PH model. Table 4, given below, consists of p-values for the test based on scaled Schoenfeld residuals. The P-values for all covariates are greater than $0.05$ at the 5% level of significance confirming the fact that our proposed models for male and female patients satisfy the proportional hazards assumption.

Next, we visualize the graphical approaches of the scaled Schoenfeld residuals given in Figure 11 and Figure 12 for Male patients and Female patients, respectively. Both figures indicate a zero slope pattern by confirming the proportional hazard assumption.
(2): Linear functional form of Continuous variables

The linearity Assumption is one of the major underlying assumptions of the Cox-PH model that should be satisfied by the final model to achieve more robust results without misinterpretation. The graphical illustration of the Martingale residuals for the continuous covariates can be used to identify a violation of this assumption. Figure 13 and Figure 14 uniformly confirm that both Cox-PH models given by Equations (17) and 18 have satisfied the linearity assumption as given below for the continuous risk factors.
(3): Check for Extreme Observations

Under the next assumption, we check for the extreme values in our final Cox-PH model by utilizing Devience residual plots as given by Figure 2 and Figure 16. Both figures don’t consist of any extreme outline observations. Thus, we can conclude that the proposed models have met the underlying assumption of no outliers of extreme observations.
(4): Overall Fit of the Cox-PH Model

Finally, we assessed the overall fit of the proposed Cox-PH models using the cumulative hazard of the Cox-Snell residuals plot. Figure 17 and Figure 18, given below, graphically visualize the Cox-Snell residuals plots for both genders and they follow a straight line with a zero intercept and a unit slope by confirming the high quality of the proposed models.

Thus, the proposed models for Male patients and Female patients have satisfied the underlying assumptions of the Cox-PH model and its high quality.

4. Discussion and Conclusions

In the scope of this paper, we have utilized nonparametric, semi-parametric, and parametric survival analysis to perform systematic and deep univariate and multivariate survival analysis for the patients who underwent DBS for PD. Within the first part of this study, the univariate survival analysis was performed through widely used nonparametric KM estimation method and more powerful and robust parametric approaches under the assumption that all patients deceased due to the same cause of death. Thus, we were able to compute and compare the survival probabilities using nonparametric KM and parametric methods for an individual patient. Since the survival times of the male and female patients indicated a statistically significant difference as shown in Section 3, we performed survival analysis for male and female patients independently. In the sense of parametric survival analysis, 3-Parameter Lognormal Distribution probabilistically characterizes the survival times of the Female patients while the probabilistic behavior of the survival times of the Male patients is characterized by 3-Parameter Weibull Distribution.

Figure 19 and Figure 20, given below, graphically illustrate the comparison of the parametric and nonparametric survival curves for female patients and male patients, respectively. According to the graphical illustration, it is clear that KM method underestimates the survival probabilities of the female patients while it overestimates the survival probabilities for the male patients. At this point of our study, we rely on survival probabilities estimated by the robust and more powerful parametric approach because of its robustness as explained previously compared to the widely used KM method. As shown in Figure 19 the median survival time of the female patients is approximately 110 months while in Figure 20 the median survival time for male patients is approximately 95 months. Thus, we can conclude that the female patients who underwent DBS for PD have higher survival compared to that of male patients. However, in the case of univariate survival results, they are based on the assumption that all the patients’ underlined cause of death is the same without considering personalized risk factors or other relevant information.

Then, the effect of covariates was taken into account for both female and male patients separately and developed their own Cox-PH models. In the developing process, we considered the three individual covariates: Age at Implant, Initial Side of Implant, and Number of Revision. Furthermore, the two-way interactions of the individual covariates are taken into account. However, it is important to note that in the literature, researchers do not consider interaction effects when developing a Cox-PH model. In Section 3.2, we have developed the independent Cox-PH models for male patients and female patients. The final model for Male patients given by Equation (17) consists of three individual statistically significant covariates: Age at Implant, Initial Side of Implant, the square root of the Number of Revision and a statistically significant two-way interaction term between Initial Side of Implant and the square root of the Number of Revision. Moreover, the final Cox-PH model for female patients consists of only three individual statistically significant covariates: Age at Implant, Initial Side of Implant and Number of Revision as given by Equation (18). In the model 17, the individual covariate, Initial side has

H R < 1

. However, when it comes to the interaction between the Initial Side of Implant and the Number of Revision it has

H R > 1

. This is a perfect example to illustrate the importance of adding interaction terms. Furthermore, in Section 3.2.2 we have addressed all underlying assumptions of the Cox-PH model and they uniformly confirmed the robustness of our developed model and its high quality.

Once the Cox-PH model is finalized by addressing all underlying assumptions, the following important results are available concerning the long-term survival outcomes of the patients who underwent DBS for PD.

For the covariate, Initial Side of Implant, the HR value (2.76) for female patients is greater than 1, which means the risk of death is higher with a bad prognosis. That is, Initial Right Side Implant has a higher risk of death than Initial Left Side Implant. For the male patients, the HR value (0.064) is less than 1, which means the risk of death is lower with a good prognosis.
For the interaction between Initial Side of Implant and the square root of Number of Revisions, the HR values are greater than 1, which means the risk of death is higher with a bad prognosis. That is, Number of Revisons that a patient has done has a stronger effect on hazard ratio with the Right side of the Initial Implant compared to the Left side of the initial implant.
Based on the hazard ratios, the most contributing individual covariates are Age at Implant, Initial Side of Implant and Number of Revision for male patients, and Initial Side of Implant, Age at Implant, Number of revision for female patients, respectively. Furthermore, the ranking of the individuals and interaction terms for both models are given by Table 7.

Author Contributions

Conceptualization, Malinda Iluppangama and Chris Tsokos; methodology, Malinda Iluppangama and Dilmi Abeywardana; software, Malinda Iluppangama and Dilmi Abeywardana; validation, Malinda Iluppangama, Dilmi Abeywardana and Chris Tsokos; formal analysis, Malinda Iluppangama and Dilmi Abeywardana; investigation, Malinda Iluppangama; resources, Chris Tsokos; writing original draft preparation, Malinda Iluppangama and Chris Tsokos; writing review and editing, Chris Tsokos; visualization, Malinda Iluppangama and Dilmi Abeywardana; supervision, Chris Tsokos; project administration, Chris Tsokos. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

The research project used anonymized patient records for retrospective analysis, which waived the need for informed consent.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

The authors would like to acknowledge clinical data shared by the University of Pennsylvania research team (Dr. Ashwin Ramayya(MD)).

Conflicts of Interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Abbreviations

The following abbreviations are used in this manuscript:

PD	Parkinson’s Disease
DBS	Deep Brain Stimulation
Cox-PH	Cox Proportional Hazards
KM	Kaplan-Meier

References

National Institute of Ageing. Parkinson’s Disease: Causes, Symptoms, and Treatments, 2024. https://www.nia.nih.gov/health/parkinsons-disease/parkinsons-disease-causes-symptoms-and-treatments, Last accessed on 2024-08-10.
Hitti, F.L.; Ramayya, A.G.; McShane, B.J.; Yang, A.I.; Vaughan, K.A.; Baltuch, G.H. Long-term outcomes following deep brain stimulation for Parkinson’s disease. J Neurosurg 2020.
National Institute of Neurological Disorders and Stroke. Parkinson’s Disease: Challenges, Progress, and Promise, 2024. https://www.ninds.nih.gov/current-research/focus-disorders/parkinsons-disease-research/parkinsons-disease-challenges-progress-and-promise, Last accessed on 2024-08-10.
Iluppangama, M.; Abeywardana, D.; Tsokos, C. Utilizing Real Data-Driven Popular Machine Learning Methods To Determine Patients With Parkinson’s Disease Subject To Identified Unique Risk Factors. Under Review.
Iluppangama, M.; Abeywardana, D.; Tsokos, C. A Real Data-Driven Analytical Model That Identifies The Risk Factors of Parkinson’s Disease. Under Revieww.
Zahoor, I.; Shafi, A.; Haq, E.; Pharmacological. Treatment of Parkinson’s Disease. Pathogenesis and Clinical Aspects [Internet]. Brisbane (AU) 2018.
Parkinson’s Foundation. Considering Deep Brain Stimulation. https://www.parkinson.org/library/fact-sheets/deep-brain-stimulation, 2024. Accessed: 2024-08-10.
Rajamani, N.; Friedrich, H.; Butenko, K. Deep brain stimulation of symptom-specific networks in Parkinson’s disease. Nature Communications 2024.
Deuschl, G.; Schade-Brittinger, C.; Krack, P.; Volkmann, J.; Schäfer, H.; Bötzel, K.; Daniels, C.; Deutschländer, A.; Dillmann, U.; Eisner, W.; et al. A Randomized Trial of Deep-Brain Stimulation for Parkinson’s Disease. New Engl. J. Med. 2006, 355, 896–908, [https://www.nejm.org/doi/pdf/10.1056/NEJMoa060281]. [CrossRef]
Ngoga, D.; Mitchell, R.; Kausar, J.; Hodson, J.; Harries, A.; Pall, H. Deep brain stimulation improves survival in severe Parkinson’s disease. J Neurol Neurosurg Psychiatry 2014.
Fasano, A.; Romito, L.M.; Daniele, A.; Piano, C.; Zinno, M.; Bentivoglio, A.R.; Albanese, A. Motor and cognitive outcome in patients with Parkinson’s disease 8 years after subthalamic implants. Brain 2010, 133, 2664–2676, [https://academic.oup.com/brain/article-pdf/133/9/2664/16697548/awq221.pdf]. [CrossRef]
Schüpbach, W.M.; Chastan, N.; Welter, M.L.; Houeto, J.L.; Mesnage, V.; Bonnet, A.M.; Czernecki, V.; Maltête, D.; Hartmann, A.; Mallet, L.; et al. Stimulation of the subthalamic nucleus in Parkinson’s disease: A 5 year follow up. J Neurol Neurosurg Psychiatry 2005.
Kim, A.; Kim, H.J.; Kim, A.; Kim, Y.; Kim, A.; Ong, J.; Park, H.; Paek, S.; B, J. The mortality of patients with Parkinson’s disease with deep brain stimulation. Front. Neurol. 13:1099862 2023. [CrossRef]
Ryu, H.S.; Kim, M.S.; You, S.; Kim, M.J.; Kim, Y.J.; Kim, J.; Kim, K.; Chung, S.J. Mortality of advanced Parkinson’s disease patients treated with deep brain stimulation surgery. J Neurol Sci 2016.
Kim, A.; Yang, H.J.; Kwon, J.H.; Kim, M.H.; Lee, J.; Jeon, B. Mortality of Deep Brain Stimulation and Risk Factors in Patients With Parkinson’s Disease: A National Cohort Study in Korea. J Korean Med Sci 2023.
Kaplan, E.L.; Meier, P. Nonparametric Estimation from Incomplete Observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
Rich, J.T.; Neely, J.G.; Paniello, R.C.; Voelker, C.C.; Nussenbaum, B.; Wang, E.W. A practical guide to understanding Kaplan-Meier curves. Otolaryngol Head Neck Surg 2010.
Goel, M.; Khanna, P.; Kishore, J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res 2010.
Abeywardana, D.; Iluppangama, M.; Tsokos, C. Parametric and Non-Parametric Survival Analysis on Papillary Thyroid Cancer. Under Review.
Cox, D.R. Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological), Vol. 34 1972.
Abeywardana, D.; Iluppangama, M.; Tsoko, C. Survival Analysis on Papillary Thyroid Cancer Through Cox Proportional Hazard Model. submitted 2024.
Emmert-Streib.; Frank.; Matthias, D. Introduction to Survival Analysis in Practice. Mach. Learn. Knowl. Extr. 2019, 1, 1013–1038. [CrossRef]
Bland, J.; Altman, D. The logrank test. BMJ 2004.
Rey, D.; Neuhäuser, M., Wilcoxon-Signed-Rank Test. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2011; pp. 1658–1659. [CrossRef]
Kim, H. Statistical notes for clinical researchers: Nonparametric statistical methods: 1. Nonparametric methods for comparing two groups. Restor Dent Endod 2014.
Carlos, M. Parametric and Bayesian Modeling of Reliability and Survival Analysis. PhD thesis, USF T ampa Gr aduate Theses and Dissertations., 2011.
Rodrigo, A. Estimating the Parameters of the Three-Parameter Lognormal Distribution. Master’s thesis, FIU Electronic Theses and Dissertations. 575., 2012.
Cohen Jr, A.C. Estimating Parameters of Logarithmic-Normal Distributions by Maximum Likelihood. J. Am. Stat. Assoc. 1951, 46, 206–212. [CrossRef]

Figure 1. Data Diagram of Risk Factors

Figure 3. KM Survival Plot for Female Patients.

Figure 4. KM Survival Plot for Male Patients.

Figure 5. Probability Distribution Function of Survival Times of Male Patients

Figure 6. Cumulative Distribution Function of the Survival Times of Male Patients

Figure 7. Survival Function of Male Patients.

Figure 8. Probability Distribution of Survival Times of Female Patients.

Figure 9. Cumulative Distribution Function of Survival Times of Female Patients

Figure 10. Survival Function for Female Patients.

Figure 11. Plots of Scaled Schoenfeld Residuals for Each Term of The Proposed Cox-PH Model for Male Patients

Figure 12. Plots of Scaled Schoenfeld Residuals for Each Term of The Proposed Cox-PH Model for Female Patients.

Figure 13. Martingale Residuals Plot for the Continuous Variables of Male Patients

Figure 14. Martingale Residuals Plot for the Continuous Variables of Female Patients

Figure 15. Deviance Residuals Plot for the Proposed Cox-PH Model for Male Patients

Figure 16. Deviance Residuals Plot for the Proposed Cox-PH Model for Female Patients

Figure 17. Cox-Snell Residuals Plot for the Proposed Cox-PH Model of Male Patients

Figure 18. Cox-Snell Residuals Plot for the Proposed Cox-PH Model of Female Patients

Figure 19. Comparison of Survival Probabilities for Female Patients Using Parametric and Nonparametric Approaches

Figure 20. Comparison of Survival Probabilities for Male Patients Using Parametric and Nonparametric Approaches

Table 1. Descriptive Statistics of Continuous Risk Factors.

Risk Factor	Minimum	Maximum	Mean	Standard Deviation
Age at Implant	33	86	$64.41$	$8.59$
Survival time	3	212	$98.80$	$42.12$

Table 2. Kaplan Meier Survival Estimations

Time	n.risk	n.event	Survival(pr)	std.err	95% Lower	95% Upper
			Male Patients
3	109	1	0.99083	0.00913	0.97309	1.0000
4	108	2	0.97248	0.01567	0.94224	1.0000
7	106	1	0.96330	0.01801	0.92864	0.9993
13	105	1	0.95413	0.02004	0.91565	0.9942
16	104	1	0.94495	0.02185	0.90309	0.9888
19	103	1	0.93578	0.02348	0.89087	0.9830
22	102	1	0.92661	0.02498	0.87892	0.9769
33	101	1	0.91743	0.02636	0.86719	0.9706
36	100	1	0.90826	0.02765	0.85565	0.9641
39	99	1	0.89908	0.02885	0.84428	0.9574
44	98	1	0.88991	0.02998	0.83305	0.9507
46	97	1	0.88073	0.03104	0.82194	0.9437
47	96	3	0.85321	0.03390	0.78929	0.9223
48	93	1	0.84404	0.03475	0.77860	0.9150
50	92	1	0.83486	0.03556	0.76799	0.9076
			Female Patients
6	39	1	0.9744	0.0253	0.9260	1.000
12	38	1	0.9487	0.0353	0.8820	1.000
40	37	2	0.8974	0.0486	0.8071	0.998
47	35	1	0.8718	0.0535	0.7729	0.983
57	34	1	0.8462	0.0578	0.7402	0.967
67	33	1	0.8205	0.0615	0.7085	0.950
74	32	1	0.7949	0.0647	0.6777	0.932
76	31	2	0.7692	0.0675	0.6477	0.914
77	30	1	0.7436	0.0699	0.6184	0.894
80	29	2	0.6923	0.0739	0.5616	0.853
93	27	1	0.6667	0.0755	0.5340	0.832
96	26	2	0.6154	0.0779	0.4802	0.789
106	24	1	0.5897	0.0788	0.4539	0.766
111	23	1	0.5641	0.0794	0.4281	0.743
118	22	1	0.5385	0.0798	0.4027	0.720

Table 3. Goodness of Fit Test Summary

				Goodness of Fit Test(p-vales)
Gender	Distribution	Obs	Kolmogorov-Sminorv	Anderson-Darling	Chi-Square
Male	Weibull (3P)	109	0.840	0.475	0.605
Female	Log-normal(3P)	39	0.429	0.527	0.924

Table 4. Coefficients and Hazard Ratios of the Proposed Cox-PH Models for Male Patients and Female Patients.

Factor	Coefficient	Hazard Ratio (exp(coef))	SE(coef)	95% Lower	95% Upper
		Male Patients
$X_{1}$	0.050	1.051	0.015	1.022	1.082
$X_{2 (1)}$	-2.74	0.064	0.633	0.019	0.223
$\sqrt{X_{3}}$	-3.23	0.039	0.517	0.014	0.109
$X_{2 (1)} : \sqrt{X_{3}}$	2.65	14.129	0.524	5.056	39.488
		Female Patients
$X_{1}$	0.114	1.12	0.032	1.05	1.193
$X_{2 (1)}$	1.014	2.76	0.496	1.04	7.293
$X_{3}$	-0.474	0.6224	0.172	0.445	0.871

Table 5. Global Statistical Significance of the Proposed Cox-PH Models.

	P-value
Hypothesis Test	Male Patients	Male Patients
Likelihood ratio test	3e-11	1e-4
Wald test	6e-10	0.001
Score (log-rank) test	2e-8	6e-4

Table 6. P-values for Testing the Proportional Hazards Assumption for Male and Female patients.

Covariate	P-value
	Male Patients
$X_{1}$	0.83
$X_{2 (1)}$	0.78
$\sqrt{X_{3}}$	0.11
$X_{2 (1)} : \sqrt{X_{3}}$	0.24
	Female Patients
$X_{1}$	0.26
$X_{2 (1)}$	0.68
$X_{3}$	0.51

Table 7. Ranking of Contributing Covariates and Interactions Using Hazard Ratios

Covariate	Hazard Ratio(exp( Coeff)	Rank
	Male Patients
$X_{2 (1)} : \sqrt{X_{3}}$	14.129	1
$X_{1}$	1.051	2
$X_{2 (1)}$	0.064	3
$\sqrt{X_{3}}$	0.039	4
	Female Patients
$X_{2 (1)}$	2.76	1
$X_{1}$	1.12	2
$X_{3}$	0.6224	3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

A Real Data-Driven, Robust Survival Analysis on Patients who Underwent Deep Brain Stimulation for Parkinson’s Disease by Utilizing Parametric, Non-Parametric, Semi-Parametric Approaches

Abstract

Keywords:

Subject:

1. Introduction

2. Materials and Methods

2.1. The Real Data

2.2. Methodology

2.2.1. Kaplan-Meier Survival Analysis

2.2.2. Parametric Survival Analysis

2.2.3. Multivariate Cox-Proportional Hazards Model

3. Results

3.1. Univariate Survival Analysis

3.1.1. Kaplan Meier (KM)

3.1.2. Parametric Survival Analysis

3.2. Multivariate Survival Analysis

3.2.1. Cox-PH Model

3.2.2. Validation of the Assumptions

4. Discussion and Conclusions

Author Contributions

Funding

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

MDPI Initiatives

Important Links

Subscribe