Bayesian Analysis of Competing Risks Models with Masked Causes of Failure and Incomplete Failure Times

We study Bayesian analysis of masked data in a competing risks framework, with the aim of assessing the impact of covariates on the hazard functions when the failure time is observed exactly for some subjects but is known only to lie in an interval of time for the remaining subjects. Such data, known as partly interval-censored data, usually result from periodic inspection. Dirichlet and Gamma process priors are assumed for the masking probabilities and the baseline hazards, respectively. The Markov chain Monte Carlo (MCMC) technique is used to implement the Bayesian approach. The proposed model is evaluated through numerical studies on simulated and real data sets.


Introduction
Suppose a unit is subject to $K$ causes of failure, and let $T$ be the time until the unit fails due to one of them. It often happens that the cause responsible for the failure cannot be identified exactly. Such incomplete data are generally referred to as masked data: the cause of failure can only be narrowed down to a Minimum Random Subset (MRS) $S \subseteq \{1, \dots, K\}$. If the cause of failure is identified exactly as $j$, then $S = \{j\}$ is a singleton. If the cause of failure is not identified at all, then $S = \{1, \dots, K\}$ and the cause is fully masked. Thus every observed failure time $T_i$, $i = 1, \dots, N$, is accompanied by an observed MRS denoted by $S_i$. A second source of incompleteness arises when the exact failure time $T$ is not known.
Work on masked data is abundant in the literature. Miyakawa (1984) provided maximum likelihood estimates (MLEs) for two causes of failure with independent exponential failure times when the data are masked and uncensored. Dinse (1986) suggested nonparametric maximum likelihood estimators of prevalence and mortality when the MRS is either a known cause or fully masked. Reiser et al. (1995) presented a Bayesian analysis assuming exponentially distributed component lifetimes, and Guttman et al. (1994) took this work further to the case where the masking probability depends on the actual cause of failure. For partially masked data, Mukhopadhyay & Basu (1997) studied a Bayesian analysis with independent Weibull distributions, assuming a similar shape structure for all $K$ risks. Basu et al. (1999), on the other hand, developed a Bayesian analysis for masked data from a general $K$-component system with non-identical Weibull distributions. They further investigated a Bayesian analysis based on a general, flexible parametric framework for complex forms of censoring (Basu et al. 2003). Without the symmetry assumption, Kuo & Yang (2000) developed a Bayesian analysis with independent exponential as well as Weibull distributions, while Mukhopadhyay & Basu (2007) studied the case of a series system whose components have independent log-normal life distributions. Xu & Tang (2011) considered a nonparametric Bayesian approach for masked data that extended the findings of Neath & Samaniego (1996) to series systems with masked competing risks.
In contrast, based on a cause-specific formulation, Flehinger et al. (2002) proposed fully parametric cause-specific hazards using stage 1 and stage 2 information when the failure times for the competing risks have a Weibull distribution. Craiu & Reiser (2006) developed an EM-based method that allows dependent competing risks and produces estimators of the sub-distribution functions. Lu & Tsiatis (2001) presented parametric models for estimating the regression coefficients, in which the cause-specific hazard for the cause of interest is related to the covariates through a proportional hazards relationship. Sen et al. (2010) introduced a semi-parametric Bayesian approach and discussed three models that differ in their priors. Yosra et al. (2016) discussed partly interval-censored data in a competing risks framework when the cause of failure might be masked, and compared their results with the models of Sen et al.
In engineering, researchers are often interested in component reliability. Most of the work mentioned above was therefore developed for masked data under the series system formulation, for failure times that are complete (no censored units), right-censored (RC), or interval-censored. One may, however, be interested in the impact of risk factors on the hazard function rather than in the estimation of reliability. Since the effect of the covariates is of interest, this paper considers a Bayesian analysis under the cause-specific hazard framework using the Cox proportional hazards model, which is used extensively, though mostly in public health studies (see Han et al. 2017 and Liu et al. 2017). We investigate the case where the data are masked and the failure time is partly interval-censored (PIC). This work can be regarded as a development of the model of Sen et al. (2010). Sections 2 and 3 introduce the model construction and the Bayesian computation techniques. Section 4 evaluates the model performance on simulated data, Section 5 illustrates our approach on an actual data set, and Section 6 concludes the paper.

Model Structure
In the case of masked data, for each unit we observe not only the failure time but also a set of causes that includes the true cause of failure. Assume that we observe $N$ units, each with $K$ causes of failure acting on it, and let $X$ denote the observed collection of covariates. Then for unit $i$ we observe the vector $(T_i, S_i, X_i)$, where $T_i$ denotes the failure time and $S_i$ denotes the MRS of causes that are possibly responsible for the failure. For the $i$th unit, the likelihood contribution from the data $(T_i, S_i, X_i)$ consists of $P(T_i, S_i \mid X_i)$ (Kuo & Yang 2000). For a given cause $j$, the joint probability factorizes as
$$P(T_i, S_i, C_i = j \mid X_i) = P(T_i, C_i = j \mid X_i)\, P(S_i \mid T_i, C_i = j, X_i), \quad j = 1, \dots, K,$$
where $C_i$ denotes the actual cause of failure of the $i$th unit. Note that $P(T_i, C_i = j \mid X_i) = f_j(T_i \mid X_i)$, the sub-density of failure from cause $j$. When the observation of $C_i$ is incomplete (see Crowder 2001), the likelihood contribution of an observed failure is obtained by summing over the causes in the MRS:
$$P(T_i, S_i \mid X_i) = \sum_{j \in S_i} f_j(T_i \mid X_i)\, P(S_i \mid T_i, C_i = j, X_i).$$
In the case of partly interval-censored data, where the failure time is incomplete, we observe the exact failure time for some units but only an interval that contains the true failure time for the remaining units. This type of data arises when units are inspected periodically. Let $(L_i, R_i]$, with $L_i < R_i$, denote the observed interval containing the true failure time of the $i$th unit, $T_i \in (L_i, R_i]$. If the failure occurred before the first inspection time, we have a left-censored observation, $T_i \in (0, R_i]$; if the unit had not failed by the last inspection time, we have a right-censored observation, $T_i \in (L_i, \infty)$. Define $\delta_i$ and $\gamma_i$ as censoring indicators that take the value one if the failure time $T_i$ is left-censored or interval-censored, respectively, and zero otherwise. The likelihood when the observation of $T$ is incomplete is then the product of the contributions of the $n_1$ units whose failure times are exact and the $n_2$ units whose failure times are interval-censored (including left- and right-censored), with $n_1 + n_2 = N$.
In this study, our interest is in the semi-parametric Bayesian estimation of the regression coefficients. We therefore work with the cause-specific formulation, using the popular proportional hazards (PH) model
$$\lambda_j(T, X) = \lambda_{0j}(T)\, e^{\beta_j' X}, \quad j = 1, \dots, K, \qquad (2.1)$$
where $\lambda_{0j}$ and $\beta_j$ are, respectively, the baseline hazard and the regression coefficient of the $j$th cause of failure, and $X$ represents the vector of explanatory variables. Because both the failure time and the cause of failure are incomplete in this study, the two cases discussed above must be combined to formulate the likelihood function. Let $n_1$, $n_2$, $n_3$ ($n_1 + n_2 + n_3 = N$) denote the numbers of units whose failure times are exact, right-censored, and interval-censored (including left-censored), respectively. The full likelihood of masked and partly interval-censored data, referred to as (2.3), is built from the contributions of these three groups.
When dealing with masked data, many researchers adopt the symmetry assumption, under which the probability of observing a given masking set is the same regardless of the actual cause, that is,
$$P(S_i \mid T_i, C_i = j, X_i) = P(S_i \mid T_i, C_i = j', X_i), \quad j, j' \in S_i. \qquad (2.4)$$
Assumption (2.4) lets the analysis proceed with a reduced likelihood that does not depend on the masking probabilities. In this paper, we introduce a Bayesian analysis using the full likelihood (2.3). We model the masking probabilities as independent of the failure time but dependent on the cause of failure. Moreover, we allow them to depend on subject-level covariates.
It is often of interest to determine the cause responsible for a unit's failure when it is masked. To do so, we compute the diagnostic probability, which is the probability that the $j$th risk caused the $i$th unit to fail, given the observed masking set and the unit's failure time. Under our full likelihood, the diagnostic probability is computed in one of two ways, depending on whether the failure time of the unit is exact or interval-censored (including left-censored): equation (2.5) applies when the failure time of the $i$th unit is exact, and equation (2.6) applies when it is interval-censored.
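The displayed forms of the diagnostic probabilities can be sketched directly from the definitions in this section by applying Bayes' rule. The block below is a reconstruction under stated assumptions, not necessarily the authors' notation: $G$ (overall survival), $f_j$ (sub-density), $F_j$ (cumulative incidence function), $\Lambda_{0l}$ (cumulative baseline hazard), $(L_i, R_i]$ (observed interval), and $p_j(S_i \mid X_i)$ (masking probability, taken to be independent of the failure time) are symbols introduced here for illustration.

```latex
% Assumed notation (introduced for this sketch):
%   G(t|X)     overall survival under the PH model (2.1)
%   f_j, F_j   sub-density and cumulative incidence function of cause j
%   p_j(S|X)   P(S_i = S | C_i = j, X), assumed free of the failure time
\begin{align*}
G(t \mid X) &= \exp\Big\{-\sum_{l=1}^{K} \Lambda_{0l}(t)\, e^{\beta_l' X}\Big\}, \\
f_j(t \mid X) &= \lambda_{0j}(t)\, e^{\beta_j' X}\, G(t \mid X), \qquad
F_j(t \mid X) = \int_0^t f_j(u \mid X)\, du, \\[4pt]
% exact failure time, j in S_i
P(C_i = j \mid T_i, S_i, X_i)
  &= \frac{f_j(T_i \mid X_i)\, p_j(S_i \mid X_i)}
          {\sum_{l \in S_i} f_l(T_i \mid X_i)\, p_l(S_i \mid X_i)}, \\[4pt]
% interval-censored failure time, j in S_i
P(C_i = j \mid L_i < T_i \le R_i, S_i, X_i)
  &= \frac{\{F_j(R_i \mid X_i) - F_j(L_i \mid X_i)\}\, p_j(S_i \mid X_i)}
          {\sum_{l \in S_i} \{F_l(R_i \mid X_i) - F_l(L_i \mid X_i)\}\, p_l(S_i \mid X_i)}.
\end{align*}
```

Under the symmetry assumption the masking probabilities cancel from both ratios, which is why a model that drops that assumption has to estimate them.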

Bayesian analysis
To carry out the Bayesian analysis, we need to assign prior distributions to the unknown parameters, which are assumed to be stochastically independent. In this paper, we use the most popular prior distributions in the literature. We assign independent Dirichlet priors to the masking probabilities. Let $J = 2^{K-1}$ denote the number of sets that include cause $j$, and let $\mathcal{S}_j = \{S_{j1}, \dots, S_{jJ}\}$ denote the collection of potential MRSs that contain cause $j$. The Dirichlet random variables are then defined as
$$\mu_{ij}(S_{j1}), \dots, \mu_{ij}(S_{jJ}) \sim \mathrm{Dir}(\alpha_j), \quad i = 1, \dots, N;\; j = 1, \dots, K;\; J = 2^{K-1}.$$
As priors for the cause-specific baseline hazards we use independent Gamma processes, a very common choice for the baseline hazard, of the form $\Lambda_{0j}(t) \sim GP(c\,\omega_j(t), c)$, $j = 1, \dots, K$, where $\Lambda_{0j}$ is the cumulative baseline hazard specific to the $j$th cause of failure. Here $\omega_j(t)$ can be regarded as a prior guess at the unknown cumulative hazard function specific to the $j$th cause of failure, while $c$ represents the degree of confidence in this guess. The regression coefficients are assumed to be independently normally distributed, that is,
$$\beta_j \sim N(\theta_j, \sigma_j^2), \quad j = 1, \dots, K,$$
where $\beta_j$, $\theta_j$, and $\sigma_j^2$ are the regression coefficient, the mean, and the variance, respectively, specific to the $j$th cause of failure.
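Having defined the prior distributions, we turn to the joint posterior distribution (3.1), which is proportional to the product of the likelihood function and the prior distributions given the observed data $D$. Because the posterior has a complicated form, we use MCMC to generate samples from the full conditional distributions $(\beta \mid \Lambda, \mu, D)$, $(\Lambda \mid \beta, \mu, D)$, and $(\mu \mid \beta, \Lambda, D)$. These distributions need to be identified in order to construct an effective simulation method. The WinBUGS software performs these steps internally and automatically.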
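To make the three priors concrete, the following minimal sketch draws once from each of them with NumPy. It is an illustration under stated assumptions, not the WinBUGS model used in the paper: the prior guess `omega` (a constant hazard of 0.01), the confidence `c = 5`, the flat Dirichlet parameters, and the normal prior mean and standard deviation are all hypothetical values. The Gamma process is discretized on a time grid using the fact that its increments over disjoint intervals are independent Gamma variables.

```python
import itertools
import numpy as np


def draw_gamma_process(grid, omega, c, rng):
    """Draw one path of Lambda(t) ~ GP(c * omega(t), c) on a time grid.

    Increments over disjoint intervals are independent
    Gamma(shape = c * (omega(t_k) - omega(t_{k-1})), rate = c) variables,
    so that E[Lambda(t)] = omega(t).
    """
    grid = np.asarray(grid, dtype=float)
    w = omega(np.concatenate(([0.0], grid)))  # prior guess at 0 and grid points
    increments = rng.gamma(shape=c * np.diff(w), scale=1.0 / c)
    return np.cumsum(increments)


def masking_sets(j, K):
    """All subsets of {1, ..., K} that contain cause j (2**(K-1) of them)."""
    others = [k for k in range(1, K + 1) if k != j]
    sets = []
    for r in range(len(others) + 1):
        for extra in itertools.combinations(others, r):
            sets.append(tuple(sorted((j,) + extra)))
    return sets


rng = np.random.default_rng(0)
K = 3                                   # number of causes (hypothetical)
grid = np.linspace(10.0, 100.0, 10)     # time grid (hypothetical)
omega = lambda t: 0.01 * t              # hypothetical prior guess
c = 5.0                                 # hypothetical confidence in the guess

# One prior draw of the cumulative baseline hazard of a single cause.
Lambda_path = draw_gamma_process(grid, omega, c, rng)

# Dirichlet prior over the masking sets that contain cause 1.
sets_j1 = masking_sets(1, K)
mask_probs = rng.dirichlet(np.ones(len(sets_j1)))

# Normal prior for one regression coefficient (hypothetical mean and sd).
beta_1 = rng.normal(loc=0.0, scale=10.0)
```

A larger `c` concentrates the prior paths around `omega`, which is the sense in which `c` expresses confidence in the prior guess.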

Simulation Study
Since we work under a cause-specific hazards formulation, we adopt the cause-specific hazards-based simulation design of Beyersmann et al. (2009) to simulate the failure times. We consider a competing risks model with two causes of failure, each with a Weibull distributed lifetime with parameters $(\lambda_j, \alpha_j)$, $j = 1, 2$, and set $\lambda_1 = 0.005$, $\lambda_2 = 0.003$, $\alpha_1 = 1.9$, $\alpha_2 = 1.3$. First, we simulate the failure times and the censoring times. Then we simulate the causes of failure and mask them at random, with equal chance of being masked or unmasked, which results in masked right-censored data. Last, we create inspection times so that the data become partly interval-censored, comprising exact, left-censored, right-censored, and interval-censored failure times. The resulting data consist of 46% exact failure times, 32% right-censored, 10% left-censored, and 12% interval-censored times. Furthermore, 32% of the observations are masked, while 30% and 6% of the observations fail due to causes 1 and 2, respectively.
To evaluate the performance of our model, we apply it to the simulated partly interval-censored data and compare the results with those obtained using the approach of Sen et al. (2010). For convergence monitoring, we check the time series plots and the autocorrelation function plots, both of which suggest convergence. We also use the Gelman and Rubin multiple sequence diagnostic. The reported results are based on five chains, each of 4000 iterations with a burn-in of 1000. The results obtained from a simulated random sample of $N = 50$ units show that the estimates from the two approaches are comparable. Although our model deals with left-, right-, and interval-censored failure times, which means considerable missing information, Table 1 (where * marks the posterior credible interval) shows that its posterior estimates of the regression coefficients are reasonably close to those from the model with only right-censored failure times. Figure 1 compares the cumulative baseline hazards. The cumulative baseline hazards obtained from the two models are close for both causes, with each fluctuating slightly above and below the other.
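The simulation steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the Weibull hazards are assumed to be parametrized as $\lambda_j \alpha_j t^{\alpha_j - 1}$, and the censoring distribution (`cens_max`), the inspection schedule (`inspections`), and the share of failures observed exactly (`p_exact`) are hypothetical choices, so the censoring percentages it produces will differ from those reported. Following Beyersmann et al. (2009), the failure time is drawn from the all-cause hazard and the cause is then drawn with probability proportional to the cause-specific hazards at that time.

```python
import numpy as np

LAM = np.array([0.005, 0.003])   # Weibull scale parameters from the text
ALPHA = np.array([1.9, 1.3])     # Weibull shape parameters from the text


def cum_hazard_all(t):
    """All-cause cumulative hazard, assuming Lambda_j(t) = lam_j * t**alpha_j."""
    return float(np.sum(LAM * t ** ALPHA))


def hazards(t):
    """Cause-specific hazards lam_j * alpha_j * t**(alpha_j - 1)."""
    return LAM * ALPHA * t ** (ALPHA - 1.0)


def draw_failure_time(rng):
    """Invert the all-cause cumulative hazard at -log(U) by bisection."""
    target = -np.log(1.0 - rng.uniform())
    lo, hi = 0.0, 1.0
    while cum_hazard_all(hi) < target:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if cum_hazard_all(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_pic_masked(n, rng, cens_max=40.0,
                        inspections=(5, 10, 15, 20, 25, 30),
                        p_mask=0.5, p_exact=0.5):
    """Simulate masked, partly interval-censored competing risks data."""
    grid = np.asarray(inspections, dtype=float)
    rows = []
    for _ in range(n):
        t = draw_failure_time(rng)
        h = hazards(t)
        cause = 1 if rng.uniform() < h[0] / h.sum() else 2
        latent = {"latent_time": t, "latent_cause": cause}  # never observed
        c = rng.uniform(0.0, cens_max)                      # censoring time
        if c < t:                                           # right-censored
            rows.append({"left": c, "right": np.inf, "kind": "right",
                         "mrs": (), **latent})
            continue
        mrs = (1, 2) if rng.uniform() < p_mask else (cause,)
        if rng.uniform() < p_exact:                         # exact time
            rows.append({"left": t, "right": t, "kind": "exact",
                         "mrs": mrs, **latent})
            continue
        k = int(np.searchsorted(grid, t))   # first inspection at or after t
        if k == 0:                          # failed before first inspection
            rows.append({"left": 0.0, "right": grid[0], "kind": "left",
                         "mrs": mrs, **latent})
        elif k == len(grid):                # still working at last inspection
            rows.append({"left": grid[-1], "right": np.inf, "kind": "right",
                         "mrs": (), **latent})
        else:
            rows.append({"left": grid[k - 1], "right": grid[k],
                         "kind": "interval", "mrs": mrs, **latent})
    return rows


data = simulate_pic_masked(50, np.random.default_rng(2017))
```

Each row stores the observed interval, the censoring type, and the masking set; the latent time and cause are kept only so that the simulated data can be checked against the truth.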

Figure 1: Comparison of cumulative baseline hazards from the two approaches. Solid line = cause 1, dashed line = cause 2.
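The Gelman and Rubin diagnostic used for convergence monitoring in this section compares between-chain and within-chain variability. The sketch below computes the basic potential scale reduction factor for one scalar parameter; it is an illustration on synthetic draws, and MCMC software may report a modified version of the statistic. The array shapes mirror the setting in the text (five chains, 3000 draws retained after burn-in), while the draws themselves are simulated stand-ins.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n) holding n post-burn-in draws
    from each of m chains.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return float(np.sqrt(var_hat / W))


rng = np.random.default_rng(1)
mixed = rng.normal(size=(5, 3000))           # chains sampling the same target
stuck = mixed + np.arange(5)[:, None]        # chains centred at different values

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 are consistent with convergence, whereas chains that settle around different values give a factor well above 1.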

Illustration
We apply our approach to the data set reported in Klein and Basu (1981). The data represent the failure times of insulation systems for electric motors, together with their corresponding causes of failure.
There are three possible types of failure, namely turn, phase, and ground. The experiment was conducted at three stress levels, (190 + 273.16)/1000, (220 + 273.16)/1000, and (240 + 273.16)/1000, with 20 units tested at each level. To illustrate our methodology, we modify these data to turn them into partly interval-censored data with masked causes of failure. The results are based on five chains, each run with a burn-in of 20000 iterations and 50000 retained draws, thinned to every 15th draw. Convergence is monitored and is achieved for all parameters. Table 2 gives the number of units across the masking sets and the failure/censored times. Table 3 summarizes the posterior estimates (mean, median, standard error (SE), and posterior credible interval (PCI)) of the regression coefficients, while Table 4 shows the posterior means of the diagnostic probabilities of the fully masked units, computed using equations (2.5) and (2.6). The results indicate that 50%, 25%, and 25% of the units fail due to the turn, phase, and ground causes, respectively. Figure 2 depicts the cumulative baseline hazard functions of the three causes of failure; the three causes have almost equal cumulative baseline hazards. In addition, Figures 3-5 show that the hazard increases as stress increases, irrespective of cause. This is the purpose of such experiments: they are run at high stress levels to accelerate failure and so reduce the cost and duration of the experiment.
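The diagnostic probabilities reported in Table 4 weight each candidate cause in the masking set by how likely it is to have produced the observed failure. The sketch below shows that computation by Bayes' rule for an exact and for an interval-censored failure time. It is an illustration under assumptions rather than the paper's implementation: it uses two causes with parametric Weibull baselines (values borrowed from the simulation study) in place of the semi-parametric baseline, and the masking probabilities `mask_prob` and relative risks `risk` are hypothetical inputs that would in practice be posterior draws.

```python
import numpy as np

LAM = np.array([0.005, 0.003])   # assumed Weibull scales (simulation study)
ALPHA = np.array([1.9, 1.3])     # assumed Weibull shapes (simulation study)


def sub_densities(t, lam, alpha, risk):
    """Array (len(t), K) of f_j(t | X) = hazard_j(t | X) * overall survival.

    risk[j] stands for exp(beta_j' X), the PH multiplier of cause j.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    cum = (lam * t ** alpha * risk).sum(axis=1, keepdims=True)
    haz = lam * alpha * t ** (alpha - 1.0) * risk
    return haz * np.exp(-cum)


def diagnostic_exact(t, mrs, mask_prob, lam, alpha, risk):
    """P(cause = j | exact time t, masking set) for each j in mrs."""
    f = sub_densities(t, lam, alpha, risk)[0]
    w = f[mrs] * mask_prob[mrs]
    return w / w.sum()


def diagnostic_interval(left, right, mrs, mask_prob, lam, alpha, risk,
                        n_grid=2001):
    """P(cause = j | left < T <= right, masking set) for each j in mrs."""
    grid = np.linspace(left, right, n_grid)
    f = sub_densities(grid, lam, alpha, risk)
    # Trapezoidal rule for F_j(right) - F_j(left), one column per cause.
    mass = (0.5 * (f[1:] + f[:-1]) * np.diff(grid)[:, None]).sum(axis=0)
    w = mass[mrs] * mask_prob[mrs]
    return w / w.sum()


mrs = [0, 1]                        # fully masked unit (0-based cause indices)
mask_prob = np.array([0.5, 0.5])    # hypothetical P(masking set | cause j)
risk = np.ones(2)                   # hypothetical exp(beta_j' X)

p_exact = diagnostic_exact(10.0, mrs, mask_prob, LAM, ALPHA, risk)
p_interval = diagnostic_interval(5.0, 15.0, mrs, mask_prob, LAM, ALPHA, risk)
```

With equal masking probabilities, as in this example, the exact-time result reduces to the ratio of the cause-specific hazards at the failure time; unequal masking probabilities shift the weights toward the cause that is more likely to be masked in this way.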

Conclusion
In this study, a Bayesian analysis for competing risks models was developed for data with masked causes of failure and incomplete failure times. The method offers flexibility in modeling because it does not rely on assumptions of questionable validity, such as the symmetry assumption or independence of the competing risks. Furthermore, it provides an assessment of the effect of the risk factors (covariates) on the hazard function. The simulation results indicate that the method is feasible, with estimates comparable to those obtained from the approach of Sen et al. (2010).

Figure 2: Cumulative baseline hazard of the three causes.

Figure 5: Cumulative hazard of cause ground.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 21 October 2017 | doi:10.20944/preprints201710.0142.v1
(3.1) where L denotes the likelihood function, Π denotes the prior distribution, and D denotes the observed data. Since (3.1) has a complicated form, we utilize the MCMC technique to generate

Table 1: The posterior summaries of the regression coefficients from the two approaches.

Table 2: Number of units across masking sets and failure/censored times. *T = Turn,

Table 3: The posterior summaries of the regression coefficients.

Table 4: Diagnostic probabilities of the full masked units.