Preprint
Article

Properties and Maximum Likelihood Estimation of the Novel Mixture of Fréchet Distribution

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted: 21 June 2023

Posted: 22 June 2023

Abstract
In recent decades, numerous attempts have been made to build more flexible survival distributions by extending existing ones. This article constructs a novel mixture of the Fréchet distribution as an alternative model for survival data and derives its statistical properties: the probability density function (PDF), cumulative distribution function (CDF), rth ordinary moment, skewness, kurtosis, moment-generating function, mean, variance, mode, survival function, hazard function, and asymptotic behavior. Estimators of the unknown parameters are constructed using the expectation-maximization (EM) algorithm and simulated annealing, and their performance is compared in terms of bias, mean squared error (MSE), and simulated variance. An illustrative application to a survival data set shows that the proposed distribution is appropriate for right-skewed data, which makes it useful in survival analysis.
Keywords: 
Subject: Computer Science and Mathematics  -   Probability and Statistics

1. Introduction

Survival analysis, the branch of statistics concerned with time to death or failure, draws on several families of methods: (1) nonparametric methods, such as the Kaplan-Meier estimator and the log-rank test; (2) semi-parametric methods, exemplified by the Cox proportional hazards model; and (3) parametric methods, which model survival times with a probability distribution. In the parametric approach, analysts assume that the survival time follows a specified distribution. For instance, if the survival time follows an exponential distribution, the hazard rate is constant; if it follows a log-normal distribution, the hazard rate varies with time. Once a distribution is chosen, the survival function can be estimated, confidence intervals calculated, and relative risks assessed. A parametric survival function is highly effective when an appropriate distribution and parameter values are selected, and parametric survival distributions can represent many types of survival data.
Hundreds of univariate continuous distributions exist. Mixture models play a crucial role in numerous applications, including survival analysis. These models combine two or more statistical distributions to create a new distribution, and considerable effort has been devoted to combining well-established distributions in this way. For complete samples, Niyomdecha and Srisuradetchai [1] introduced a continuous three-parameter survival distribution called the Complementary Gamma Zero-Truncated Poisson distribution, which combines the maximum of a series of independent and identically distributed gamma random variables with a zero-truncated Poisson random variable. Abdullahi and Phaphan [2] presented a mixture of Nakagami distribution, together with its statistical properties and a comparison of estimators based on the quasi-Newton method and simulated annealing. Nanuwong et al. [3] proposed the mixture Pareto distribution by combining a Pareto distribution and a length-biased Pareto distribution, following the concept of a weighted two-component distribution. Further work on mixture models can be found in [4,5,6,7].
The Fréchet distribution, alternatively referred to as the inverse Weibull distribution, holds extensive application in the field of survival modeling. Fréchet [8] initially introduced the Fréchet distribution, which subsequently underwent further exploration by Fisher and Tippett [9] as well as Gumbel [10]. Furthermore, Abbas and Yincai [11] conducted a comparative analysis of the scale parameter estimation for the Fréchet distribution, employing maximum likelihood, probability-weighted moments, and Bayes estimations. Nasir and Aslam [12] utilized a Bayesian technique to estimate the parameter of the Fréchet distribution. Reyad et al. [13] established QE-Bayes and E-Bayes estimates for the scale parameters associated with the Fréchet distribution. Recent developments have introduced various extensions to the Fréchet distribution. Notably, Mead et al. [14] proposed the beta exponential Fréchet distribution.
Consequently, this article develops a new survival distribution, based on the Fréchet distribution and the notion of a mixture distribution, to obtain an alternative distribution with a time-varying hazard rate. The article investigates the statistical properties of the new distribution (the probability density function, cumulative distribution function, rth ordinary moment, skewness, kurtosis, moment-generating function, mean, variance, mode, survival function, hazard function, and asymptotic behavior), compares estimators obtained by several methods, and applies the distribution to real data.

2. The Fréchet Distribution

The Fréchet distribution, being a specific case of the generalized extreme value distribution, finds extensive application in the field of hydrology. This distribution is commonly employed to model extreme events, including annual maximum one-day rainfalls and river discharges. Moreover, the Fréchet distribution holds considerable significance in survival analysis utilizing experimental data from clinical research. Given its status as the inverse Weibull distribution, the Fréchet distribution exhibits properties akin to the Weibull distribution, such as time-varying hazard rates. As a result, the Fréchet distribution has been a subject of widespread discussion in the field of survival analysis.
Afify et al. [15] provide the probability density function (PDF), cumulative distribution function (CDF), and mean of the Fréchet distribution. The PDF is
$$g(x) = \delta \lambda^{\delta} x^{-(\delta+1)} e^{-(\lambda/x)^{\delta}}, \quad x > 0, \tag{1}$$
where $\lambda > 0$ is a scale parameter and $\delta > 0$ is a shape parameter. The corresponding CDF is
$$G(x) = e^{-(\lambda/x)^{\delta}}. \tag{2}$$
The mean of the distribution is
$$E(X) = \lambda\,\Gamma\!\left(1 - \frac{1}{\delta}\right). \tag{3}$$

3. The Length-biased Fréchet Distribution

Hesham et al. [16] introduced the length-biased Fréchet distribution together with its CDF, PDF, and mean. The CDF is given in Equation (4):
$$G_L(x) = \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\,\Gamma\!\left(1 - \frac{1}{\delta},\ \left(\frac{\lambda}{x}\right)^{\delta}\right), \quad x > 0, \tag{4}$$
where $\lambda > 0$, $\delta > 1$, and $\Gamma(a, z)$ denotes the upper incomplete gamma function. The associated PDF is
$$g_L(x) = \frac{\delta \lambda^{\delta-1}}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\, x^{-\delta} e^{-(\lambda/x)^{\delta}}. \tag{5}$$
The mean of the distribution is
$$E_L(X) = \frac{\lambda\,\Gamma\!\left(1 - \frac{2}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}, \quad \delta > 2. \tag{6}$$

4. Theoretical Result

4.1. The Probability Density Function of the Novel Mixture Fréchet (NMF) Distribution

This subsection constructs a novel distribution as a mixture of two components, the Fréchet distribution and the length-biased Fréchet distribution, using a function of the parameter $\lambda$ as the mixing weight. The PDF of the novel mixture Fréchet (NMF) distribution is defined as
$$f_{NMF}(x) = \frac{1}{\lambda+1}\, g(x) + \frac{\lambda}{\lambda+1}\, g_L(x), \quad x > 0, \tag{7}$$
where $\lambda > 0$ and $\delta > 1$. Substituting Equations (1) and (5) into Equation (7) gives
$$f_{NMF}(x) = \frac{\delta \lambda^{\delta}}{\lambda+1}\, x^{-\delta} e^{-(\lambda/x)^{\delta}} \left[\frac{1}{x} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right], \quad x > 0,\ \lambda > 0,\ \delta > 1. \tag{8}$$
Equation (8) is therefore the PDF of the NMF distribution.
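As a quick numerical check of the step from Equation (7) to Equation (8), the following sketch evaluates the density both as the weighted sum of its two components and in closed form. The function names are illustrative and do not come from the article; only Python's standard library is assumed.

```python
import math

def frechet_pdf(x, lam, delta):
    """Frechet density g(x), Equation (1)."""
    return delta * lam**delta * x**(-(delta + 1)) * math.exp(-(lam / x)**delta)

def lb_frechet_pdf(x, lam, delta):
    """Length-biased Frechet density g_L(x), Equation (5); needs delta > 1."""
    return (delta * lam**(delta - 1) / math.gamma(1 - 1 / delta)
            * x**(-delta) * math.exp(-(lam / x)**delta))

def nmf_pdf_mixture(x, lam, delta):
    """NMF density written as the weighted sum in Equation (7)."""
    w = 1 / (lam + 1)  # weight of the Frechet component
    return w * frechet_pdf(x, lam, delta) + (1 - w) * lb_frechet_pdf(x, lam, delta)

def nmf_pdf(x, lam, delta):
    """NMF density in the closed form of Equation (8)."""
    return (delta * lam**delta / (lam + 1) * x**(-delta)
            * math.exp(-(lam / x)**delta)
            * (1 / x + 1 / math.gamma(1 - 1 / delta)))

# The two forms should agree up to floating-point rounding.
for x in (0.5, 1.0, 2.5, 10.0):
    print(x, nmf_pdf_mixture(x, 1.5, 3.0), nmf_pdf(x, 1.5, 3.0))
```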

4.2. Validity Check of the NMF Distribution for a Proper Density Function

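A probability density function (PDF) is valid if it satisfies the following condition:
$$\int_{-\infty}^{\infty} f(x)\, dx = 1. \tag{9}$$
To demonstrate that the proposed NMF density is a valid PDF, it must be shown that
$$\int_{0}^{\infty} \frac{\delta \lambda^{\delta}}{\lambda+1}\, x^{-\delta} e^{-(\lambda/x)^{\delta}} \left[\frac{1}{x} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right] dx = 1. \tag{10}$$
Let
$$u = \left(\frac{\lambda}{x}\right)^{\delta} \;\Longrightarrow\; x = \lambda u^{-1/\delta}, \tag{11}$$
and
$$dx = -\frac{\lambda}{\delta}\, u^{-1/\delta - 1}\, du. \tag{12}$$
Substituting Equations (11) and (12) into Equation (10) gives
$$\int_{0}^{\infty} f_{NMF}(x)\, dx = \frac{1}{\lambda+1} \int_{0}^{\infty} \left[ e^{-u} + \frac{\lambda\, u^{-1/\delta} e^{-u}}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right] du = \frac{1}{\lambda+1} \left[ 1 + \frac{\lambda\, \Gamma\!\left(1 - \frac{1}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right] = \frac{\lambda}{\lambda+1} + \frac{1}{\lambda+1} = 1. \tag{13}$$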
This demonstrates that the PDF defined in Equation (8) is a valid probability density function. Figure 1 depicts the PDF of the novel mixture of Fréchet distribution for different parameter values. The variety of shapes displayed shows the right-skewed nature of the NMF distribution. Being a family of asymmetric distributions, the NMF distribution is therefore useful for analyzing skewed data, particularly right-skewed data such as survival data.
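The normalization can also be checked numerically. The sketch below is an illustration only: it maps $(0, \infty)$ onto $(0, 1)$ with the change of variable $x = s/(1-s)$, which is a choice made here and not taken from the article, and applies a plain midpoint rule. The function names are hypothetical.

```python
import math

def nmf_pdf(x, lam, delta):
    """NMF density, Equation (8)."""
    return (delta * lam**delta / (lam + 1) * x**(-delta)
            * math.exp(-(lam / x)**delta)
            * (1 / x + 1 / math.gamma(1 - 1 / delta)))

def integrate_pdf(lam, delta, n=200_000):
    """Midpoint rule for the integral of f over (0, inf), using x = s / (1 - s)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        x = s / (1.0 - s)
        # dx = ds / (1 - s)^2 is the Jacobian of the change of variable
        total += nmf_pdf(x, lam, delta) / (1.0 - s) ** 2
    return total * h

print(integrate_pdf(1.5, 3.0))  # should be close to 1
print(integrate_pdf(2.5, 2.0))  # should be close to 1
```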

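4.3. The Cumulative Distribution Function of the NMF Distribution

Let $G(x)$ and $G_L(x)$ be the cumulative distribution functions (CDFs) of the Fréchet distribution and the length-biased Fréchet distribution, respectively, and let $X$ be a random variable following the novel mixture Fréchet (NMF) distribution. The CDF of $X$ can be written as
$$F_{NMF}(x) = \int_{0}^{x} f(t)\, dt = \int_{0}^{x} \left[\frac{1}{\lambda+1}\, g(t) + \frac{\lambda}{\lambda+1}\, g_L(t)\right] dt = \frac{1}{\lambda+1}\, G(x) + \frac{\lambda}{\lambda+1}\, G_L(x). \tag{14}$$
From Equation (14), the CDF of the novel mixture of Fréchet distribution can be expressed as
$$F(x) = \frac{1}{\lambda+1}\, e^{-(\lambda/x)^{\delta}} + \frac{\lambda}{\lambda+1} \cdot \frac{\Gamma\!\left(1 - \frac{1}{\delta}, \left(\frac{\lambda}{x}\right)^{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} = \frac{1}{\lambda+1} \left[ e^{-(\lambda/x)^{\delta}} + \frac{\lambda\, \Gamma\!\left(1 - \frac{1}{\delta}, \left(\frac{\lambda}{x}\right)^{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{15}$$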
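Equation (15) can be evaluated with only the standard library if the regularized upper incomplete gamma function is computed from the power series of the lower incomplete gamma function. The sketch below makes that assumption; the helper `reg_upper_gamma` and its cutoff at $z > 50$ are choices made here for illustration, and a library routine would normally be preferable.

```python
import math

def reg_upper_gamma(a, z, tol=1e-15):
    """Regularized upper incomplete gamma Q(a, z) = Gamma(a, z) / Gamma(a).

    Computed as 1 - gamma(a, z) / Gamma(a) from the series
    gamma(a, z) = z^a e^(-z) * sum_n z^n / (a (a+1) ... (a+n)).
    For 0 < a < 1 and z > 50 the true value is below 1e-20 and 0 is returned.
    Absolute accuracy is limited to roughly 1e-15.
    """
    if z <= 0.0:
        return 1.0
    if z > 50.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > tol * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z) * total
    return max(0.0, 1.0 - lower / math.gamma(a))

def nmf_cdf(x, lam, delta):
    """NMF distribution function, Equation (15)."""
    if x <= 0.0:
        return 0.0
    z = (lam / x) ** delta
    frechet_part = math.exp(-z)                       # G(x), Equation (2)
    lb_part = reg_upper_gamma(1.0 - 1.0 / delta, z)   # G_L(x), Equation (4)
    return (frechet_part + lam * lb_part) / (lam + 1.0)

for x in (0.5, 1.0, 2.0, 5.0, 50.0):
    print(x, nmf_cdf(x, 1.5, 3.0))
```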

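4.4. The rth Ordinary Moment of the NMF Distribution

The $r$th ordinary moment of the NMF distribution is defined as
$$\mu_r' = E_{NMF}(X^r) = \int_{0}^{\infty} x^r f(x)\, dx. \tag{16}$$
Inserting Equation (8) into Equation (16) and integrating with respect to $x$ gives the explicit expression in Equation (19):
$$\mu_r' = \int_{0}^{\infty} \frac{\delta \lambda^{\delta}}{\lambda+1}\, x^{-\delta+r} e^{-(\lambda/x)^{\delta}} \left[\frac{1}{x} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right] dx,$$
$$\mu_r' = \frac{1}{\lambda+1}\, E(X^r) + \frac{\lambda}{\lambda+1}\, E_L(X^r),$$
where
$$E(X^r) = \lambda^{r}\, \Gamma\!\left(1 - \frac{r}{\delta}\right), \tag{17}$$
and
$$E_L(X^r) = \frac{\lambda^{r}\, \Gamma\!\left(1 - \frac{r+1}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}. \tag{18}$$
Hence
$$\mu_r' = \frac{1}{\lambda+1}\, \lambda^{r}\, \Gamma\!\left(1 - \frac{r}{\delta}\right) + \frac{1}{\lambda+1} \cdot \frac{\lambda^{r+1}\, \Gamma\!\left(1 - \frac{r+1}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}, \quad r = 1, 2, 3, \ldots \tag{19}$$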
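Setting $r = 1$ in Equation (19) gives the mean of the NMF distribution:
$$E_{NMF}(X) = \frac{\lambda}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{1}{\delta}\right) + \frac{\lambda\, \Gamma\!\left(1 - \frac{2}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{20}$$
The second moment, $E(X^2)$, follows from Equation (19) with $r = 2$:
$$E_{NMF}(X^2) = \frac{\lambda^{2}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{2}{\delta}\right) + \frac{\lambda\, \Gamma\!\left(1 - \frac{3}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{21}$$
The third moment, $E(X^3)$, follows from Equation (19) with $r = 3$:
$$E_{NMF}(X^3) = \frac{\lambda^{3}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{3}{\delta}\right) + \frac{\lambda\, \Gamma\!\left(1 - \frac{4}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{22}$$
The fourth moment, $E(X^4)$, follows from Equation (19) with $r = 4$:
$$E_{NMF}(X^4) = \frac{\lambda^{4}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{4}{\delta}\right) + \frac{\lambda\, \Gamma\!\left(1 - \frac{5}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{23}$$
Evaluating Equation (19) at $r = 1$ and $r = 2$ and substituting into Equation (24) yields the variance of the NMF distribution:
$$Var_{NMF}(X) = E_{NMF}(X^2) - \left[E_{NMF}(X)\right]^2, \tag{24}$$
$$Var_{NMF}(X) = \frac{\lambda^{2}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{2}{\delta}\right) + \frac{\lambda\, \Gamma\!\left(1 - \frac{3}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right] - \left\{ \frac{\lambda}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{1}{\delta}\right) + \frac{\lambda\, \Gamma\!\left(1 - \frac{2}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right] \right\}^{2}. \tag{25}$$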

4.5. The Skewness and Kurtosis of the NMF Distribution

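The skewness and kurtosis coefficients of the novel mixture Fréchet (NMF) distribution are given, respectively, by
$$\Phi_1 = \frac{E_{NMF}(X^3)}{\left[E_{NMF}(X^2)\right]^{3/2}} = \frac{\dfrac{\lambda^{3}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{3}{\delta}\right) + \dfrac{\lambda\, \Gamma\!\left(1 - \frac{4}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]}{\left\{ \dfrac{\lambda^{2}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{2}{\delta}\right) + \dfrac{\lambda\, \Gamma\!\left(1 - \frac{3}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right] \right\}^{3/2}}, \tag{26}$$
and
$$\Phi_2 = \frac{E_{NMF}(X^4)}{\left[E_{NMF}(X^2)\right]^{2}} = \frac{\dfrac{\lambda^{4}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{4}{\delta}\right) + \dfrac{\lambda\, \Gamma\!\left(1 - \frac{5}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]}{\left\{ \dfrac{\lambda^{2}}{\lambda+1} \left[ \Gamma\!\left(1 - \frac{2}{\delta}\right) + \dfrac{\lambda\, \Gamma\!\left(1 - \frac{3}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right] \right\}^{2}}. \tag{27}$$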
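The moment formulas translate directly into code. The sketch below implements Equation (19) and derives the mean, variance, and the coefficients $\Phi_1$ and $\Phi_2$ from it. The function names are illustrative, and the explicit check that $\delta > r + 1$ is an addition made here: the gamma functions in Equation (19) have positive arguments only under that condition.

```python
import math

def nmf_raw_moment(r, lam, delta):
    """r-th raw moment from Equation (19); finite only when delta > r + 1."""
    if delta <= r + 1:
        raise ValueError("the r-th moment requires delta > r + 1")
    g1 = math.gamma(1 - 1 / delta)
    frechet_part = lam**r * math.gamma(1 - r / delta)
    length_biased_part = lam**(r + 1) * math.gamma(1 - (r + 1) / delta) / g1
    return (frechet_part + length_biased_part) / (lam + 1)

def nmf_mean(lam, delta):
    """Mean, Equation (20)."""
    return nmf_raw_moment(1, lam, delta)

def nmf_variance(lam, delta):
    """Variance, Equations (24) and (25)."""
    return nmf_raw_moment(2, lam, delta) - nmf_raw_moment(1, lam, delta) ** 2

def nmf_phi1(lam, delta):
    """Coefficient Phi_1 of Equation (26), a ratio of raw moments."""
    return nmf_raw_moment(3, lam, delta) / nmf_raw_moment(2, lam, delta) ** 1.5

def nmf_phi2(lam, delta):
    """Coefficient Phi_2 of Equation (27), a ratio of raw moments."""
    return nmf_raw_moment(4, lam, delta) / nmf_raw_moment(2, lam, delta) ** 2

# delta = 6 keeps all four moments finite.
print(nmf_mean(1.5, 6.0), nmf_variance(1.5, 6.0))
print(nmf_phi1(1.5, 6.0), nmf_phi2(1.5, 6.0))
```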

4.6. The Moment Generating Function of the NMF Distribution

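The moment-generating function of the NMF distribution is given by
$$E_{NMF}\!\left(e^{tX}\right) = M_X(t) = \sum_{r=0}^{\infty} \frac{t^{r}\, E_{NMF}(X^r)}{r!}. \tag{28}$$
Substituting Equation (19) into Equation (28) gives the moment-generating function in Equation (29):
$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^{r}}{r!} \left[ \frac{1}{\lambda+1}\, \lambda^{r}\, \Gamma\!\left(1 - \frac{r}{\delta}\right) + \frac{1}{\lambda+1} \cdot \frac{\lambda^{r+1}\, \Gamma\!\left(1 - \frac{r+1}{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{29}$$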

4.7. The Mode of the NMF Distribution

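The mode of the NMF distribution is found by differentiating the natural logarithm of Equation (8) with respect to $x$, setting the derivative equal to zero, and solving for $x$. This leads to the nonlinear equation in Equation (31):
$$\log f(x) = \log\!\left(\frac{\delta \lambda^{\delta}}{\lambda+1}\right) - \delta \log(x) - \left(\frac{\lambda}{x}\right)^{\delta} + \log\!\left[\frac{1}{x} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right], \tag{30}$$
$$\left(\frac{\lambda}{x}\right)^{\delta} \frac{\delta}{x} - \frac{\delta}{x} - \frac{\dfrac{1}{x^{2}}}{\dfrac{1}{x} + \dfrac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}} = 0. \tag{31}$$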
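Equation (31) has no closed-form solution, but a root is easy to bracket: the left-hand side is positive at $x = \lambda/100$ and negative at $x = \lambda$ for any $\delta > 1$. The sketch below uses plain bisection on that interval. The bracket and the function names are choices made here for illustration, not part of the article.

```python
import math

def log_pdf_slope(x, lam, delta):
    """Left-hand side of Equation (31), i.e. d/dx log f(x)."""
    g1 = math.gamma(1 - 1 / delta)
    return ((lam / x) ** delta * delta / x - delta / x
            - (1 / x**2) / (1 / x + 1 / g1))

def nmf_mode(lam, delta, tol=1e-12):
    """Bisection on (lam / 100, lam), where the slope changes sign."""
    lo, hi = lam / 100.0, lam
    while hi - lo > tol * lam:
        mid = 0.5 * (lo + hi)
        if log_pdf_slope(mid, lam, delta) > 0:
            lo = mid   # density still increasing: the root lies to the right
        else:
            hi = mid   # density decreasing: the root lies to the left
    return 0.5 * (lo + hi)

print(nmf_mode(1.5, 3.0))
print(nmf_mode(2.5, 2.0))
```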

4.8. The Survival Function and the Hazard Rate Function of the NMF Distribution

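Consider a continuous random variable $X$ whose cumulative distribution function $F(x)$ is defined on $[0, \infty)$. The survival function of $X$ is
$$S(x) = 1 - F(x). \tag{32}$$
Inserting Equation (15) into Equation (32) gives the survival function of the NMF distribution:
$$S(x) = 1 - \frac{1}{\lambda+1} \left[ e^{-(\lambda/x)^{\delta}} + \frac{\lambda\, \Gamma\!\left(1 - \frac{1}{\delta}, \left(\frac{\lambda}{x}\right)^{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]. \tag{33}$$
The hazard rate function of $X$ is defined as
$$hrf(x) = \frac{f(x)}{S(x)}. \tag{34}$$
Consequently, the hazard rate function of the NMF distribution is
$$hrf(x) = \frac{\dfrac{\delta \lambda^{\delta}}{\lambda+1}\, x^{-\delta} e^{-(\lambda/x)^{\delta}} \left[\dfrac{1}{x} + \dfrac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right]}{1 - \dfrac{1}{\lambda+1} \left[ e^{-(\lambda/x)^{\delta}} + \dfrac{\lambda\, \Gamma\!\left(1 - \frac{1}{\delta}, \left(\frac{\lambda}{x}\right)^{\delta}\right)}{\Gamma\!\left(1 - \frac{1}{\delta}\right)} \right]}. \tag{35}$$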

4.9. Asymptotic Behavior of the NMF Distribution

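The density of the NMF distribution tends to zero as $x$ approaches infinity:
$$\lim_{x \to \infty} f(x) = \lim_{x \to \infty} \frac{\delta \lambda^{\delta}}{\lambda+1}\, x^{-\delta} e^{-(\lambda/x)^{\delta}} \left[\frac{1}{x} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right] = 0.$$
As $x$ approaches $\lambda$:
$$\lim_{x \to \lambda} f(x) = \lim_{x \to \lambda} \frac{\delta \lambda^{\delta}}{\lambda+1}\, x^{-\delta} e^{-(\lambda/x)^{\delta}} \left[\frac{1}{x} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right] = \frac{\delta}{e(\lambda+1)} \left[\frac{1}{\lambda} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right].$$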

4.10. Maximum Likelihood Estimation of the NMF Distribution

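This subsection uses maximum likelihood to estimate the parameters of the NMF distribution. If $x_1, \ldots, x_n$ is a random sample of size $n$ from the NMF distribution, the likelihood function is
$$L(x) = \prod_{i=1}^{n} \frac{\delta \lambda^{\delta}}{\lambda+1}\, x_i^{-\delta} e^{-(\lambda/x_i)^{\delta}} \left[\frac{1}{x_i} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right], \tag{37}$$
$$\ell(x) = \log \prod_{i=1}^{n} \frac{\delta \lambda^{\delta}}{\lambda+1}\, x_i^{-\delta} e^{-(\lambda/x_i)^{\delta}} \left[\frac{1}{x_i} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right]. \tag{38}$$
Taking the natural logarithm of Equation (37) gives the log-likelihood function in Equation (39):
$$\ell(x) = n \log(\delta) + n \delta \log(\lambda) - n \log(\lambda+1) - \delta \sum_{i=1}^{n} \log(x_i) - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta} + \sum_{i=1}^{n} \log\!\left[\frac{1}{x_i} + \frac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}\right]. \tag{39}$$
The maximum likelihood estimators (MLEs) are obtained by differentiating Equation (39) with respect to $\lambda$ and $\delta$, setting the derivatives equal to zero, and solving:
$$\frac{\partial \ell(x)}{\partial \lambda} = \frac{n\delta}{\lambda} - \frac{n}{\lambda+1} - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta} \frac{\delta}{\lambda}, \tag{40}$$
$$\frac{\partial \ell(x)}{\partial \delta} = \frac{n}{\delta} + n \log(\lambda) - \sum_{i=1}^{n} \log(x_i) - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta} \log\!\left(\frac{\lambda}{x_i}\right) - \sum_{i=1}^{n} \frac{\dfrac{\Psi\!\left(1 - \frac{1}{\delta}\right)}{\delta^{2}\, \Gamma\!\left(1 - \frac{1}{\delta}\right)}}{\dfrac{1}{x_i} + \dfrac{1}{\Gamma\!\left(1 - \frac{1}{\delta}\right)}}, \tag{41}$$
where $\Psi$ denotes the digamma function.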
Due to the nonlinearity of these equations, analytical solutions are not feasible, but iterative methods can be used to solve these numerically. This article proposes the utilization of the expectation-maximization (EM) algorithm and the simulated annealing to construct the MLEs for the NMF distribution.
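Before handing the log-likelihood to a numerical optimizer, it helps to confirm that the analytic score agrees with a finite-difference derivative. The sketch below does this for Equation (40). The data values are hypothetical and serve only to exercise the functions; the function names are likewise illustrative.

```python
import math

def nmf_loglik(xs, lam, delta):
    """Log-likelihood of Equation (39)."""
    n = len(xs)
    g1 = math.gamma(1 - 1 / delta)
    return (n * math.log(delta) + n * delta * math.log(lam)
            - n * math.log(lam + 1)
            - delta * sum(math.log(x) for x in xs)
            - sum((lam / x) ** delta for x in xs)
            + sum(math.log(1 / x + 1 / g1) for x in xs))

def score_lambda(xs, lam, delta):
    """Partial derivative with respect to lambda, Equation (40)."""
    n = len(xs)
    return (n * delta / lam - n / (lam + 1)
            - sum((lam / x) ** delta for x in xs) * delta / lam)

# Hypothetical sample, not taken from the article.
xs = [0.8, 1.1, 1.4, 1.9, 2.3, 3.0, 4.2]
lam, delta, h = 1.5, 3.0, 1e-6

# Central finite difference of the log-likelihood in lambda.
numeric = (nmf_loglik(xs, lam + h, delta) - nmf_loglik(xs, lam - h, delta)) / (2 * h)
print(numeric, score_lambda(xs, lam, delta))
```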

4.10.1. Maximum Likelihood Estimation employing the Simulated Annealing Algorithm

As noted in Section 4.10, the MLEs of the unknown parameters of the NMF distribution have no analytical solution. In this part, therefore, the R function “optim” is used to carry out maximum likelihood estimation (MLE) by simulated annealing. The steps of the simulated annealing algorithm are as follows:
Step 1: Choose an initial value $x^{(0)}$ (so $k = 0$), a temperature $T$, a number of iterations $n$, and a desired accuracy $\varepsilon$.
Step 2: Pick a random value $x^{(k+1)}$ in the vicinity of $x^{(k)}$.
Step 3: If $\Delta E < 0$, where $\Delta E = f(x^{(k+1)}) - f(x^{(k)})$ and $f(x)$ is the objective function, accept $x^{(k+1)}$. Otherwise, generate a random number $\alpha \in (0, 1)$. If $\alpha < \exp(-\Delta E / (KT))$, where $K$ is the Boltzmann constant, accept $x^{(k+1)}$. Otherwise, return to Step 2.
Step 4: If $|x^{(k+1)} - x^{(k)}| < \varepsilon$ and $T$ is sufficiently small, terminate the iterations. Otherwise, if the number of random number generations has reached $n$, decrease the value of $T$, set $k = k + 1$, and go to Step 2. Otherwise, set $k = k + 1$ and go to Step 2.
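The steps above can be sketched in a few lines of Python. This is not the routine used in the article (R's `optim` provides its own simulated-annealing variant through `method = "SANN"`); it is a minimal illustration under several assumptions made here: Gaussian proposals, a geometric cooling schedule, the constant $K$ set to 1, and a fixed iteration budget in place of the $\varepsilon$ stopping rule. The data are the placebo remission times listed in Section 5.

```python
import math
import random

def neg_loglik(theta, xs):
    """Negative NMF log-likelihood (minus Equation (39)); inf outside the domain."""
    lam, delta = theta
    if lam <= 0 or delta <= 1:
        return math.inf
    g1 = math.gamma(1 - 1 / delta)
    total = 0.0
    for x in xs:
        total += (math.log(delta) + delta * math.log(lam) - math.log(lam + 1)
                  - delta * math.log(x) - (lam / x) ** delta
                  + math.log(1 / x + 1 / g1))
    return -total

def simulated_annealing(objective, start, temp=10.0, cooling=0.95,
                        sweeps=200, moves_per_temp=50, step=0.2, seed=1):
    """Minimize `objective`; all tuning constants are illustrative defaults."""
    rng = random.Random(seed)
    current, f_cur = list(start), objective(start)
    best, f_best = list(current), f_cur
    for _ in range(sweeps):
        for _ in range(moves_per_temp):
            cand = [c + rng.gauss(0.0, step) for c in current]   # Step 2
            f_cand = objective(cand)
            d_e = f_cand - f_cur                                 # Step 3
            if d_e < 0 or rng.random() < math.exp(-d_e / temp):
                current, f_cur = cand, f_cand
                if f_cur < f_best:
                    best, f_best = list(current), f_cur
        temp *= cooling                                          # Step 4: lower T
    return best, f_best

xs = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
start = [2.0, 2.0]
best, f_best = simulated_annealing(lambda th: neg_loglik(th, xs), start)
print(best, f_best)
```

Tracking the best point seen so far, rather than returning the final state of the chain, is a common safeguard because the chain may accept uphill moves late in the run.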

4.10.2. Maximum Likelihood Estimation employing the EM-Algorithm

An Expectation-Maximization (EM) algorithm is an iterative method employed to estimate unknown parameters in incomplete statistical models. The application of the EM algorithm encompasses two primary scenarios. The first arises when the data is incomplete due to observational process issues or limitations. The second arises when optimizing the likelihood function becomes challenging. The procedure for implementing the EM algorithm for the NMF distribution is outlined as follows:
The steps involved in the Expectation (E)-Step
1. Derive the log-likelihood function for an NMF distribution.
$$\ln L(x) = \sum_{i=1}^{n} \ln\!\left[\frac{1}{\lambda+1}\, g(x_i) + \frac{\lambda}{\lambda+1}\, g_L(x_i)\right]. \tag{42}$$
2. Compute a complete log-likelihood function by introducing a missing value $\kappa_i$ into the function $\ln L(x)$. Each missing value $\kappa_i$ takes the value 0 or 1. The complete random variable is denoted $Y = (X; K)$, where $y_1, y_2, \ldots, y_n$ are the observations with $y_i = (t_i, \kappa_i)$ for $i = 1, 2, \ldots, n$. The complete log-likelihood function is then
$$l_{\text{complete}}(\Theta \mid y_1, \ldots, y_n) = \sum_{i=1}^{n} \kappa_i \ln\!\left[\frac{1}{\lambda+1}\, g(t_i)\right] + \sum_{i=1}^{n} (1 - \kappa_i) \ln\!\left[\frac{\lambda}{\lambda+1}\, g_L(t_i)\right] = \sum_{i=1}^{n} \kappa_i \ln\!\left[\frac{1}{\lambda+1}\, g(t_i)\right] + \sum_{i=1}^{n} (1 - \kappa_i) \ln\!\left[\frac{x_i}{(\lambda+1)\, \Gamma\!\left(1 - \frac{1}{\delta}\right)}\, g(t_i)\right], \tag{43}$$
where $\Theta = \{\lambda, \delta\}$, and $x_i$ and $t_i$ both denote the $i$th observed value. Substituting Equation (1) into Equation (43) simplifies the complete log-likelihood function to
$$l_{\text{complete}}(\Theta \mid y_1, \ldots, y_n) = \sum_{i=1}^{n} \ln(x_i) - n \ln(\lambda+1) - n \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) - \sum_{i=1}^{n} \kappa_i \ln(x_i) + n \bar{\kappa} \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n \ln(\delta) + n \delta \ln(\lambda) - (\delta+1) \sum_{i=1}^{n} \ln(x_i) - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta}, \tag{44}$$
where $\bar{\kappa} = \frac{1}{n} \sum_{i=1}^{n} \kappa_i$.
3. Formulate the new complete log-likelihood function by dropping the terms that do not depend on $\Theta$:
$$l_{\text{complete}}(\Theta \mid y_1, \ldots, y_n) = -n \ln(\lambda+1) - n \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n \bar{\kappa} \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n \ln(\delta) + n \delta \ln(\lambda) - (\delta+1) \sum_{i=1}^{n} \ln(x_i) - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta}. \tag{45}$$
At the E-step of the EM algorithm, a pseudo-log-likelihood function is obtained by replacing the missing values with their expectations. The pseudo-log-likelihood function at the $k$th stage is
$$l_{\text{complete}}(\Theta \mid y_1, \ldots, y_n) = -n \ln(\lambda+1) - n \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n a^{(k)} \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n \ln(\delta) + n \delta \ln(\lambda) - (\delta+1) \sum_{i=1}^{n} \ln(x_i) - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta}, \tag{46}$$
where $a^{(k)} = \frac{1}{n} \sum_{i=1}^{n} a_i^{(k)}$, and $a_i^{(k)}$ is given by
$$a_i^{(k)} = \frac{\dfrac{1}{\lambda^{(k)}+1}\, g(x_i; \lambda^{(k)}, \delta^{(k)})}{\dfrac{1}{\lambda^{(k)}+1}\, g(x_i; \lambda^{(k)}, \delta^{(k)}) + \dfrac{\lambda^{(k)}}{\lambda^{(k)}+1}\, g_L(x_i; \lambda^{(k)}, \delta^{(k)})}. \tag{47}$$
The steps involved in the Maximization (M)-Step
The M-step maximizes the pseudo-log-likelihood function in Equation (46) with respect to $\lambda$ and $\delta$. With each iteration, the value of $a^{(k)}$ and the estimated parameters $\lambda^{(k+1)}$ and $\delta^{(k+1)}$ are updated, and the process continues until the estimates no longer change. The MLEs of $\lambda$ and $\delta$ obtained via the EM algorithm are then the final values of $\lambda^{(k+1)}$ and $\delta^{(k+1)}$. The initial values suggested in this article for the EM algorithm are $\lambda^{(0)}$ and $\delta^{(0)}$, defined as follows:
$$\lambda^{(0)} = \left(\frac{n}{t}\right)^{1/\delta}, \quad \text{where } t = \sum_{i=1}^{n} \frac{1}{t_i^{\delta}}, \tag{48}$$
$$\delta^{(0)} = 2 \quad \text{when the sample size is small, and} \tag{49}$$
$$\delta^{(0)} = 1.5 \quad \text{when the sample size is large.} \tag{50}$$
EM-Algorithm:
Step 1: Generate a random sample $t_1, t_2, \ldots, t_n$ from the NMF distribution.
Step 2: Set $k = 0$ and compute the initial values $\lambda^{(0)}$ and $\delta^{(0)}$ as specified in Equations (48), (49), and (50).
Step 3: Calculate $a^{(k)} = \frac{1}{n} \sum_{i=1}^{n} a_i^{(k)}$, where $a_i^{(k)}$, $i = 1, 2, \ldots, n$, is given by Equation (47). For example, when $k = 0$:
$$a_i^{(0)} = \frac{\dfrac{1}{\lambda^{(0)}+1}\, g(x_i; \lambda^{(0)}, \delta^{(0)})}{\dfrac{1}{\lambda^{(0)}+1}\, g(x_i; \lambda^{(0)}, \delta^{(0)}) + \dfrac{\lambda^{(0)}}{\lambda^{(0)}+1}\, g_L(x_i; \lambda^{(0)}, \delta^{(0)})}. \tag{51}$$
Step 4: Obtain $\lambda^{(k+1)}$ and $\delta^{(k+1)}$ by maximizing Equation (46). For instance, when $k = 0$, the function to be maximized is
$$l_{\text{complete}}(\Theta \mid y_1, \ldots, y_n) = -n \ln(\lambda+1) - n \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n a^{(0)} \ln \Gamma\!\left(1 - \frac{1}{\delta}\right) + n \ln(\delta) + n \delta \ln(\lambda) - (\delta+1) \sum_{i=1}^{n} \ln(x_i) - \sum_{i=1}^{n} \left(\frac{\lambda}{x_i}\right)^{\delta}. \tag{52}$$
Step 5: If $\lambda^{(k+1)} = \lambda^{(k)}$ and $\delta^{(k+1)} = \delta^{(k)}$, stop. Otherwise, set $k = k + 1$ and return to Step 3.
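A compact sketch of Steps 2 to 5 is given below, applied to the placebo remission times from Section 5. It is not the authors' implementation. Several details are assumptions made here: the M-step uses one golden-section search per parameter on fixed intervals and rejects any update that would lower the pseudo-log-likelihood, the gamma terms enter through $\ln \Gamma$, the starting value $\delta^{(0)} = 2$ is picked arbitrarily, and convergence is declared when successive estimates differ by less than a small tolerance. The weight in Equation (47) is coded in its simplified form, since the common factor of the two densities cancels.

```python
import math

def q_function(lam, delta, xs, a_bar):
    """Pseudo-log-likelihood of Equation (46) for a given mean weight a_bar."""
    n = len(xs)
    log_gamma = math.lgamma(1 - 1 / delta)
    return (-n * math.log(lam + 1) - n * (1 - a_bar) * log_gamma
            + n * math.log(delta) + n * delta * math.log(lam)
            - (delta + 1) * sum(math.log(x) for x in xs)
            - sum((lam / x) ** delta for x in xs))

def e_step(xs, delta):
    """Mean of the weights in Equation (47); they reduce to (1/x) / (1/x + 1/Gamma)."""
    g1 = math.gamma(1 - 1 / delta)
    return sum((1 / x) / (1 / x + 1 / g1) for x in xs) / len(xs)

def golden_max(func, lo, hi, iters=50):
    """Golden-section search for a maximum of func on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        if func(c) > func(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

def em_fit(xs, delta0=2.0, max_iter=200, tol=1e-6):
    n = len(xs)
    delta = delta0
    lam = (n / sum(x ** -delta for x in xs)) ** (1 / delta)   # Equation (48)
    for _ in range(max_iter):
        a_bar = e_step(xs, delta)                             # Step 3
        # Step 4: coordinate-wise maximization; keep the old value if no gain.
        new_lam = golden_max(lambda l: q_function(l, delta, xs, a_bar), 1e-3, 50.0)
        if q_function(new_lam, delta, xs, a_bar) < q_function(lam, delta, xs, a_bar):
            new_lam = lam
        new_delta = golden_max(lambda d: q_function(new_lam, d, xs, a_bar), 1.01, 20.0)
        if q_function(new_lam, new_delta, xs, a_bar) < q_function(new_lam, delta, xs, a_bar):
            new_delta = delta
        done = abs(new_lam - lam) < tol and abs(new_delta - delta) < tol
        lam, delta = new_lam, new_delta
        if done:                                              # Step 5
            break
    return lam, delta

xs = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
lam_hat, delta_hat = em_fit(xs)
print(lam_hat, delta_hat)
```

Because each accepted update never decreases the pseudo-log-likelihood, the observed-data log-likelihood should not decrease from one iteration to the next, which gives a simple sanity check on any implementation.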

4.10.3. Assessment of the Efficacy of the Parameter Estimation

In this subsection, a series of simulations compares the maximum likelihood estimators obtained using the EM algorithm and simulated annealing. Equations (48) and (49) were used as the initial values for simulated annealing via the “optim” function. Samples from the NMF distribution were generated by an acceptance-rejection algorithm based on the Fréchet distribution from the VGAM package in R version 4.3.0. Each scenario was replicated 500 times. Samples of size $n = 5, 10, 30, 50$ were generated from the NMF distribution with parameters $\lambda = 1.5, 2.5$ and $\delta = 2, 3, 4$. This yields six models for each method and sample size, as presented in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8.
Figure 2 summarizes the results. The EM algorithm performed well, with estimates for most parameters close to the true values. Moreover, the EM estimates were more precise than the maximum likelihood estimates obtained through simulated annealing, as evidenced by smaller bias, lower mean squared error (MSE), and lower simulated variance.
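For readers who want to reproduce a simulation of this kind without R, NMF samples can also be drawn by composition instead of acceptance-rejection. The sketch below is an alternative technique, not the article's generator. It relies on two facts that follow from Equations (2) and (5): if $X$ is Fréchet, then $(\lambda/X)^{\delta}$ is standard exponential, and if $X$ is length-biased Fréchet, then $(\lambda/X)^{\delta}$ is gamma distributed with shape $1 - 1/\delta$ and unit scale. The function name and seed are illustrative.

```python
import math
import random

def sample_nmf(n, lam, delta, seed=None):
    """Draw n NMF variates by picking a mixture component, then inverting u = (lam/x)^delta."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        if rng.random() < 1 / (lam + 1):
            u = rng.expovariate(1.0)                    # Frechet component
        else:
            u = rng.gammavariate(1 - 1 / delta, 1.0)    # length-biased component
        if u > 0.0:                                     # guard against division by zero
            draws.append(lam * u ** (-1 / delta))
    return draws

# delta = 6 keeps the variance finite, so the sample mean is a stable check
# against the theoretical mean in Equation (20).
lam, delta = 1.5, 6.0
sample = sample_nmf(100_000, lam, delta, seed=123)
g1 = math.gamma(1 - 1 / delta)
theoretical_mean = lam / (lam + 1) * (g1 + lam * math.gamma(1 - 2 / delta) / g1)
print(sum(sample) / len(sample), theoretical_mean)
```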

5. Illustrative Example

In this section, the proposed distribution is applied to a real dataset. The data come from the clinical trial of Freireich et al. [17], which evaluated the efficacy of 6-mercaptopurine (6-MP) in maintaining remission; the observations used here are from the patients who received a placebo. After the trial was completed at one year, the following remission times, in weeks, were recorded: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23.
As Figure 3 shows, the remission times of the patients who received a placebo have a right-skewed distribution. To compare goodness of fit, three right-skewed distributions are considered: the Fréchet distribution, the length-biased Fréchet distribution, and the proposed mixture Fréchet distribution.
While the parameters of the other candidate distributions are determined using maximum likelihood estimation utilizing simulated annealing, the parameters of the novel mixture Fréchet (NMF) distribution are estimated using the EM algorithm. The best model is the one that provides the smallest Akaike information criterion (AIC) value, which is used as the evaluation criterion.
Based on findings presented in Table 9, it is evident that the NMF distribution yields the lowest value of the AIC. This indicates that the NMF distribution outperforms the other potential distributions when using an AIC statistic as a measure of goodness-of-fit for this example data.
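For reference, the AIC has the standard form below, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood. This notation is introduced here for illustration; the article itself does not write the formula out.

$$\mathrm{AIC} = 2k - 2 \ln \hat{L}$$

Each of the three candidates has the two parameters $\lambda$ and $\delta$, so $k = 2$ in every case and ranking the models by AIC is equivalent to ranking them by maximized log-likelihood.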

6. Conclusions and Discussion

This article introduces a novel right-skewed survival distribution, the novel mixture Fréchet (NMF) distribution. The study derives various statistical properties of the proposed distribution and estimates its two parameters using both the EM algorithm and simulated annealing. A simulation study covering twenty-four scenarios assesses the performance of the two methods, and the distribution is illustrated on patient remission time data. The results show that the EM estimators are more efficient than the simulated annealing estimators. In addition, the NMF distribution fits the example data better than the other candidate distributions, as indicated by the Akaike information criterion (AIC). The NMF distribution therefore has potential applications in diverse areas, including survival analysis and reliability analysis.
In future research, it is advisable to investigate interval estimation using different methods, such as [18,19], to further enhance the accuracy of the estimations.

Author Contributions

Conceptualization, W.D.P.; methodology, W.D.P., and I.A.; validation, W.D.P., I.A., and W.W.P.; formal analysis, W.D.P., I.A., and W.W.P.; investigation, I.A.; writing—original draft preparation, W.D.P., I.A., and W.W.P.; writing—review and editing, W.D.P., I.A., and W.W.P.; visualization, W.D.P.; funding acquisition, W.D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by King Mongkut’s University of Technology North Bangkok, Thailand. Contract no.KMUTNB-66-BASIC-04.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors express their gratitude to the reviewers for their invaluable insights and constructive feedback. Additionally, this research has been financially supported by King Mongkut’s University of Technology North Bangkok, Thailand, under contract number KMUTNB-66-BASIC-04.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Niyomdecha, A.; Srisuradetchai, P. Complementary Gamma Zero-Truncated Poisson Distribution and Its Application. Mathematics 2023, 11, 2584.
  2. Abdullahi, I.; Phaphan, W. Some Properties of the New Mixture of Nakagami Distribution. Thail. Stat. 2022, 20, 731–743.
  3. Nanuwong, N.; Bodhisuwan, W.; Pudprommarat, C. A New Mixture Pareto Distribution and Its Application. Thail. Stat. 2015, 13, 191–207.
  4. Aryuyuen, S.; Bodhisuwan, W.; Volodin, A. Discrete Generalized Odd Lindley–Weibull Distribution with Applications. Lobachevskii J. Math. 2020, 41, 945–955.
  5. Simmachan, T.; Phaphan, W. Generalization of Two-Sided Length Biased Inverse Gaussian Distributions and Applications. Symmetry 2022, 14, 1965.
  6. Tonggumnead, U.; Klinjan, K.; Tanprayoon, E.; Aryuyuen, S. A four-parameter negative binomial-Lindley regression model to analyze factors influencing the number of cancer deaths using Bayesian inference. Commun. Math. Biol. Neurosci. 2023, 2023, 1–20.
  7. Chananet, C.; Phaphan, W. On the new weight parameter of the mixture Pareto distribution and its application to real data. Appl. Sci. Eng. Prog. 2021, 14, 460–467.
  8. Fréchet, M. Sur la loi de probabilité de l’écart maximum. Ann. Soc. Polon. Math. 1927, 6, 93.
  9. Fisher, R.A.; Tippett, L.H.C. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Math. Proc. Cambridge Philos. Soc. 1928, 24, 180–190.
  10. Gumbel, E.J. Statistics of Extremes; Columbia University Press: New York, United States, 1958.
  11. Abbas, K.; Yincai, T. Comparison of estimation methods for Frechet distribution with known shape. Casp. J. Appl. Sci. Res. 2012, 1, 58–64.
  12. Nasir, W.; Aslam, M. Bayes approach to study shape parameter of Frechet distribution. Int. J. Basic. Appl. Sci. 2015, 4, 246–254.
  13. Reyad, H.M.; Younis, A.M.; Ahmed, S.O. QE-Bayesian and E-Bayesian estimation of the Frechet model. BJMCS 2016, 19, 62–74.
  14. Mead, M.E. On five-parameter Lomax distribution: properties and applications. Pak. J. Stat. Oper. Res. 2016, 1, 185–199.
  15. Afify, A.Z.; Yousof, H.M.; Cordeiro, G.M.; Ortega, E.M.M.; Nofal, Z.M. The Weibull Fréchet distribution and its applications. J. Appl. Stat. 2016, 43, 2608–2626.
  16. Hesham, M.R.; Ahmed, M.H.; Soha, A.O.; Suzanne, A.A. The length-biased weighted Frechet distribution: Properties and estimation. Int. J. Appl. Math. Stat. 2017, 3, 189–200.
  17. Freireich, E.J.; Gehan, E.A.; Frei, E.; et al. The Effect of 6-Mercaptopurine on the Duration of Steroid-Induced Remissions in Acute Leukemia: A Model for Evaluation of Other Potential Useful Therapy. Blood 1963, 21, 699–716.
  18. Srisuradetchai, P.; Dangsupa, K. On Interval Estimation of the Geometric Parameter in a Zero–inflated Geometric Distribution. Thail. Stat. 2023, 21, 93–109.
  19. Srisuradetchai, P.; Tonprasongrat, K. On Interval Estimation of the Poisson Parameter in a Zero-inflated Poisson Distribution. Thail. Stat. 2022, 20, 357–371.
Figure 1. Probability density functions of the novel mixture of Fréchet distribution for various values of λ (lambda) and δ (delta).
Figure 2. Box plots of the biases, MSE, and simulated variances of the EM estimators and the simulated annealing estimators.
Figure 3. Remission times of the 21 patients who received a placebo.
Table 1. Average estimates, biases, MSE, and simulated variances of the EM estimators $\hat{\lambda}$ and $\hat{\delta}$ for a sample size of $n = 5$.
| λ | δ | $\hat{\lambda}$ | $\hat{\delta}$ | Bias($\hat{\lambda}$) | Bias($\hat{\delta}$) | MSE($\hat{\lambda}$) | MSE($\hat{\delta}$) | VarSim($\hat{\lambda}$) | VarSim($\hat{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 1.7470 | 2.7426 | 0.2470 | 0.7426 | 0.4008 | 2.1382 | 0.3398 | 1.5867 |
| 1.5 | 3 | 1.5637 | 4.0928 | 0.0637 | 1.0928 | 0.0814 | 4.3311 | 0.0773 | 3.1369 |
| 1.5 | 4 | 1.5621 | 5.3009 | 0.0621 | 1.3009 | 0.0455 | 7.8207 | 0.0416 | 6.1283 |
| 2.5 | 2 | 2.9805 | 2.7295 | 0.4805 | 0.7295 | 1.2316 | 1.9732 | 1.0007 | 1.4410 |
| 2.5 | 3 | 2.6579 | 3.9894 | 0.1579 | 0.9894 | 0.2846 | 3.6459 | 0.2597 | 2.6669 |
| 2.5 | 4 | 2.6156 | 5.7329 | 0.1156 | 1.7329 | 0.1347 | 13.7042 | 0.1214 | 10.7012 |
Table 2. Average estimates, biases, MSE, and simulated variances of the simulated annealing estimators $\tilde{\lambda}$ and $\tilde{\delta}$ for a sample size of $n = 5$.
| λ | δ | $\tilde{\lambda}$ | $\tilde{\delta}$ | Bias($\tilde{\lambda}$) | Bias($\tilde{\delta}$) | MSE($\tilde{\lambda}$) | MSE($\tilde{\delta}$) | VarSim($\tilde{\lambda}$) | VarSim($\tilde{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 3.1557 | 14.6394 | 1.6557 | 12.6394 | 16.4554 | 231.1034 | 13.7139 | 71.3496 |
| 1.5 | 3 | 2.0213 | 15.5284 | 0.5213 | 12.5284 | 2.1113 | 225.2154 | 1.8395 | 68.2533 |
| 1.5 | 4 | 1.9652 | 15.0794 | 0.4652 | 11.0794 | 1.4879 | 188.2035 | 1.2715 | 65.4498 |
| 2.5 | 2 | 5.6882 | 14.0020 | 3.1882 | 12.0020 | 36.7693 | 213.4220 | 26.6046 | 69.3742 |
| 2.5 | 3 | 3.9382 | 14.6874 | 1.4382 | 11.6874 | 8.9652 | 220.4475 | 6.8969 | 83.8532 |
| 2.5 | 4 | 3.7100 | 13.9510 | 1.2100 | 9.9510 | 5.6971 | 180.6055 | 4.2331 | 81.5837 |
Table 3. Average estimates, biases, MSE, and simulated variances of the EM estimators $\hat{\lambda}$ and $\hat{\delta}$ for a sample size of $n = 10$.
| λ | δ | $\hat{\lambda}$ | $\hat{\delta}$ | Bias($\hat{\lambda}$) | Bias($\hat{\delta}$) | MSE($\hat{\lambda}$) | MSE($\hat{\delta}$) | VarSim($\hat{\lambda}$) | VarSim($\hat{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 1.5660 | 2.2744 | 0.0660 | 0.2744 | 0.0973 | 0.2963 | 0.0929 | 0.2210 |
| 1.5 | 3 | 1.5444 | 3.4037 | 0.0444 | 0.4037 | 0.0381 | 0.9369 | 0.0361 | 0.7740 |
| 1.5 | 4 | 1.5279 | 4.6493 | 0.0279 | 0.6493 | 0.0197 | 2.3606 | 0.0190 | 1.9390 |
| 2.5 | 2 | 2.6520 | 2.3669 | 0.1520 | 0.3669 | 0.3017 | 0.4242 | 0.2786 | 0.2896 |
| 2.5 | 3 | 2.5847 | 3.3550 | 0.0847 | 0.3550 | 0.1212 | 0.9211 | 0.1141 | 0.7950 |
| 2.5 | 4 | 2.5446 | 4.5805 | 0.0446 | 0.5805 | 0.0539 | 1.6688 | 0.0519 | 1.3318 |
Table 4. Average estimates, biases, MSEs, and simulated variances of the simulated annealing estimators $\tilde{\lambda}$ and $\tilde{\delta}$ for sample size $n = 10$.

| $\lambda$ | $\delta$ | $\tilde{\lambda}$ | $\tilde{\delta}$ | Bias($\tilde{\lambda}$) | Bias($\tilde{\delta}$) | MSE($\tilde{\lambda}$) | MSE($\tilde{\delta}$) | VarSim($\tilde{\lambda}$) | VarSim($\tilde{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 3.2963 | 16.4428 | 1.7963 | 14.4428 | 18.4601 | 273.3670 | 15.2334 | 64.7724 |
| 1.5 | 3 | 2.2299 | 16.4617 | 0.7299 | 13.4617 | 5.0049 | 240.0290 | 4.4722 | 58.8125 |
| 1.5 | 4 | 1.9610 | 17.9110 | 0.4610 | 13.9110 | 2.0165 | 248.4057 | 1.8039 | 54.8905 |
| 2.5 | 2 | 7.4968 | 15.4376 | 4.9968 | 13.4376 | 75.0960 | 254.3734 | 50.1281 | 73.8031 |
| 2.5 | 3 | 5.3225 | 16.0267 | 2.8225 | 13.0267 | 28.4780 | 252.5156 | 20.5112 | 82.8207 |
| 2.5 | 4 | 4.3101 | 15.7719 | 1.8101 | 11.7719 | 12.9169 | 227.9040 | 9.6403 | 89.3260 |
Table 5. Average estimates, biases, MSEs, and simulated variances of the EM estimators $\hat{\lambda}$ and $\hat{\delta}$ for sample size $n = 30$.

| $\lambda$ | $\delta$ | $\hat{\lambda}$ | $\hat{\delta}$ | Bias($\hat{\lambda}$) | Bias($\hat{\delta}$) | MSE($\hat{\lambda}$) | MSE($\hat{\delta}$) | VarSim($\hat{\lambda}$) | VarSim($\hat{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 1.5302 | 2.1075 | 0.0302 | 0.1075 | 0.0302 | 0.0595 | 0.0293 | 0.0479 |
| 1.5 | 3 | 1.5117 | 3.1312 | 0.0117 | 0.1312 | 0.0108 | 0.1637 | 0.0106 | 0.1465 |
| 1.5 | 4 | 1.5042 | 4.1778 | 0.0042 | 0.1778 | 0.0058 | 0.3782 | 0.0057 | 0.3466 |
| 2.5 | 2 | 2.5583 | 2.1745 | 0.0583 | 0.1745 | 0.0856 | 0.0743 | 0.0822 | 0.0439 |
| 2.5 | 3 | 2.5240 | 3.1078 | 0.0240 | 0.1078 | 0.0309 | 0.1500 | 0.0303 | 0.1384 |
| 2.5 | 4 | 2.5174 | 4.1438 | 0.0174 | 0.1438 | 0.0158 | 0.3076 | 0.0155 | 0.2869 |
Table 6. Average estimates, biases, MSEs, and simulated variances of the simulated annealing estimators $\tilde{\lambda}$ and $\tilde{\delta}$ for sample size $n = 30$.

| $\lambda$ | $\delta$ | $\tilde{\lambda}$ | $\tilde{\delta}$ | Bias($\tilde{\lambda}$) | Bias($\tilde{\delta}$) | MSE($\tilde{\lambda}$) | MSE($\tilde{\delta}$) | VarSim($\tilde{\lambda}$) | VarSim($\tilde{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 5.0867 | 16.3140 | 3.5867 | 14.3140 | 45.6847 | 263.9548 | 32.8206 | 59.0629 |
| 1.5 | 3 | 3.6042 | 17.4744 | 2.1042 | 14.4744 | 22.9393 | 272.4161 | 18.5115 | 62.9077 |
| 1.5 | 4 | 2.6454 | 18.8461 | 1.1454 | 14.8461 | 8.7214 | 286.8080 | 7.4095 | 66.4026 |
| 2.5 | 2 | 13.3191 | 16.1026 | 10.8191 | 14.1026 | 213.7684 | 278.4768 | 96.7157 | 79.5939 |
| 2.5 | 3 | 9.2326 | 15.6701 | 6.7326 | 12.6701 | 105.1613 | 244.4895 | 59.8331 | 83.9577 |
| 2.5 | 4 | 7.6995 | 14.3182 | 5.1995 | 10.3182 | 64.7645 | 201.8541 | 37.7293 | 95.3889 |
Table 7. Average estimates, biases, MSEs, and simulated variances of the EM estimators $\hat{\lambda}$ and $\hat{\delta}$ for sample size $n = 50$.

| $\lambda$ | $\delta$ | $\hat{\lambda}$ | $\hat{\delta}$ | Bias($\hat{\lambda}$) | Bias($\hat{\delta}$) | MSE($\hat{\lambda}$) | MSE($\hat{\delta}$) | VarSim($\hat{\lambda}$) | VarSim($\hat{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 1.5185 | 2.0694 | 0.0185 | 0.0694 | 0.0167 | 0.0293 | 0.0164 | 0.0245 |
| 1.5 | 3 | 1.5071 | 3.0634 | 0.0071 | 0.0634 | 0.0066 | 0.0885 | 0.0065 | 0.0845 |
| 1.5 | 4 | 1.5038 | 4.0746 | 0.0038 | 0.0746 | 0.0033 | 0.1818 | 0.0033 | 0.1763 |
| 2.5 | 2 | 2.6443 | 2.1385 | 0.1443 | 0.1385 | 2.6075 | 0.0505 | 2.5867 | 0.0313 |
| 2.5 | 3 | 2.5367 | 3.0881 | 0.0367 | 0.0881 | 0.0511 | 0.0968 | 0.0497 | 0.0890 |
| 2.5 | 4 | 2.5067 | 4.0758 | 0.0067 | 0.0758 | 0.0087 | 0.1513 | 0.0087 | 0.1456 |
Table 8. Average estimates, biases, MSEs, and simulated variances of the simulated annealing estimators $\tilde{\lambda}$ and $\tilde{\delta}$ for sample size $n = 50$.

| $\lambda$ | $\delta$ | $\tilde{\lambda}$ | $\tilde{\delta}$ | Bias($\tilde{\lambda}$) | Bias($\tilde{\delta}$) | MSE($\tilde{\lambda}$) | MSE($\tilde{\delta}$) | VarSim($\tilde{\lambda}$) | VarSim($\tilde{\delta}$) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2 | 7.9441 | 16.3235 | 6.4441 | 14.3235 | 109.6222 | 273.1470 | 68.0953 | 67.9839 |
| 1.5 | 3 | 5.0498 | 17.6010 | 3.5498 | 14.6010 | 44.2769 | 290.7988 | 31.6761 | 77.6082 |
| 1.5 | 4 | 4.2309 | 17.1473 | 2.7309 | 13.1473 | 28.2579 | 254.7019 | 20.8003 | 81.8496 |
| 2.5 | 2 | 19.1666 | 15.4000 | 16.6666 | 13.4000 | 397.1666 | 255.1433 | 119.3912 | 75.5836 |
| 2.5 | 3 | 14.1514 | 13.6071 | 11.6514 | 10.6071 | 224.4617 | 202.7200 | 88.7069 | 90.2098 |
| 2.5 | 4 | 12.3039 | 12.5652 | 9.8039 | 8.5652 | 168.4808 | 178.4291 | 72.3640 | 105.0656 |
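The summaries reported in Tables 1 to 8 (average estimate, bias, MSE, and simulated variance) can be computed from a set of replicated estimates as in the following Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function name `summarize_estimator` is hypothetical, the variance is taken as the population variance over replicates, and the replicates below are placeholder normal draws, not estimates from the NMF distribution.

```python
import random
import statistics


def summarize_estimator(estimates, true_value):
    """Monte Carlo summaries of an estimator over simulation replicates."""
    estimates = list(estimates)
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - true_value
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    # Population variance over replicates (an assumption; divides by the
    # number of replicates, so that MSE = variance + bias^2 holds exactly).
    var_sim = statistics.pvariance(estimates)
    return {"mean": mean_estimate, "bias": bias, "mse": mse, "var_sim": var_sim}


# Placeholder replicates only: normal noise around 1.6 with true value 1.5.
rng = random.Random(0)
placeholder_estimates = [rng.gauss(1.6, 0.3) for _ in range(1000)]
summary = summarize_estimator(placeholder_estimates, true_value=1.5)
print(summary)
```

In an actual study, `placeholder_estimates` would be replaced by the EM or simulated annealing estimates obtained from each simulated sample.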
Table 9. Maximum likelihood estimates of the model parameters and the Akaike information criterion (AIC) for the remission times of patients who received a placebo.

| Fitted distribution | Estimate of $\lambda$ | Estimate of $\delta$ | AIC |
|---|---|---|---|
| Fréchet distribution | 15.50508 | 12.18451 | 5.58502 |
| Length-biased Fréchet distribution | 30.18082 | 1.5 | 11.4393 |
| NMF distribution | 2.191814 | 1.685673 | 3.662349 |
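Model comparison by AIC selects the fitted distribution with the smallest criterion value. The Python sketch below applies that rule to the values in Table 9, assuming the last numeric column of the table is the AIC; the helper `aic` uses the standard definition $2k - 2\log L$ and is included only for reference, since the log-likelihoods themselves are not reported here.

```python
def aic(log_likelihood, n_params):
    """Standard Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


# AIC values copied from Table 9 (assumed to be the last column of the table).
aic_by_model = {
    "Fréchet": 5.58502,
    "Length-biased Fréchet": 11.4393,
    "NMF": 3.662349,
}

# The preferred model is the one with the smallest AIC.
best_model = min(aic_by_model, key=aic_by_model.get)

# Differences from the best model make the ranking easier to read.
delta_aic = {name: value - aic_by_model[best_model]
             for name, value in aic_by_model.items()}

print(best_model)
print(delta_aic)
```

Under this reading of the table, the NMF distribution has the smallest AIC among the three candidates.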
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.