Preprint (this version is not peer-reviewed)

An Extension of the Akash Distribution: Properties, Inference and Application

Submitted: 17 October 2023. Posted: 19 October 2023.
Abstract
In this article we introduce an extension of the Akash distribution. We use the slash methodology to make the kurtosis of the Akash distribution more flexible. We study the density of this new distribution, some of its properties, its moments, and its coefficients of skewness and kurtosis. Statistical inference is performed using the method of moments and maximum likelihood via the EM algorithm. A simulation study is carried out to observe the behavior of the maximum likelihood estimators. An application to a real data set with high kurtosis is considered, where it is shown that the new distribution fits better than other extensions of the Akash distribution.

1. Introduction

The slash distribution is an extension of the normal distribution. It is represented as the quotient of two independent random variables, one standard normal and the other a power of a uniform. In this way, we say that S has a slash distribution if
$$S = \frac{U_1}{U_2},$$
where $U_1 \sim N(0,1)$, $U_2 \sim \mathrm{Beta}(q,1)$, $U_1$ is independent of $U_2$ and $q > 0$; this representation can be seen in Johnson et al. [1]. This distribution has heavier tails than the normal distribution, that is, it has greater kurtosis. Properties of this family are discussed in Rogers and Tukey [2] and Mosteller and Tukey [3]. Maximum likelihood estimation of the location and scale parameters is discussed in Kafadar [4]. Wang and Genton [5] provide a multivariate version of the slash distribution and a multivariate skew version. Gómez et al. [6] and Gómez and Venegas [7] extend the slash distribution using the family of univariate and multivariate elliptical distributions. This methodology for increasing the weight of the tails has also been used in distributions with positive support, for example, by Gómez et al. [8] in the Birnbaum-Saunders distribution, Olmos et al. [9,10] in the half-normal and generalized half-normal distributions, Astorga et al. [11] in the Muth power distribution and Rivera et al. [12] in the Rayleigh distribution, among others.
In Rivera et al. [12], the scale mixture of Rayleigh (SMR) distribution is introduced. We say that $Y \sim SMR(\theta,q)$, with $\theta > 0$ and $q > 0$, if the probability density function (pdf) of Y is
$$f_Y(y;\theta,q) = \frac{q\,y}{2\theta}\left(\frac{y^2}{2\theta}+1\right)^{-\left(\frac{q}{2}+1\right)}, \quad y > 0.$$
Also, a distribution needed in the development of this paper is the gamma distribution, whose pdf is given by
$$g(t;a,b) = \frac{b^a}{\Gamma(a)}\,t^{a-1}e^{-bt}, \tag{3}$$
where $a, b, t > 0$. Its corresponding cumulative distribution function (cdf) is denoted by
$$G(z;a,b) = \int_0^z g(t;a,b)\,dt. \tag{4}$$
Shanker [13] introduced the Akash distribution and applied it to real lifetime data sets from medical science and engineering. Thus, we say that a random variable Y has an Akash distribution (AK) with shape parameter $\theta$ if its pdf is given by
$$f_Y(y;\theta) = \frac{\theta^3}{\theta^2+2}\,(1+y^2)\exp(-\theta y), \tag{5}$$
where $\theta, y > 0$, and we denote it by $Y \sim AK(\theta)$. The parameter $\theta$ is a shape parameter; if we add a scale parameter, the pdf is given by
$$f_Y(y;\sigma,\theta) = \frac{\theta^3}{\sigma(\theta^2+2)}\left(1+\frac{y^2}{\sigma^2}\right)\exp(-\theta y/\sigma), \tag{6}$$
where $\sigma > 0$ is a scale parameter, $\theta > 0$ is a shape parameter, and we denote it by $Y \sim AK(\sigma,\theta)$.
Extensions of the AK distribution have been proposed by Shanker and Shukla [14,15], among others. Both extensions add a parameter, and we will compare them with the new distribution. The two-parameter Akash distribution (TPAD), introduced by Shanker and Shukla [14], has pdf
$$f_Y(y;\theta,\alpha) = \frac{\theta^3}{\alpha\theta^2+2}\,(\alpha+y^2)\exp(-\theta y),$$
where $\theta, \alpha, y > 0$, and we denote it by $Y \sim TPAD(\theta,\alpha)$.
The power Akash distribution (PAD), introduced by Shanker and Shukla [15], has pdf
$$f_Y(y;\theta,\alpha) = \frac{\alpha\theta^3}{\theta^2+2}\left(1+y^{2\alpha}\right)y^{\alpha-1}\exp(-\theta y^{\alpha}),$$
where $\theta, \alpha, y > 0$, and we denote it by $Y \sim PAD(\theta,\alpha)$.
The main objective of this paper is to introduce an extension of the AK distribution given in (6), making use of the slash methodology, in order to obtain a new distribution with greater kurtosis that is able to accommodate outliers.
The paper is organized as follows. In Section 2 we present the new distribution and its properties. In Section 3 we perform inference using the method of moments and maximum likelihood via the EM algorithm; a simulation study is also carried out. In Section 4 we apply the new distribution to a real data set and compare it with other extensions of the AK distribution. In Section 5 we provide some conclusions.

2. New density and its properties

In this section we introduce the representation, density and properties of the new distribution.

2.1. Representation

The representation of this new distribution is given by
$$X = \frac{Y}{Z}, \tag{9}$$
where $Y \sim AK(\theta)$, $Z \sim \mathrm{Beta}(q,1)$, Y and Z are independent random variables and $\theta, q > 0$. We name the distribution of X the slash AK (SAK) distribution and denote it by $X \sim SAK(\theta,q)$.
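The representation above gives a direct way to simulate from the SAK distribution: draw Y from $AK(\theta)$ and divide by an independent $\mathrm{Beta}(q,1)$ draw. The sketch below is ours, not from the paper; it uses the fact that the AK pdf is a two-component gamma mixture of $\mathrm{Gamma}(1,\theta)$ and $\mathrm{Gamma}(3,\theta)$ with weights $\theta^2/(\theta^2+2)$ and $2/(\theta^2+2)$, and that $\mathrm{Beta}(q,1)$ draws are uniforms raised to the power $1/q$ (the function name `rsak` is our own choice):

```python
import numpy as np

def rsak(n, theta, q, seed=None):
    """Draw n values from SAK(theta, q) via the representation X = Y / Z,
    with Y ~ AK(theta) and Z ~ Beta(q, 1) independent."""
    rng = np.random.default_rng(seed)
    # AK(theta) as a gamma mixture: shape 3 with probability 2/(theta^2+2), else shape 1
    shape = np.where(rng.random(n) < 2.0 / (theta**2 + 2.0), 3.0, 1.0)
    y = rng.gamma(shape, 1.0 / theta)      # AK(theta) draws (rate theta = scale 1/theta)
    z = rng.random(n) ** (1.0 / q)         # Beta(q, 1) draws
    return y / z
```

As a quick check, for $\theta=1$ and $q=5$ the sample mean should be close to $E[X] = q\kappa_6/(\theta\kappa_2(q-1)) = 35/12 \approx 2.917$ (Corollary 2 notation).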

2.2. Density function

The following Proposition gives the pdf of the SAK distribution, derived from the representation given in (9).
Proposition 1.
Let $X \sim SAK(\theta,q)$. Then, the pdf of X is given by
$$f_X(x;\theta,q) = \frac{q^2\,\Gamma(q)\,x^{-(q+1)}}{(\theta^2+2)\,\theta^{q}}\left[\theta^2\,G(\theta x;q+1,1) + (q+1)(q+2)\,G(\theta x;q+3,1)\right],$$
where $\theta, q, x > 0$ and G is the cdf of the gamma distribution given in (4).
Proof. 
Using the representation given in (9) and procedures based on the Jacobian method, we get the result. □
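For readers who want to verify Proposition 1 numerically, a short Python sketch follows (the helper name is ours; we assume SciPy's `gamma.cdf(t, a)`, which evaluates $G(t;a,1)$ in the paper's notation):

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

def sak_pdf(x, theta, q):
    """pdf of SAK(theta, q) from Proposition 1; gamma.cdf(t, a) is G(t; a, 1)."""
    x = np.asarray(x, dtype=float)
    bracket = (theta**2 * gamma.cdf(theta * x, q + 1)
               + (q + 1) * (q + 2) * gamma.cdf(theta * x, q + 3))
    return q**2 * gamma_fn(q) * x**(-(q + 1)) / ((theta**2 + 2) * theta**q) * bracket
```

A useful sanity check on the normalizing constant is that the density integrates to 1 over $(0,\infty)$.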
Observation 1.
Table 1 and Figure 1 show that as the value of the parameter q diminishes, the weight of the right tail increases.
In particular, Table 1 compares P ( X > x ) in the AK and SAK distributions for different values of x.

2.3. Properties

The following Proposition gives the cdf in closed form. It depends on G, which is the cdf of the gamma distribution given in ( 4 ) .
Proposition 2.
Let $X \sim SAK(\theta,q)$. Then, the cdf of X is given by
$$F_X(x;\theta,q) = \frac{\left[\theta^2 + 2G(\theta x;3,1)\right](\theta x)^q - \theta^2 q\,\Gamma(q)\,G(\theta x;q,1) - \Gamma(q+3)\,G(\theta x;q+3,1)}{(\theta^2+2)(\theta x)^q},$$
where $\theta, q, x > 0$ and G is given in (4).
Proof. 
The result follows from a direct application of the definition of a cdf. □
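The closed form can be cross-checked against the tail probabilities of Table 1. A Python sketch (helper name ours; `gamma.cdf(t, a)` plays the role of $G(t;a,1)$):

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

def sak_cdf(x, theta, q):
    """cdf of SAK(theta, q) from Proposition 2."""
    tx = theta * np.asarray(x, dtype=float)
    num = ((theta**2 + 2 * gamma.cdf(tx, 3)) * tx**q
           - theta**2 * q * gamma_fn(q) * gamma.cdf(tx, q)
           - gamma_fn(q + 3) * gamma.cdf(tx, q + 3))
    return num / ((theta**2 + 2) * tx**q)
```

For example, `1 - sak_cdf(5, 1, 1)` recovers the tail probability $P(X>5) \approx 0.443$ reported for SAK(1,1) in Table 1.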

2.3.1. Reliability analysis

The reliability function $r(t) = 1 - F(t)$ and the hazard function $h(t) = \frac{f(t)}{r(t)}$ of the SAK distribution are given in the following corollary.
Corollary 1.
Let $T \sim SAK(\theta,q)$. Then, $r(t)$ and $h(t)$ of T are given by
1. 
$$r(t) = 1 - \frac{\left[\theta^2 + 2G(\theta t;3,1)\right](\theta t)^q - \theta^2 q\,\Gamma(q)\,G(\theta t;q,1) - \Gamma(q+3)\,G(\theta t;q+3,1)}{(\theta^2+2)(\theta t)^q},$$
2. 
$$h(t) = \frac{q^2\,\Gamma(q)\left[\theta^2\,G(\theta t;q+1,1) + (q+1)(q+2)\,G(\theta t;q+3,1)\right]}{t\left\{2\left[1-G(\theta t;3,1)\right](\theta t)^q + \theta^2 q\,\Gamma(q)\,G(\theta t;q,1) + \Gamma(q+3)\,G(\theta t;q+3,1)\right\}},$$
where $\theta, q, t > 0$.
In Figure 2, we present the hazard function of the SAK distribution for $\theta = 0.5$ and several values of q.

2.3.2. Right tail of the SAK distribution

According to Rolski et al. [16], a distribution has a heavy right tail if
$$\limsup_{t\to\infty}\frac{-\log r(t)}{t} = 0.$$
The following result shows that the SAK distribution is heavy-tailed.
Proposition 3.
The distribution of the random variable T S A K ( θ , q ) is heavy-tailed.
Proof. 
Applying L'Hôpital's rule twice, we have
$$\limsup_{t\to\infty}\frac{-\log r(t)}{t} = \limsup_{t\to\infty}\frac{f_T(t;\theta,q)}{1-F_T(t;\theta,q)} = \limsup_{t\to\infty}\left[\frac{q+1}{t} - \frac{\theta^3 g(\theta t;q+1,1) + (q+1)(q+2)\,\theta\,g(\theta t;q+3,1)}{\theta^2 G(\theta t;q+1,1) + (q+1)(q+2)\,G(\theta t;q+3,1)}\right] = 0. \qquad \square$$
The following Proposition shows that the SAK distribution can be represented as a scale mixture between the AK and Beta distributions.
Proposition 4.
If $X \mid Z = z \sim AK(z^{-1},\theta)$ and $Z \sim \mathrm{Beta}(q,1)$, then $X \sim SAK(\theta,q)$.
Proof. 
The marginal pdf of X is given by
$$f_X(x;\theta,q) = \int_0^1 f_{X|Z}(x\mid z)\,f_Z(z)\,dz = \frac{\theta^3}{\theta^2+2}\int_0^1 z\,(1+z^2x^2)\exp(-\theta z x)\,q z^{q-1}\,dz,$$
and using (3), (4) and (5), the result is obtained. □
The following result shows that when the parameter q tends to infinity, the AK distribution is obtained.
Proposition 5.
Let $X \sim SAK(\theta,q)$. If $q \to \infty$, then X converges in law to a random variable $Y \sim AK(\theta)$.
Proof. 
Using the representation $X = \frac{Y}{Z}$, we analyze the convergence of this quotient, where $Y \sim AK(\theta)$ and $Z \sim \mathrm{Beta}(q,1)$. For the Beta(q,1) distribution we have $E[Z] = \frac{q}{q+1}$ and $Var[Z] = \frac{q}{(q+2)(q+1)^2}$. Then, applying Chebyshev's inequality to Z, for all $\epsilon > 0$,
$$P\left(|Z - E[Z]| > \epsilon\right) \le \frac{Var(Z)}{\epsilon^2} = \frac{q}{(q+2)(q+1)^2\,\epsilon^2}. \tag{12}$$
If $q \to \infty$, the right-hand side of (12) tends to zero, i.e. $W = Z - E[Z]$ converges in probability to 0. Also $E[Z] = \frac{q}{q+1} \to 1$ as $q \to \infty$, so
$$Z = W + E[Z] \stackrel{P}{\longrightarrow} 1, \quad q \to \infty.$$
Since $Y \sim AK(\theta)$, applying Slutsky's Lemma to $X = \frac{Y}{Z}$, we have
$$X \stackrel{L}{\longrightarrow} Y \sim AK(\theta), \quad q \to \infty.$$
Thus, for increasing values of q, X converges in law to an $AK(\theta)$ distribution. □

2.3.3. Moments

In this subsection we obtain the moments of the SAK distribution. To achieve this aim, the next lemma will be useful.
Lemma 1.
Let $Y \sim AK(\sigma,\theta)$ with $\sigma, \theta > 0$. Then, for $r > 0$,
$$E[Y^r] = \frac{\sigma^r\left(r!\,\theta^2 + (r+2)!\right)}{\theta^r(\theta^2+2)}.$$
Proof. 
The r-th moment of a random variable $V \sim AK(\theta)$ is given in Shanker [13] as $E(V^r) = \frac{r!\,\theta^2+(r+2)!}{\theta^r(\theta^2+2)}$; calculating the r-th moment of the random variable $Y = \sigma V$, where $\sigma$ is a scale parameter, the result is obtained. □
The moments of the SAK distribution are given in the following Proposition.
Proposition 6.
Let $X \sim SAK(\theta,q)$ with $\theta > 0$ and $q > 0$. For $r > 0$, $E[X^r]$ exists if and only if $q > r$, and in this case
$$\mu_r = E[X^r] = \frac{q\left(r!\,\theta^2 + (r+2)!\right)}{\theta^r(\theta^2+2)(q-r)}. \tag{14}$$
Proof. 
Using the representation given in Proposition 4 and Lemma 1, we get
$$\mu_r = E[X^r] = E\left[E\left(X^r \mid Z\right)\right] = E\left[Z^{-r}\,\frac{r!\,\theta^2+(r+2)!}{\theta^r(\theta^2+2)}\right] = \frac{r!\,\theta^2+(r+2)!}{\theta^r(\theta^2+2)}\int_0^1 q z^{q-r-1}\,dz.$$
Solving the integral, for $q > r$, gives the result. □
From Proposition 6, explicit expressions for the noncentral moments $\mu_r = E[X^r]$, $r = 1,2,3,4$, and for the variance of $X \sim SAK(\theta,q)$ follow.
Corollary 2.
Let $X \sim SAK(\theta,q)$ with $\theta > 0$ and $q > 0$. From (14), the following noncentral moments and the variance of X are obtained:
$$\mu_1 = \frac{q\,\kappa_6}{\theta\,\kappa_2(q-1)},\ q>1, \qquad \mu_2 = \frac{2q\,\kappa_{12}}{\theta^2\kappa_2(q-2)},\ q>2, \qquad \mu_3 = \frac{6q\,\kappa_{20}}{\theta^3\kappa_2(q-3)},\ q>3, \qquad \mu_4 = \frac{24q\,\kappa_{30}}{\theta^4\kappa_2(q-4)},\ q>4,$$
$$Var(X) = \frac{q\left[2\kappa_{12}\kappa_2(q-1)^2 - q\,\kappa_6^2(q-2)\right]}{\theta^2\kappa_2^2(q-1)^2(q-2)},\ q>2,$$
where $\kappa_i = \theta^2 + i$.
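The closed-form moments are easy to check for internal consistency, e.g. $Var(X)$ must equal $\mu_2 - \mu_1^2$. A small Python sketch (helper names are ours):

```python
from math import factorial

def sak_moment(r, theta, q):
    """mu_r = E[X^r] for X ~ SAK(theta, q), valid for q > r (Proposition 6)."""
    return (q * (factorial(r) * theta**2 + factorial(r + 2))
            / (theta**r * (theta**2 + 2) * (q - r)))

def sak_var(theta, q):
    """Var(X) for q > 2 (Corollary 2), with kappa_i = theta^2 + i."""
    k2, k6, k12 = theta**2 + 2, theta**2 + 6, theta**2 + 12
    return (q * (2 * k12 * k2 * (q - 1)**2 - q * k6**2 * (q - 2))
            / (theta**2 * k2**2 * (q - 1)**2 * (q - 2)))
```

For $\theta = 1$, $q = 5$, this gives $\mu_1 = 35/12$ and $Var(X) = 5.9375$, in agreement with $\mu_2 - \mu_1^2$.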
Remark 1.
Note that when $q \to \infty$, $Var(X) \to \frac{\theta^4+16\theta^2+12}{\theta^2(\theta^2+2)^2}$, which is the variance of an $AK(\theta)$ distribution.
The next Corollary gives the skewness coefficient, $\beta_1$, of the $SAK(\theta,q)$ model.
Corollary 3.
Let $X \sim SAK(\theta,q)$ with $\theta > 0$ and $q > 3$. Then the skewness coefficient of X is
$$\beta_1 = \frac{2\sqrt{q-2}\left[3\kappa_{20}\kappa_2^2(q-1)^3(q-2) - 3q\,\kappa_2\kappa_6\kappa_{12}(q-1)^2(q-3) + q^2\kappa_6^3(q-2)(q-3)\right]}{\sqrt{q}\,(q-3)\left[2\kappa_2\kappa_{12}(q-1)^2 - q(q-2)\kappa_6^2\right]^{3/2}}.$$
Proof. 
Recall that
$$\beta_1 = \frac{E\left[(X-E(X))^3\right]}{(Var(X))^{3/2}} = \frac{\mu_3 - 3\mu_1\mu_2 + 2\mu_1^3}{(\mu_2-\mu_1^2)^{3/2}},$$
where $\mu_1$, $\mu_2$ and $\mu_3$ were given in Corollary 2. □
Also, the kurtosis coefficient, β 2 , of a S A K ( θ , q ) distribution is given in the following Corollary.
Corollary 4.
Let $X \sim SAK(\theta,q)$ with $\theta > 0$ and $q > 4$. Then the kurtosis coefficient of X is
$$\beta_2 = \frac{3(q-2)\left[8\kappa_2^3\kappa_{30}\,q_1 - 8q\,\kappa_6\kappa_{20}\kappa_2^2\,q_2 + 4q^2\kappa_6^2\kappa_{12}\kappa_2\,q_3 - q^3\kappa_6^4\,q_4\right]}{q(q-3)(q-4)\left[2\kappa_{12}\kappa_2(q-1)^2 - q\,\kappa_6^2(q-2)\right]^2},$$
where $q_1 = (q-1)^4(q-2)(q-3)$, $q_2 = (q-1)^3(q-2)(q-4)$, $q_3 = (q-1)^2(q-3)(q-4)$ and $q_4 = (q-2)(q-3)(q-4)$.
Proof. 
Recall that
$$\beta_2 = \frac{E\left[(X-E(X))^4\right]}{(Var(X))^2} = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4}{(\mu_2-\mu_1^2)^2},$$
where $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ were given in Corollary 2. □
Remark 2.
It can be verified that as $q \to \infty$ the skewness and kurtosis coefficients converge to $\frac{2(\theta^6+30\theta^4+36\theta^2+24)}{(\theta^4+16\theta^2+12)^{3/2}}$ and $\frac{3(3\theta^8+128\theta^6+408\theta^4+576\theta^2+240)}{(\theta^4+16\theta^2+12)^2}$, respectively, which coincide with the corresponding coefficients of the $AK(\theta)$ distribution (see Shanker [13]).
The results of Table 2 show that the values of the skewness and kurtosis coefficients depend on the parameters θ and q and that as q decreases, the skewness and kurtosis coefficients increase. On the other hand, as q increases, the skewness and kurtosis coefficients are those of the AK( θ ) distribution (Proposition 5).

3. Inference

In this section we study the estimation of the parameters by the method of moments and by maximum likelihood (ML) via the EM algorithm. We also carry out a simulation study of the behavior of the ML estimators.

3.1. Method of moment estimators

Let $X_1,\ldots,X_n$ be a random sample from $X \sim SAK(\theta,q)$. Consider the first two sample moments, denoted by $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and $\overline{X^2} = \frac{1}{n}\sum_{i=1}^n X_i^2$, respectively.
Proposition 7.
Given $X_1,\ldots,X_n$ a random sample from $X \sim SAK(\theta,q)$ with $q > 2$, the method of moments estimators of $\theta$ and q are
$$\hat{q}_M = \frac{\bar{X}\,\hat{\theta}_M(\hat{\theta}_M^2+2)}{\hat{\theta}_M(\hat{\theta}_M^2+2)\bar{X} - \hat{\theta}_M^2 - 6}, \tag{15}$$
$$\overline{X^2}\,\hat{\theta}_M\left[2(\hat{\theta}_M^2+6) - \hat{\theta}_M\bar{X}(\hat{\theta}_M^2+2)\right] - 2\bar{X}(\hat{\theta}_M^2+12) = 0, \tag{16}$$
where (16) must be solved numerically to obtain $\hat{\theta}_M$; $\hat{\theta}_M$ is then replaced in (15) to get $\hat{q}_M$.
Proof. 
Consider the method of moments equations
$$E[X] = \frac{q(\theta^2+6)}{\theta(\theta^2+2)(q-1)} = \bar{X}, \tag{17}$$
$$E[X^2] = \frac{2q(\theta^2+12)}{\theta^2(\theta^2+2)(q-2)} = \overline{X^2}. \tag{18}$$
Solving equation (17) for the parameter q, we obtain (15). Substituting $\hat{q}_M$ into equation (18) yields the equation given in (16). □
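In practice, equation (16) is solved with a root finder. A Python sketch taking the two sample moments as inputs is given below (function name ours; the bracket for the root search is problem-dependent and may contain more than one sign change, so it should be checked, e.g. by plotting the left-hand side of (16)):

```python
from scipy.optimize import brentq

def sak_mme(m1, m2, bracket):
    """Method-of-moments estimates from m1 = sample mean and m2 = sample second
    moment: solve (16) for theta on the given bracket, then plug into (15)."""
    def eq16(t):
        return m2 * t * (2 * (t**2 + 6) - t * m1 * (t**2 + 2)) - 2 * m1 * (t**2 + 12)

    theta = brentq(eq16, *bracket)                    # numerical root of (16)
    q = m1 * theta * (theta**2 + 2) / (m1 * theta * (theta**2 + 2) - theta**2 - 6)
    return theta, q
```

Feeding in the population moments of $SAK(1,5)$, namely $m_1 = 35/12$ and $m_2 = 130/9$, recovers $\theta = 1$ and $q = 5$.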

3.2. ML estimation

Let $X_1,\ldots,X_n$ be a random sample from $X \sim SAK(\theta,q)$. Then the log-likelihood function is
$$\ell(\theta,q) = c(\theta,q) - (q+1)\sum_{i=1}^n \log(x_i) + \sum_{i=1}^n \log\left[\theta^2 G(\theta x_i;q+1,1) + (q+1)(q+2)\,G(\theta x_i;q+3,1)\right],$$
where $c(\theta,q) = 2n\log(q) + n\log(\Gamma(q)) - n\log(\theta^2+2) - nq\log(\theta)$. Taking partial derivatives of $\ell(\theta,q)$ with respect to $\theta$ and q and setting them equal to zero, we get
$$\sum_{i=1}^n \frac{2\theta\,G(\theta x_i;q+1,1) + \theta^2 J(x_i,q+1) + (q+1)(q+2)\,J(x_i,q+3)}{\theta^2 G(\theta x_i;q+1,1) + (q+1)(q+2)\,G(\theta x_i;q+3,1)} = \frac{2n\theta}{\theta^2+2} + \frac{nq}{\theta},$$
$$\sum_{i=1}^n \frac{\theta^2 H(x_i;q+1) + (2q+3)\,G(\theta x_i;q+3,1) + (q+1)(q+2)\,H(x_i;q+3)}{\theta^2 G(\theta x_i;q+1,1) + (q+1)(q+2)\,G(\theta x_i;q+3,1)} = \eta(\theta,q) + \sum_{i=1}^n \log(x_i),$$
where $J(x_i,m) = x_i\,g(\theta x_i;m,1)$, $H(x_i;v) = \int_0^{\theta x_i}\log(t)\,g(t;v,1)\,dt - \psi(v)\,G(\theta x_i;v,1)$ and $\eta(\theta,q) = -\frac{2n}{q} - n\left(\psi(q) - \log(\theta)\right)$. Solving this system of equations numerically to find the ML estimates may be a difficult task due to the functions it involves. However, an EM algorithm can be implemented (see Dempster et al. [17]) to obtain the ML estimates. The following subsection is dedicated to achieving this goal.
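Alternatively to solving the score equations, the log-likelihood can be maximized directly with a general-purpose optimizer. The following is a sketch of that route, not the paper's procedure; Nelder-Mead and the starting values are illustrative choices of ours:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma
from scipy.special import gammaln

def sak_negloglik(params, x):
    """Negative log-likelihood -l(theta, q) of the SAK model."""
    theta, q = params
    if theta <= 0 or q <= 0:
        return np.inf
    n = len(x)
    c = (2 * n * np.log(q) + n * gammaln(q)
         - n * np.log(theta**2 + 2) - n * q * np.log(theta))
    bracket = (theta**2 * gamma.cdf(theta * x, q + 1)
               + (q + 1) * (q + 2) * gamma.cdf(theta * x, q + 3))
    return -(c - (q + 1) * np.log(x).sum() + np.log(bracket).sum())

def sak_fit(x, start=(1.0, 2.0)):
    """Direct numerical ML (a sketch; moment estimates make good starts)."""
    res = minimize(sak_negloglik, start, args=(np.asarray(x, float),),
                   method="Nelder-Mead")
    return res.x  # (theta_hat, q_hat)
```

On simulated SAK data the fitted pair lands close to the generating parameters, in line with the simulation results of Section 3.4.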

3.3. EM Algorithm

An alternative stochastic representation of the SAK model is given by
$$X_i \mid U_i = u_i, Z_i = z_i \sim G(1+2u_i,\,\theta z_i), \qquad U_i \sim \mathrm{Bern}\left(\frac{2}{\theta^2+2}\right), \qquad Z_i \sim \mathrm{Beta}(q,1), \tag{19}$$
where $U_i$ and $Z_i$, $i = 1,\ldots,n$, represent non-observable variables. This representation can be used for an alternative estimation procedure based on the EM algorithm (Dempster et al. [17]). In this context, the observed data are given by $D_o = \mathbf{x}$, where $\mathbf{x} = (x_1,\ldots,x_n)$. The vectors $\mathbf{z} = (z_1,\ldots,z_n)$ and $\mathbf{u} = (u_1,\ldots,u_n)$ are the latent variables, and $D_c = (\mathbf{x},\mathbf{z},\mathbf{u})$ is the complete data. Note that the joint distribution of $(X_i, U_i, Z_i)$ is given by
$$f(x_i,u_i,z_i) = f(x_i\mid u_i,z_i)\,f(u_i)\,f(z_i) = \frac{(\theta z_i)^{1+2u_i}}{\Gamma(1+2u_i)}\,x_i^{2u_i}e^{-\theta z_i x_i} \times \left(\frac{2}{\theta^2+2}\right)^{u_i}\left(\frac{\theta^2}{\theta^2+2}\right)^{1-u_i} \times q z_i^{q-1} = \frac{q\,\theta^3\,z_i^{2u_i+q}\,2^{u_i}}{(\theta^2+2)\,\Gamma(1+2u_i)}\,x_i^{2u_i}\,e^{-\theta z_i x_i}.$$
Therefore, up to a constant that does not depend on the vector of parameters $\psi = (\theta,q)$, the complete log-likelihood function for the model is given by
$$\ell_c(\psi; D_c) = n\left[\log q + 3\log\theta - \log(\theta^2+2)\right] + \sum_{i=1}^n\left[q\log z_i - \theta x_i z_i\right].$$
With this, the expected value of $\ell_c(\psi;D_c)$, given the observed data, is
$$Q(\psi\mid\psi^{(k)}) = n\left[\log q + 3\log\theta - \log(\theta^2+2)\right] + \sum_{i=1}^n\left[q\,\hat{\kappa}_i^{(k)} - \theta x_i \hat{z}_i^{(k)}\right],$$
where $\hat{z}_i^{(k)} = E(Z_i\mid x_i, \psi = \hat{\psi}^{(k)})$ and $\hat{\kappa}_i^{(k)} = E(\log Z_i\mid x_i, \psi = \hat{\psi}^{(k)})$. Note that
$$f(z_i,u_i\mid x_i) \propto \underbrace{\frac{(\theta x_i)^{2u_i+q+1}}{\Gamma(2u_i+q+1)}\,\frac{z_i^{(2u_i+q+1)-1}e^{-\theta x_i z_i}}{G(1;2u_i+q+1,\theta x_i)}}_{Z_i\mid u_i,x_i\ \sim\ TG_{(0,1)}(2u_i+q+1,\ \theta x_i)} \times \underbrace{\frac{\Gamma(2u_i+q+1)}{\Gamma(2u_i+1)}\left(\frac{2}{\theta^2}\right)^{u_i} G(1;2u_i+q+1,\theta x_i)}_{U_i\mid x_i\ \sim\ \mathrm{Bern}(\nu_i)},$$
where $\nu_i = \Gamma(q+3)G(\theta x_i;q+3)/\left[\theta^2\Gamma(q+1)G(\theta x_i;q+1) + \Gamma(q+3)G(\theta x_i;q+3)\right]$, $G(x;a) = \int_0^x \frac{1}{\Gamma(a)}t^{a-1}e^{-t}\,dt$ is the cdf of the gamma model, and $TG_{(0,1)}(a,b)$ denotes the gamma distribution with shape a and rate b truncated to the interval (0,1). Therefore, using properties of conditional expectations, we have $E(Z_i\mid x_i) = E\left[E(Z_i\mid U_i,x_i)\mid x_i\right]$, and by (19) such expectations are simple to compute. In a similar manner, we can compute $E(\log Z_i\mid x_i)$, obtaining
$$E(Z_i\mid x_i) = \nu_i\,\frac{(q+3)\,G(\theta x_i;q+4)}{\theta x_i\,G(\theta x_i;q+3)} + (1-\nu_i)\,\frac{(q+1)\,G(\theta x_i;q+2)}{\theta x_i\,G(\theta x_i;q+1)}, \tag{20}$$
$$E(\log Z_i\mid x_i) = \frac{\nu_i}{\Gamma(q+3)\,G(1;q+3,\theta x_i)}\int_0^{\theta x_i}\log\left(\frac{w_i}{\theta x_i}\right)w_i^{q+2}e^{-w_i}\,dw_i + \frac{1-\nu_i}{\Gamma(q+1)\,G(1;q+1,\theta x_i)}\int_0^{\theta x_i}\log\left(\frac{w_i}{\theta x_i}\right)w_i^{q}e^{-w_i}\,dw_i. \tag{21}$$
Therefore, the k-th iteration of the algorithm comprises the following steps:
  • E-step: Given $\hat{\theta}^{(k-1)}$ and $\hat{q}^{(k-1)}$, for $i = 1,\ldots,n$ compute $\hat{z}_i^{(k)}$ and $\hat{\kappa}_i^{(k)}$ using equations (20) and (21), respectively.
  • M1-step: Update $\hat{q}^{(k)}$ as
    $$\hat{q}^{(k)} = -\frac{n}{\sum_{i=1}^n \hat{\kappa}_i^{(k)}}.$$
  • M2-step: Update $\hat{\theta}^{(k)}$ as the solution of the non-linear equation
    $$\frac{3}{\theta} - \frac{2\theta}{\theta^2+2} = \frac{1}{n}\sum_{i=1}^n x_i\,\hat{z}_i^{(k)}.$$
The E, M1 and M2 steps are repeated until convergence, i.e. until the maximum distance between the estimates obtained in two consecutive iterations is less than a specified value.
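One E/M1/M2 cycle can be sketched in Python as follows. This is our own illustration, not the authors' code: the $E(\log Z_i\mid x_i)$ integrals of (21) are evaluated by quadrature on the rescaled variable $w = \theta x_i u$, and the M2 equation is solved with a root finder; the convergence loop and stopping rule are left to the reader:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn
from scipy.integrate import quad
from scipy.optimize import brentq

def em_step(x, theta, q):
    """One EM iteration (E, M1, M2 steps) for the SAK model."""
    x = np.asarray(x, dtype=float)
    tx = theta * x
    # E-step: posterior probability nu_i that the mixture indicator U_i = 1
    num = gamma_fn(q + 3) * gamma.cdf(tx, q + 3)
    nu = num / (theta**2 * gamma_fn(q + 1) * gamma.cdf(tx, q + 1) + num)
    # z_hat_i = E(Z_i | x_i), equation (20): mixture of truncated-gamma means
    z_hat = (nu * (q + 3) * gamma.cdf(tx, q + 4) / (tx * gamma.cdf(tx, q + 3))
             + (1 - nu) * (q + 1) * gamma.cdf(tx, q + 2) / (tx * gamma.cdf(tx, q + 1)))

    def klog(a, b):
        # E[log(W/b)] for W ~ Gamma(a, 1) truncated to (0, b), via w = b*u
        val, _ = quad(lambda u: np.log(u) * u**(a - 1) * np.exp(-b * u), 0.0, 1.0)
        return b**a * val / (gamma_fn(a) * gamma.cdf(b, a))

    # kappa_hat_i = E(log Z_i | x_i), equation (21)
    kappa_hat = np.array([n_i * klog(q + 3, t) + (1 - n_i) * klog(q + 1, t)
                          for n_i, t in zip(nu, tx)])
    # M1-step: update q;  M2-step: solve 3/theta - 2*theta/(theta^2+2) = mean(x*z_hat)
    q_new = -len(x) / kappa_hat.sum()
    m = float(np.mean(x * z_hat))
    theta_new = brentq(lambda t: 3.0 / t - 2.0 * t / (t**2 + 2.0) - m, 1e-8, 1e8)
    return theta_new, q_new
```

Iterating `em_step` from a rough starting point moves the estimates toward the ML solution; since the left-hand side of the M2 equation is strictly decreasing in $\theta$, the bracketed root is unique.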
In the following subsection we run some simulations to study the behavior of the ML estimators.

3.4. Simulation study

Table 3 reports the empirical bias (bias), the average of the standard errors (SE), the root of the empirical mean squared error (RMSE), and the 95% coverage probability (CP) based on the asymptotic distribution of the ML estimators of the parameters. The results show that the performance of the estimators improves as n increases.

4. Application

In this section we analyze a real data set, showing that the SAK distribution can be more appropriate than other commonly used distributions for modeling heavy right-tailed data. The data correspond to plasma beta-carotene levels (ng/ml) of 314 patients. The data set contains 14 variables and is available online at “http://Lib.stat.cmu.edu/datasets/Plasma Retinol”. In this study, we consider the variable Betaplasma. The medical interest in this variable comes from the fact that low levels of plasma beta-carotene may be associated with a higher risk of developing certain types of cancer. In Table 4 we present some descriptive statistics, including the sample skewness, $b_1$, and the sample kurtosis, $b_2$. We observe high kurtosis in this data set.
The moment estimates for the parameters of the SAK distribution are $\hat{\theta}_M = 0.025$ and $\hat{q}_M = 2.810$. These estimates are useful starting values for implementing maximum likelihood estimation using numerical methods. Table 5 shows the ML estimates for the parameters of the PAD, SMR and SAK models. For each model we report the value of the log-likelihood. It can be seen that the SAK model presents a larger log-likelihood than the other models.
In order to compare the fit of the distributions, we considered the usual Akaike information criterion (AIC), introduced by Akaike [18], and the Bayesian information criterion (BIC), proposed by Schwarz [19]. Recall that $AIC = 2k - 2\log lik$ and $BIC = k\log n - 2\log lik$, where k is the number of parameters in the model, n is the sample size and $\log lik$ is the maximized value of the log-likelihood function. Table 6 shows the AIC and BIC for each model, indicating that the SAK distribution provides a better fit than the other distributions. Figure 3 presents the histogram of the data together with the fitted densities.
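Both criteria are straightforward to reproduce from the reported log-likelihoods. A one-line Python sketch, applied here to the SAK values from Tables 5 and 6 (log-likelihood −1908.147, k = 2, n = 314):

```python
import numpy as np

def aic_bic(loglik, k, n):
    """AIC = 2k - 2*loglik;  BIC = k*log(n) - 2*loglik."""
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik
```

For the SAK fit this returns AIC ≈ 3820.294 and BIC ≈ 3827.793, matching Table 6.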
We also calculated the quantile residuals (QR). If the model is appropriate for the data, the QR should behave as a sample from the standard normal distribution (see Dunn and Smyth [20]). This assumption can be checked with traditional normality tests, such as the Anderson-Darling (AD), Cramér-von Mises (CVM) and Shapiro-Wilk (SW) tests. Figure 4 shows the qqplots of the quantile residuals of the three fitted distributions. All three tests suggest that the SAK model provides a better fit for this data set.

5. Discussion

This paper presents an extension of the AK distribution based on the slash methodology. Some properties of this new distribution are derived. It is also compared with two other distributions using a real data set. Estimation is done through ML via the EM algorithm. The new SAK distribution is an alternative to fit heavy-tailed right-skewed data. Additional features of the SAK distribution are:
  • The distribution has two representations, one based on the quotient of two independent random variables and another based on a scale mixture between the AK and Beta distributions.
  • The pdf, cdf and hazard function of the SAK distribution are explicit and are represented by the cdf of the gamma distribution.
  • The distribution has a heavy right tail.
  • The distribution contains the AK distribution as a limit, that is, when the parameter q tends to infinity in the distribution SAK, the AK distribution is obtained.
  • The moments and the coefficients of skewness and kurtosis are explicit.
  • In the application, based on the AIC and BIC and on the Anderson-Darling, Cramér-von Mises and Shapiro-Wilk tests applied to the quantile residuals, we may conclude that the SAK distribution fits the Betaplasma data set better than the PAD and SMR distributions, which are also extensions of the AK distribution.

References

  1. Johnson, N.L., Kotz, S., Balakrishnan, N. 1995. Continuous Univariate Distributions, Vol. 1, 2nd edn. New York: Wiley.
  2. Rogers, W.H., Tukey, J.W. 1972. Understanding some long-tailed symmetrical distributions. Statistica Neerlandica 26:211-226.
  3. Mosteller, F., Tukey, J.W. 1977. Data Analysis and Regression. Addison-Wesley, Reading, MA.
  4. Kafadar, K. 1982. A biweight approach to the one-sample problem. Journal of the American Statistical Association 77:416-424.
  5. Wang, J., Genton, M.G. 2006. The multivariate skew-slash distribution. Journal of Statistical Planning and Inference 136:209-220.
  6. Gómez, H.W., Quintana, F.A., Torres, F.J. 2007. A new family of slash-distributions with elliptical contours. Statistics and Probability Letters 77(7):717-725.
  7. Gómez, H.W., Venegas, O. 2008. Erratum to: A new family of slash-distributions with elliptical contours [Statist. Probab. Lett. 77 (2007) 717-725]. Statistics and Probability Letters 78(14):2273-2274.
  8. Gómez, H.W., Olivares-Pacheco, J.F., Bolfarine, H. 2009. An extension of the generalized Birnbaum-Saunders distribution. Statistics and Probability Letters 79(3):331-338.
  9. Olmos, N.M., Varela, H., Gómez, H.W., Bolfarine, H. 2012. An extension of the half-normal distribution. Statistical Papers 53:875-886.
  10. Olmos, N.M., Varela, H., Bolfarine, H., Gómez, H.W. 2014. An extension of the generalized half-normal distribution. Statistical Papers 55:967-981.
  11. Astorga, J.M., Reyes, J., Santoro, K.I., Venegas, O., Gómez, H.W. 2020. A reliability model based on the incomplete generalized integro-exponential function. Mathematics 8:1537.
  12. Rivera, P.A., Barranco-Chamorro, I., Gallardo, D.I., Gómez, H.W. 2020. Scale mixture of Rayleigh distribution. Mathematics 8(10):1842.
  13. Shanker, R. 2015. Akash distribution and its applications. International Journal of Probability and Statistics 4(3):65-75.
  14. Shanker, R., Shukla, K.K. 2017a. On two-parameter Akash distribution. Biometrics & Biostatistics International Journal 6(5):00178.
  15. Shanker, R., Shukla, K.K. 2017b. Power Akash distribution and its application. Journal of Applied Quantitative Methods 12(3):1-10.
  16. Rolski, T., Schmidli, H., Schmidt, V., Teugels, J. 1999. Stochastic Processes for Insurance and Finance. John Wiley & Sons.
  17. Dempster, A.P., Laird, N.M., Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 39:1-38.
  18. Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716-723.
  19. Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6(2):461-464.
  20. Dunn, P.K., Smyth, G.K. 1996. Randomized quantile residuals. Journal of Computational and Graphical Statistics 5(3):236-244.
Figure 1. Left side: Examples of the SAK(1,1) (in black), SAK(1,5) (in blue) and SAK(1,10) (in red) densities. Right side: Examples of the SAK(0.5,1) (in black), SAK(0.5,5) (in blue) and SAK(0.5,10) (in red) densities.
Figure 2. Hazard function of the SAK(0.5,1) distribution (in black), SAK(0.5,2) distribution (in blue) and SAK(0.5,3) distribution (in red).
Figure 3. Betaplasma: Histogram and fitted PAD pdf (in red), SMR pdf (in blue) and SAK pdf (in black).
Figure 4. qqplots of the quantile residuals for the fitted models, together with the p-values of the AD, CVM and SW normality tests.
Table 1. Tail comparisons of the AK and SAK distributions.
Distribution P ( X > 5 ) P ( X > 10 ) Distribution P ( X > 15 ) P ( X > 20 )
SAK(1,1) 0.443 0.233 SAK(0.5,1) 0.367 0.278
SAK(1,5) 0.162 0.015 SAK(0.5,5) 0.063 0.020
SAK(1,10) 0.120 0.005 SAK(0.5,10) 0.034 0.007
AK(1) 0.085 0.002 AK(0.5) 0.018 0.003
Table 2. Skewness and kurtosis of the SAK distribution for various values of the shape parameters.
θ q β 1 β 2
0.5 5 1.974 16.574
1 1.952 15.180
0.5 6 1.570 9.039
1 1.596 8.650
0.5 7 1.391 7.009
1 1.438 6.863
0.5 10 1.201 5.460
1 1.271 5.470
0.5 100 1.085 4.788
1 1.166 4.837
0.5 ∞ 1.084 4.785
1 1.165 4.834
Table 3. Estimated bias, SE, RMSE and CP of the ML estimators of the parameters of the SAK model for different values of n
n = 30 n = 50 n = 100 n = 200 n = 500
θ q estimator bias SE RMSE CP bias SE RMSE CP bias SE RMSE CP bias SE RMSE CP bias SE RMSE CP
0.5 0.5 θ ^ -0.002 0.119 0.124 0.914 -0.004 0.092 0.094 0.930 -0.001 0.065 0.066 0.937 0.000 0.046 0.046 0.946 0.000 0.029 0.029 0.947
q ^ 0.036 0.122 0.139 0.961 0.025 0.092 0.100 0.958 0.012 0.063 0.065 0.952 0.005 0.043 0.044 0.952 0.001 0.027 0.027 0.951
1.0 θ ^ -0.004 0.110 0.114 0.918 -0.003 0.085 0.086 0.931 -0.002 0.060 0.061 0.940 -0.001 0.043 0.043 0.946 0.000 0.027 0.027 0.946
q ^ -0.159 0.236 0.253 0.924 -0.112 0.161 0.171 0.929 -0.087 0.108 0.115 0.939 -0.059 0.074 0.081 0.948 -0.046 0.046 0.051 0.948
2.0 θ ^ -0.003 0.105 0.107 0.931 -0.003 0.081 0.082 0.939 -0.002 0.057 0.058 0.940 -0.001 0.040 0.041 0.945 0.000 0.025 0.026 0.947
q ^ -0.137 0.597 0.622 0.904 -0.125 0.395 0.420 0.924 -0.077 0.233 0.250 0.932 -0.041 0.151 0.162 0.942 -0.023 0.092 0.095 0.948
3.0 0.5 θ ^ 0.136 1.063 1.236 0.891 0.095 0.794 0.861 0.915 0.035 0.537 0.556 0.927 0.013 0.373 0.380 0.940 0.005 0.234 0.235 0.947
q ^ 0.059 0.156 0.206 0.963 0.030 0.110 0.124 0.958 0.015 0.075 0.079 0.955 0.009 0.052 0.054 0.953 0.003 0.032 0.033 0.952
1.0 θ ^ 0.104 0.982 1.112 0.896 0.060 0.729 0.786 0.912 0.028 0.499 0.517 0.929 0.012 0.347 0.354 0.941 0.003 0.218 0.219 0.948
q ^ -0.087 0.398 0.446 0.892 -0.057 0.245 0.296 0.925 -0.021 0.145 0.188 0.938 -0.012 0.097 0.117 0.948 -0.002 0.060 0.066 0.947
2.0 θ ^ 0.145 0.976 1.070 0.922 0.068 0.709 0.747 0.929 0.018 0.478 0.491 0.934 0.006 0.332 0.339 0.941 0.000 0.208 0.210 0.946
q ^ -0.105 1.025 1.090 0.915 -0.084 0.724 0.790 0.924 -0.069 0.440 0.485 0.935 -0.048 0.255 0.282 0.942 -0.008 0.140 0.155 0.948
10.0 0.5 θ ^ 0.595 4.688 5.331 0.882 0.291 3.484 3.709 0.901 0.126 2.400 2.470 0.925 0.088 1.684 1.706 0.942 0.019 1.056 1.049 0.944
q ^ 0.069 0.175 0.184 0.964 0.035 0.113 0.128 0.963 0.016 0.075 0.080 0.957 0.007 0.052 0.053 0.951 0.003 0.032 0.033 0.951
1.0 θ ^ 0.559 4.440 4.910 0.904 0.222 3.260 3.453 0.910 0.102 2.248 2.328 0.926 0.059 1.574 1.600 0.941 0.009 0.987 0.980 0.948
q ^ -0.097 0.508 0.631 0.899 -0.051 0.284 0.389 0.903 -0.031 0.152 0.199 0.939 -0.023 0.098 0.117 0.948 -0.012 0.060 0.080 0.948
2.0 θ ^ 0.885 4.575 4.757 0.935 0.389 3.286 3.316 0.937 0.172 2.209 2.217 0.944 0.035 1.533 1.546 0.947 -0.006 0.955 0.955 0.947
q ^ -0.068 1.224 1.222 0.924 -0.057 0.834 0.950 0.931 -0.037 0.440 0.483 0.935 -0.027 0.305 0.313 0.942 -0.018 0.149 0.159 0.943
Table 4. Descriptive statistics for the data set.
n $\bar{x}$ $s^2$ $b_1$ $b_2$
314 190.4968 33480.72 3.536562 16.8145
Table 5. ML estimates for PAD, SMR and SAK models.
Parameter estimates PAD (SE) SMR (SE) SAK (SE)
θ 0.012 (0.003) 16998.167 (3399.076) 0.027 (0.002)
α 1.052 (0.038)
q 2.926 (0.385) 2.331 (0.294)
Log-likelihood −1953.632 −1910.472 −1908.147
Table 6. AIC and BIC for fitted models.
Criterion PAD SMR SAK
AIC 3911.264 3824.944 3820.294
BIC 3918.763 3832.443 3827.793