A Probability Generating Function Based Goodness-of-Fit Test for the Poisson–Three-Parameter Lindley Distribution

Francisco Novoa-Muñoz

doi:10.20944/preprints202606.0239.v1

Submitted:

02 June 2026

Posted:

03 June 2026

You are already at the latest version

Abstract

The Poisson–Three-Parameter Lindley (PTPL) distribution constitutes a flexible Poisson mixture model for overdispersed count data, encompassing several classical count distributions as special or limiting cases. Despite its growing use in applied contexts, no formal goodness-of-fit test specifically designed for this distribution is currently available. In this paper, we propose and study a new goodness-of-fit test for the PTPL model based on a Cramér-von Mises type distance between the empirical and theoretical probability generating functions (PGFs). For polynomial weight functions, the test statistic admits an explicit closed-form representation; in practice, it is computed efficiently via numerical quadrature. The null distribution of the statistic is approximated via parametric bootstrap. We establish theoretical properties of the proposed procedure, including consistency against fixed alternatives and the validity of the bootstrap approximation. Monte Carlo simulations with sample sizes n ∈ {50,100,150,200,500} for size evaluation and n = 250 for power comparisons, and weight exponents a ∈ {0,1,2}, show that the empirical size is well controlled at both the 5% and 10% nominal levels, and that the test exhibits competitive power against Poisson, Negative Binomial, COM-Poisson, and Zero-Inflated Poisson alternatives. Areal data application to five overdispersed count datasets further illustrates the practical utility of the method.

Keywords:

goodness-of-fit test

;

probability generating function

;

Poisson-Three-Parameter Lindley distribution

;

Cramér-von Mises statistic

;

discrete distributions

;

parametric bootstrap

;

overdispersed count data

;

mixture distribution

;

empirical process

Subject:

Computer Science and Mathematics - Probability and Statistics

MSC: 62F03; 62E20; 62P99

1. Introduction

Goodness-of-fit (GOF) testing plays a fundamental role in statistical modeling, particularly in the analysis of count data, where the selection of an appropriate discrete distribution is essential for reliable statistical inference. Classical goodness-of-fit procedures for discrete models, most notably Pearson’s chi-square test, are known to exhibit several well-documented shortcomings, including strong sensitivity to cell grouping, unreliable behavior in small and moderate samples, and additional difficulties introduced by parameter estimation; see Fisher [1], Cochran [2], Moore [3], Read and Cressie [4], and Cressie [5]. These limitations become more pronounced when dealing with flexible count models designed to accommodate overdispersion or unobserved heterogeneity.

Over the past decades, several authors have proposed goodness-of-fit tests based on probability generating functions (PGFs) for discrete distributions. Foundational contributions are due to Rueda, O’Reilly and Pérez-Abreu [6], who developed Cramér-von Mises type tests for the Poisson distribution based on the empirical PGF, and to Henze [7,8], who studied empirical-distribution-function tests and generating function based tests for a broad class of discrete models. Subsequent developments include tests for the bivariate and multivariate Poisson distributions [9,10] and extensions using minimum-distance PGF estimators [see, e.g., [11,12,13,14]. PGF-based tests are known to possess favorable theoretical and empirical properties, particularly in terms of robustness to parameter estimation and improved power against relevant alternatives, often outperforming classical chi-square based procedures in finite samples.

Among flexible count models, Poisson mixture distributions constitute a well-established class that naturally accommodates overdispersion and unobserved heterogeneity; see Johnson, Kemp and Kotz [15]. In this context, the Poisson–Lindley distribution, originally introduced by Sankaran [16], and several of its extensions have been proposed as flexible alternatives with applications in insurance, reliability, and biological sciences; see Ghitany et al. [17]. Extended versions involving additional parameters have been introduced to enhance modeling flexibility while preserving analytical tractability; see Mahmoudi and Zakerzadeh [18] and Das et al. [19].

Despite its growing use, no formal goodness-of-fit test specifically tailored to the Poisson–Three-Parameter Lindley (PTPL) distribution is currently available. Existing applications typically rely on information criteria or informal diagnostics tools, which do not provide a rigorous inferential framework. This constitutes a clear methodological gap that the present paper aims to address.

The main contribution of this paper is a new goodness-of-fit test for the PTPL distribution based on the Cramér–von Mises distance between the empirical PGF and the theoretical PGF of the PTPL model. The closed-form expression of the PTPL PGF makes this approach computationally and analytically tractable. Parameter estimation is handled via maximum likelihood, and the null distribution of the test statistic is approximated through a parametric bootstrap procedure.

Theoretical properties, including consistency against fixed alternatives and bootstrap validity, are rigorously established. Extensive Monte Carlo simulations assess the empirical size and power of the test against competing alternatives, including the Poisson, Negative Binomial, COM-Poisson [20], and Zero-Inflated Poisson distributions.

The remainder of the paper is organized as follows. Section 2 introduces notation and preliminaries. Section 3 formally defines the PTPL distribution and establishes its basic probabilistic properties. Section 4 describes parameter estimation and its asymptotic properties. The proposed test statistic is presented in Section 5. Asymptotic results are given in Section 6. The bootstrap implementation is described in Section 7. The Monte Carlo study and real data application appear in Section 8 and Section 9, respectively. Section 10 is dedicated to discussion and Section 11 concludes.

2. Notation and Preliminaries

Throughout the paper, the following notation is used. Convergence almost surely, in probability, and in distribution are denoted by

\overset{a . s .}{\to}

,

\overset{p}{\to}

, and

\overset{d}{\to}

, respectively. The notation

N_{0} = {0, 1, 2, \dots}

is used to denote the set of non-negative integers.

Throughout the paper, a sequence of random variables

X_{1}, \dots, X_{n}

is said to be independent and identically distributed (i.i.d.) if the

X_{i}

are mutually independent and each

X_{i}

has the same distribution. The abbreviation i.i.d. is used systematically in what follows.

For a sequence of random functions

{Z_{n} (t)}

, uniform convergence over a set

T

is understood in the sense that

sup_{t \in T} | Z_{n} (t) - Z (t) | \overset{a . s .}{\to} 0 or \overset{p}{\to} 0,

almost surely or in probability, depending on the context.

Expectations, variances, and probabilities under the true data-generating process are denoted by

E (\cdot)

,

V (\cdot)

, and

P (\cdot)

, respectively. Bootstrap probabilities are denoted by

P^{*} (\cdot)

.

The notation

E_{ϑ} (\cdot)

,

V_{ϑ} (\cdot)

, and

P_{ϑ} (\cdot)

denotes expectation, variance, and probability computed under the distribution

PTPL (ϑ)

. When

ϑ = ϑ_{0}

is the true parameter vector, these are written

E_{ϑ_{0}} (\cdot)

,

V_{ϑ_{0}} (\cdot)

, and

P_{ϑ_{0}} (\cdot)

, respectively; the subscript is omitted when the context is unambiguous.

For a sequence of random variables

{Y_{n}}

,

Y_{n} = o_{p} (1)

denotes convergence in probability to zero as

n \to \infty

, and

Y_{n} = O_{p} (1)

denotes stochastic boundedness, i.e., for every

ε > 0

there exists

M_{ε} < \infty

such that

P (| Y_{n} | > M_{ε}) < ε

for all n. More generally,

Y_{n} = o_{p} (r_{n})

and

Y_{n} = O_{p} (r_{n})

refer to these properties applied to

Y_{n} / r_{n}

for a deterministic sequence

r_{n} > 0

; see, e.g., van der Vaart [21], Chapter 2.

The symbol

P_{n} = n^{- 1} \sum_{j = 1}^{n} δ_{X_{j}}

denotes the empirical measure of the sample

X_{1}, \dots, X_{n}

, where

δ_{x}

is the Dirac measure at x. Similarly,

P_{ϑ}

denotes the probability measure on

N_{0}

induced by

PTPL (ϑ)

.

3. The Poisson–Three-Parameter Lindley Distribution

The Lindley distribution, originally introduced as a one-parameter model (see, e.g., Ghitany et al. [17]), has been widely generalized to increase modeling flexibility. Among these generalizations, Shanker et al. [22] proposed a three-parameter Lindley distribution that incorporates additional shape parameters while preserving a relatively simple functional form.

In parallel, Poisson–Lindley type distributions have attracted considerable attention as flexible alternatives to the classical Poisson model capable of capturing overdispersion and unobserved heterogeneity; see Mahmoudi and Zakerzadeh [18] and Das et al. [19].

Motivated by these developments, we study the following Poisson–Three-Parameter Lindley distribution which arises as a reparameterization combining ideas from the three-parameter Lindley framework of Shanker et al. [22] and the Poisson compounding structure of Das et al. [19].

Proposition 1. (Validity of the PTPL PMF).

Let

Θ = {(θ, α, β) \in R^{3} : θ > 0, β > 0, α \geq - β / (1 + θ)}

. For any

(θ, α, β) \in Θ

, the function

p (x; θ, α, β) = \frac{θ^{2}}{(α θ + β) {(1 + θ)}^{x + 1}} [α + \frac{β (x + 1)}{1 + θ}], x = 0, 1, 2, \dots,

(1)

defines a proper probability mass function on

N_{0}

.

That is,

p (x; θ, α, β) \geq 0

for all

x \in N_{0}

, and

\sum_{x = 0}^{\infty} p (x; θ, α, β) = 1 .

Proof.

Non-negativity. For

(θ, α, β) \in Θ

, we have

θ^{2} > 0

,

(α θ + β) > 0

(since

α \geq - β / (1 + θ)

implies

α θ \geq - β θ / (1 + θ) > - β

),

{(1 + θ)}^{x + 1} > 0

, and

α + β (x + 1) / (1 + θ) \geq - β / (1 + θ) + β (x + 1) / (1 + θ) = β x / (1 + θ) \geq 0 for all x \geq 0 .

Hence

p (x; θ, α, β) \geq 0

.

Normalization. Set

r = 1 / (1 + θ) \in (0, 1)

.

Using the standard identities

\sum_{x = 0}^{\infty} r^{x + 1} = r / (1 - r)

and

\sum_{x = 0}^{\infty} (x + 1) r^{x + 1} = r / {(1 - r)}^{2}

, (see, e.g., [23]), we obtain

\sum_{x = 0}^{\infty} p (x; θ, α, β) = \frac{θ^{2}}{α θ + β} [α \cdot \frac{r}{1 - r} + \frac{β}{1 + θ} \cdot \frac{r}{{(1 - r)}^{2}}] .

Substituting

r = 1 / (1 + θ)

so

1 - r = θ / (1 + θ)

, gives

= \frac{θ^{2}}{α θ + β} [\frac{α}{θ} + \frac{β}{θ^{2}}] = \frac{θ^{2}}{α θ + β} \cdot \frac{α θ + β}{θ^{2}} = 1 .

□

A discrete random variable X following distribution (1) is denoted

X \sim PTPL (θ, α, β)

.

Remark 1. (Special cases).

When

α = 1

and

β = 0^{+}

(in the limit), the PTPL reduces to the Poisson–Lindley distribution of Sankaran [16]. When

α = β

, additional simplifications arise. These connections motivate the use of PTPL as a flexible encompassing model.

Probability generating function.

The PGF of

X \sim PTPL (θ, α, β)

,

G (t) = E (t^{X})

, is given by

G (t; θ, α, β) = \frac{θ^{2} [β + α (1 + θ - t)]}{(α θ + β) {(1 + θ - t)}^{2}}, t \in [0, 1] .

(2)

This expression follows by summing

\sum_{x = 0}^{\infty} t^{x} p (x)

using the same geometric series identities employed in Proposition 1. The PGF is well defined and finite on

[0, 1]

since

1 + θ - t \geq θ > 0

for all

t \leq 1

and

θ > 0

. The closed-form expression (2) is the key analytical feature that makes the PTPL distribution particularly amenable to PGF-based goodness-of-fit testing.

Remark 2. (Overdispersion).

A direct calculation using the PGF (2) yields

\begin{matrix} E (X) & = G^{'} (1; θ, α, β) = \frac{α θ + 2 β}{θ (α θ + β)}, \\ V (X) & = G^{''} (1; θ, α, β) + G^{'} (1; θ, α, β) - {[G^{'} (1; θ, α, β)]}^{2} \\ = \frac{α θ + 2 β}{θ (α θ + β)} + \frac{2 β (α θ + 3 β)}{θ^{2} (α θ + β)} - {[\frac{α θ + 2 β}{θ (α θ + β)}]}^{2} . \end{matrix}

For

β > 0

and

(θ, α, β) \in Θ

, one can verify that

V (X) - E (X) = \frac{2 β (α θ + 3 β)}{θ^{2} (α θ + β)} - {[\frac{α θ + 2 β}{θ (α θ + β)}]}^{2} > 0,

confirming that the PTPL is always overdispersed relative to the Poisson model throughout its parameter space Θ.

Figure 1 displays the probability mass function of the PTPL distribution for nine representative parameter configurations, organized by increasing level of overdispersion. The dispersion index

DI = V (X) / E (X)

ranges from approximately

1.5

(mild overdispersion, top row) to values exceeding 40 (pronounced right skewness, bottom row). The figure illustrates the considerable shape flexibility of the PTPL family: depending on the parameter values, the distribution can resemble a near-Poisson shape, exhibit moderate right skewness with a visible mode away from zero, or display a steeply decreasing profile concentrated near the origin. This breadth of shapes, combined with the closed-form expressions for the PMF and PGF derived above, underscores both the practical relevance of the PTPL model for overdispersed count data and the importance of having a formal goodness-of-fit test that is specifically calibrated to this family.

4. Parameter Estimation for the PTPL Distribution

Let

X_{1}, \dots, X_{n}

be a random sample from a Poisson–Three-Parameter Lindley distribution with parameter vector

ϑ = {(θ, α, β)}^{⊤}

. In this section, we describe two standard estimation procedures for

ϑ

, namely the method of moments and maximum likelihood estimation, and discuss their asymptotic properties. These results provide the theoretical justification for the plug-in approach adopted in the proposed goodness-of-fit test.

4.1. Method of Moments

Let

μ_{r} = E (X^{r})

,

r = 1, 2, 3

, denote the theoretical moments of the PTPL distribution, which are smooth functions of the parameter vector

ϑ

. Let

{\hat{μ}}_{r} = n^{- 1} \sum_{i = 1}^{n} X_{i}^{r}

be the corresponding sample moments. The method of moments estimator

{\hat{ϑ}}_{n}^{MM}

is defined as a solution of the system

{\hat{μ}}_{r} = μ_{r} (ϑ), r = 1, 2, 3,

provided that the solution exists and is unique.

Such estimators are widely used for count models due to their simplicity and computational stability; see, for example, Casella and Berger [24].

Proposition 2

(Consistency of the Method of Moments Estimator). Assume that the mapping

ϑ \mapsto (μ_{1}, μ_{2}, μ_{3})

is injective (i.e., the PTPL distribution is identifiable from its first three moments; see Teicher [25] for general identifiability conditions for Poisson mixtures) and continuous on Θ. Then,

{\hat{ϑ}}_{n}^{MM} \overset{a . s .}{\to} ϑ_{0}, n \to \infty .

Proof.

By the strong law of large numbers [23],

{\hat{μ}}_{r} \to μ_{r} (ϑ_{0})

almost surely for

r = 1, 2, 3

. Consistency then follows from the assumed continuity and injectivity of the moment mapping by a standard argument; see van der Vaart [21], Theorem 5.9. □

Due to the numerical instability and absence of closed-form solutions for the moment equations, the method of moments estimator is used exclusively to provide suitable initial values for the maximum likelihood optimization. Its asymptotic properties are therefore stated for completeness but are not the primary focus of the estimation section.

4.2. Maximum Likelihood Estimation

Given the numerical limitations of the method of moments discussed in the previous section, parameter estimation for the PTPL distribution is primarily conducted via maximum likelihood. Let

(p (x; ϑ)

denote the PTPL probability mass function. The log-likelihood function based on a random sample

{X_{1}, \dots, X_{n}}

is given by

ℓ_{n} (ϑ) = \sum_{i = 1}^{n} log p (X_{i}; ϑ),

and the maximum likelihood estimator (MLE), denoted by

{\hat{ϑ}}_{n}^{ML}

, is defined as the maximizer of

ℓ_{n} (ϑ)

over the parameter space

Θ

, provided that the solution exists and is unique.

Proposition 3

(Consistency of the MLE). Assume that:

(i): The PTPL model is correctly specified, i.e., the true distribution of X belongs to ${p (\cdot; ϑ) : ϑ \in Θ}$ .
(ii): Identifiability: $ϑ \neq ϑ^{'}$ implies $p (\cdot; ϑ) \neq p (\cdot; ϑ^{'})$ on $N_{0}$ .
(iii): The parameter space Θ is compact.
(iv): The log-likelihood $log p (x; ϑ)$ is upper semi-continuous in ϑ for each $x \in N_{0}$ .
(v): $E_{ϑ_{0}} [{sup}_{ϑ \in Θ} log p (X; ϑ)] < \infty$ .

Then,

{\hat{ϑ}}_{n}^{ML} \overset{a . s .}{\to} ϑ_{0}, n \to \infty .

Proof.

Under conditions (i)–(v), the result follows from the classical consistency theorem for maximum likelihood estimators of Wald [26]; see also van der Vaart [21], Theorem 5.14, for a modern treatment. □

Proposition 4

(Asymptotic Normality of the MLE). In addition to the assumptions of Proposition 3, assume that:

(vi): The true parameter $ϑ_{0}$ lies in the interior of Θ.
(vii): The log-likelihood $log p (x; ϑ)$ is twice continuously differentiable in ϑ on a neighbourhood of $ϑ_{0}$ , for all $x \in N_{0}$ . (This holds for the PTPL PMF (1) by inspection of its explicit form.)
(viii): The Fisher information matrix $I (ϑ_{0}) = E_{ϑ_{0}} [\nabla_{ϑ} log p (X; ϑ_{0}) \nabla_{ϑ} log p {(X; ϑ_{0})}^{⊤}]$ is finite and non-singular.

Then,

\sqrt{n} ({\hat{ϑ}}_{n}^{ML} - ϑ_{0}) \overset{d}{\to} N (0, I^{- 1} (ϑ_{0})) .

Proof.

Under conditions (i)–(viii), the result is a direct consequence of the asymptotic normality theorem for maximum likelihood estimators; see van der Vaart [21], Theorem 5.41, or Lehmann and Casella [27], Chapter 6. Moreover, under these conditions the MLE satisfies the linear influence-function expansion

\sqrt{n} ({\hat{ϑ}}_{n}^{ML} - ϑ_{0}) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} I {(ϑ_{0})}^{- 1} \nabla_{ϑ} log p (X_{j}; ϑ_{0}) + o_{p} (1),

which is the expansion required in Assumption A1 below. □

4.3. Implications for the Goodness-of-Fit Test

Both estimation procedures yield consistent estimators of the PTPL parameters. In particular, the MLE is root-n consistent and asymptotically normal, which justifies its use as the plug-in estimator in the proposed test statistic. These properties also support the validity of the parametric bootstrap procedure, as established in Proposition 8 of Section 6.

In the Monte Carlo study, convergence of the MLE was monitored via the gradient norm and the Hessian condition number. Samples for which the optimization did not converge (convergence code

\neq 0

in the optim routine) were discarded and replaced, and the corresponding discarded fraction is reported in Table 2 for each sample size and parameter configuration. This approach is standard practice in bootstrap-based simulation studies; see, e.g., Henze [8] and Baringhaus and Henze [13].

5. The Proposed Goodness-of-Fit Test

Let

X_{1}, \dots, X_{n}

be a random sample of independent and identically distributed observations taking values in

N_{0}

, drawn from an unknown discrete distribution. Under the null hypothesis, the underlying distribution is assumed to belong to the Poisson–Three-Parameter Lindley family.

The null hypothesis of interest is given by

H_{0} : X \sim PTPL (θ, α, β),

for some unknown parameter vector

ϑ = {(θ, α, β)}^{⊤}

belonging to the admissible parameter space, versus the general alternative

H_{1} : X does not follow a PTPL distribution .

Let

{\hat{ϑ}}_{n}

be a consistent estimator of

ϑ

, obtained, for instance, by maximum likelihood or by the method of moments. The corresponding theoretical probability generating function under the null hypothesis is then given by

G (t; {\hat{ϑ}}_{n}) = \frac{{\hat{θ}}_{n}^{2} ({\hat{β}}_{n} - {\hat{α}}_{n} t + {\hat{α}}_{n} + {\hat{α}}_{n} {\hat{θ}}_{n})}{({\hat{α}}_{n} {\hat{θ}}_{n} + {\hat{β}}_{n}) {(1 + {\hat{θ}}_{n} - t)}^{2}}, t \in [0, 1] .

The empirical probability generating function is defined as

{\hat{G}}_{n} (t) = \frac{1}{n} \sum_{j = 1}^{n} t^{X_{j}}, t \in [0, 1],

which provides a natural nonparametric estimator of the theoretical PGF.

In the goodness-of-fit procedure, the PGF is evaluated on the unit interval

t \in [0, 1]

, which is contained in the domain

t < log (1 + θ)

for all admissible

θ > 0

.

Under the null hypothesis, the empirical PGF

{\hat{G}}_{n} (t)

is expected to be close to the theoretical PGF

G (t; {\hat{ϑ}}_{n})

for all

t \in [0, 1]

. Departures from the null model can therefore be detected by measuring discrepancies between these two functions over a suitable range of t-values.

Motivated by classical Cramér–von Mises type statistics and following the general PGF-based goodness-of-fit framework developed in the literature, we define the test statistic

T_{n} = n \int_{0}^{1} {[{\hat{G}}_{n} (t) - G (t; {\hat{ϑ}}_{n})]}^{2} w (t) d t,

(3)

where

w (\cdot)

is a nonnegative weight function on

[0, 1]

such that

\int_{0}^{1} w (t) d t < \infty .

This condition ensures that the integral in (3) is finite for each fixed sample size n.

Typical choices of w used in the literature include polynomial weights such as

w (t) = t^{a}

,

a > - 1

; see, e.g., Henze [8] and Novoa-Muñoz and Jiménez-Gamero [9].

Large values of the statistic

T_{n}

provide evidence against the null hypothesis, indicating that the PTPL model does not adequately describe the observed data. Since the asymptotic null distribution of

T_{n}

depends on the unknown parameters and involves complex limiting processes, critical values are obtained via a parametric bootstrap procedure, which is described in detail in Section 7.

Remark 3. (Domain of evaluation).

The PGF is evaluated on

[0, 1]

. This is valid because for all

t \leq 1

and

θ > 0

, we have

1 + θ - t \geq θ > 0

, ensuring that the denominator in (2) is bounded away from zero and

G (t; θ, α, β)

is well defined and continuously differentiable on

[0, 1]

. The choice

w (t) = t^{a}

,

a > - 1

, is motivated by the analytical tractability documented in Appendix A and is standard in the PGF-based testing literature [6,8,9]. In the simulation study we consider

a \in {0, 1, 2}

; the case

a = 1

serves as the primary recommendation based on the power comparisons reported in Section 8.

Remark 4. (Recommended weight exponent).

Among the values

a \in {0, 1, 2}

considered in the simulation study,

a = 1

consistently yields the best balance between empirical size control and power across all parameter configurations and sample sizes examined. This is consistent with findings for related PGF-based tests; see Henze [8] and Novoa-Muñoz and Jiménez-Gamero [9]. The choice

a = 1

is therefore recommended as the default in practice.

6. Asymptotic Properties

In this section, we investigate the asymptotic behavior of the goodness-of-fit statistic

T_{n}

defined in (3). In particular, we establish the uniform consistency of the empirical probability generating function, the behavior of the test statistic under the null hypothesis, and the consistency of the proposed test against fixed alternatives. Finally, we justify the use of a parametric bootstrap procedure for critical value approximation.

Proposition 5

(Uniform Consistency of the Empirical PGF). Let

X_{1}, \dots, X_{n}

be i.i.d. random variables supported on

N_{0}

with probability generating function

G (t)

. Then,

sup_{t \in [0, 1]} |{\hat{G}}_{n} (t) - G (t)| \overset{a . s .}{\to} 0, n \to \infty .

Proof.

For each fixed

t \in [0, 1]

, the random variables

t^{X_{1}}, \dots, t^{X_{n}}

are i.i.d., bounded by 1, and satisfy

E (t^{X_{j}}) = G (t)

. Hence, by the strong law of large numbers [23,28],

{\hat{G}}_{n} (t) \to G (t)

almost surely for every

t \in [0, 1]

. Moreover, the class of functions

{x \mapsto t^{x} : t \in [0, 1]}

is Glivenko–Cantelli and uniformly bounded; see van der Vaart and Wellner [29]. Therefore,

sup_{t \in [0, 1]} |{\hat{G}}_{n} (t) - G (t)| \overset{a . s .}{\to} 0,

which completes the proof. □

Proposition 6

(Behavior Under the Null Hypothesis). Assume that

H_{0}

holds, that is, the data are generated from a PTPL distribution with true parameter vector

ϑ_{0}

. Let

{\hat{ϑ}}_{n}

be a consistent estimator of

ϑ_{0}

. Then,

T_{n} \overset{p}{\to} 0, n \to \infty .

Proof.

By Proposition 5,

sup_{t \in [0, 1]} |{\hat{G}}_{n} (t) - G (t; ϑ_{0})| \overset{a . s .}{\to} 0 .

Moreover, consistency of

{\hat{ϑ}}_{n}

and continuity of

ϑ \mapsto G (t; ϑ)

(which is

C^{\infty}

in

ϑ

for each fixed

t \in [0, 1]

, as seen from (2)), imply

sup_{t \in [0, 1]} |G (t; {\hat{ϑ}}_{n}) - G (t; ϑ_{0})| \overset{p}{\to} 0 .

Therefore, by the triangle inequality,

sup_{t \in [0, 1]} | {\hat{G}}_{n} (t) - G (t; {\hat{ϑ}}_{n}) | \leq sup_{t} | {\hat{G}}_{n} (t) - G (t; ϑ_{0}) | + sup_{t} | G (t; {\hat{ϑ}}_{n}) - G (t; ϑ_{0}) | \overset{p}{\to} 0,

since both

{\hat{G}}_{n} (t)

and

G (t; {\hat{ϑ}}_{n})

are bounded by one on

[0, 1]

.

Since the integrand in (3) satisfies

{[{\hat{G}}_{n} (t) - G (t; {\hat{ϑ}}_{n})]}^{2} w (t) \leq 4 w (t)

for all

t \in [0, 1]

(both PGFs are bounded by 1), and

\int_{0}^{1} 4 w (t) d t < \infty

by assumption, the constant function

h (t) = 4 w (t)

serves as an integrable dominating function independent of n. The dominated convergence theorem [30] then yields

T_{n} \to 0

in probability. Similar arguments are used in the analysis of PGF-based goodness-of-fit tests; see, for example, Rueda, O’Reilly and Pérez-Abreu [6]. □

The following theorem establishes the weak limit of the centered and scaled process underlying

T_{n}

, providing the theoretical foundation for the bootstrap approximation discussed in Proposition 8.

Assumption 1.

The true parameter vector

ϑ_{0} = {(θ_{0}, α_{0}, β_{0})}^{⊤}

belongs to the interior of Θ. The mapping

ϑ \mapsto G (\cdot; ϑ)

is twice continuously Fréchet-differentiable on

L^{2} ([0, 1], w)

, where

L^{2} ([0, 1], w)

denotes the space of square-integrable functions with weight

w (t) = t^{a}

,

a > - 1

. The Fisher information matrix

I (ϑ_{0})

is finite and non-singular (conditions (vi)–(viii) of Proposition 4). The estimator

{\hat{ϑ}}_{n}

satisfies the linear expansion

\sqrt{n} ({\hat{ϑ}}_{n} - ϑ_{0}) = \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} ℓ_{ϑ_{0}} (X_{j}) + o_{p} (1),

where

ℓ_{ϑ_{0}} (x) = I {(ϑ_{0})}^{- 1} \nabla_{ϑ} log p (x; ϑ_{0})

is the influence function of the MLE. Under the conditions of Proposition 4, this expansion holds automatically for the MLE; see van der Vaart [21], Theorem 5.41.

Theorem 1. (Weak limit of the standardized process under

H_{0}

).

Under

H_{0}

and Assumption 1, the process

W_{n} (t) = \sqrt{n} [{\hat{G}}_{n} (t) - G (t; {\hat{ϑ}}_{n})], t \in [0, 1],

converges weakly in

L^{2} ([0, 1], w)

to a centered Gaussian process

W

with covariance kernel

K (s, t) = G (min (s, t); ϑ_{0}) - G (s; ϑ_{0}) G (t; ϑ_{0}) - \dot{G} {(s; ϑ_{0})}^{⊤} I {(ϑ_{0})}^{- 1} \dot{G} (t; ϑ_{0}),

where

\dot{G} (t; ϑ_{0}) = \nabla_{ϑ} G (t; ϑ_{0})

denotes the gradient of the theoretical PGF with respect to the parameter vector.

Consequently,

T_{n} = n \int_{0}^{1} {[{\hat{G}}_{n} (t) - G (t; {\hat{ϑ}}_{n})]}^{2} w (t) d t \overset{d}{⟶} \int_{0}^{1} W {(t)}^{2} w (t) d t = \sum_{j = 1}^{\infty} λ_{j} Z_{j}^{2},

where

{Z_{j}}

are independent standard normal random variables and

{λ_{j}}

(with

\sum_{j = 1}^{\infty} λ_{j} < \infty

) are the eigenvalues of the integral operator associated with

K (\cdot, \cdot)

in

L^{2} ([0, 1], w)

.

Proof.

Sketch.

Step 1: Donsker property. The function class

F = {x \mapsto t^{x} : t \in [0, 1]}

is P-Donsker for any distribution P on

N_{0}

: since

| t^{x} | \leq 1

for all

x \in N_{0}

and

t \in [0, 1]

, the class is uniformly bounded, and its metric entropy satisfies

{log N (ε, F, ∥ \cdot ∥}_{L^{2} (P)}) = O (ε^{- 1})

; see van der Vaart and Wellner [29], Chapter 2.5. Therefore, by the Donsker theorem [29], the empirical PGF process

\sqrt{n} [{\hat{G}}_{n} (t) - G (t; ϑ_{0})]

converges weakly in

ℓ^{\infty} ([0, 1])

to a centred Gaussian process

G

with covariance

Cov (G (s), G (t)) = G (min (s, t); ϑ_{0}) - G (s; ϑ_{0}) G (t; ϑ_{0})

.

Step 2: Linearisation of the plug-in correction. By a first-order Taylor expansion of

G (t; {\hat{ϑ}}_{n})

around

ϑ_{0}

, and Assumption A1,

\sqrt{n} [G (t; {\hat{ϑ}}_{n}) - G (t; ϑ_{0})] = \dot{G} {(t; ϑ_{0})}^{⊤} \sqrt{n} ({\hat{ϑ}}_{n} - ϑ_{0}) + o_{p} (1),

uniformly in

t \in [0, 1]

, where

\dot{G} (t; ϑ_{0}) = \nabla_{ϑ} G (t; ϑ_{0})

.

Step 3: Combination. The combination of Steps 1 and 2 via the continuous mapping theorem yields

W_{n} \Rightarrow W

in

L^{2} ([0, 1], w)

, where

W

is the centred Gaussian process with covariance kernel K stated above.

Step 4: Spectral representation. The integral operator

K

with kernel

K (\cdot, \cdot)

on

L^{2} ([0, 1], w)

is compact, self-adjoint, and positive semidefinite: compactness follows from square-integrability of K on

{[0, 1]}^{2}

, self-adjointness from symmetry

K (s, t) = K (t, s)

, and positive semidefiniteness from K being a covariance kernel. Its trace satisfies

\sum_{j = 1}^{\infty} λ_{j} = \int_{0}^{1} K (t, t) w (t) d t \leq \int_{0}^{1} w (t) d t < \infty

, so the eigenvalue series converges. By the spectral theorem for compact self-adjoint operators and Mercer’s theorem [29,30], the Karhunen–Loève decomposition gives

\int_{0}^{1} W {(t)}^{2} w (t) d t \overset{d}{=} \sum_{j = 1}^{\infty} λ_{j} Z_{j}^{2}

, where

{Z_{j}}

are i.i.d. standard normal and

{λ_{j}}

are the eigenvalues of

K

. See Rueda, O’Reilly and Pérez-Abreu [6] and Novoa-Muñoz and Jiménez-Gamero [10] for analogous derivations in related settings. □

Remark 5.

The limiting distribution in Theorem 1 depends on the unknown parameter

ϑ_{0}

through the eigenvalues

{λ_{j}}

, so critical values cannot be tabulated analytically. This is the primary motivation for using the parametric bootstrap, whose consistency is established in Proposition 8.

Proposition 7

(Consistency Against Fixed Alternatives). Suppose the true distribution of X does not belong to the PTPL family and has probability generating function

G_{true} (t)

. Assume that:

(i): $\int_{0}^{1} {|G_{true} (t) - G (t; ϑ)|}^{2} w (t) d t > 0$ for all $ϑ \in Θ$ .
(ii): The parameter space Θ is compact and $ϑ \mapsto G (t; ϑ)$ is continuous for each $t \in [0, 1]$ .

Then,

T_{n} \overset{a . s .}{\to} \infty, n \to \infty,

and the proposed test is consistent against fixed alternatives.

Proof.

By Proposition 5,

{\hat{G}}_{n} (t) \to G_{true} (t)

uniformly almost surely. Since

Θ

is compact and

ϑ \mapsto \int_{0}^{1} {| G_{true} (t) - G (t; ϑ) |}^{2} w (t) d t

is continuous, its infimum

δ^{*} > 0

is attained at some

ϑ^{*} \in Θ

. Let

{\hat{ϑ}}_{n}

converge to

ϑ^{*}

(by compactness, convergence holds along subsequences, and the argument extends to almost sure convergence of the objective). Therefore,

n^{- 1} T_{n} = \int_{0}^{1} {[{\hat{G}}_{n} (t) - G (t; {\hat{ϑ}}_{n})]}^{2} w (t) d t \overset{a . s .}{\to} δ^{*} > 0,

and hence

T_{n} = n \cdot n^{- 1} T_{n} \overset{a . s .}{\to} \infty

. Similar consistency arguments can be found in the framework of empirical characteristic function based tests; see Baringhaus and Henze [13]. □

Remark 6.

The preceding propositions characterize the asymptotic behavior of the proposed test statistic under the null hypothesis and under fixed alternatives. The arguments rely on standard tools from empirical process theory, including uniform laws of large numbers, the continuous mapping theorem, and dominated convergence arguments applied to functionals of the empirical probability generating function. Similar quadratic-type goodness-of-fit statistics based on generating functions have been previously studied in related contexts; see, for example, Rueda, O’Reilly and Pérez-Abreu [6] and Henze [8] for early developments, as well as Baringhaus and Henze [13,31] for closely related approaches based on empirical characteristic and moment generating functions.

Proposition 8

(Bootstrap Validity). Under

H_{0}

and Assumption 1, the parametric bootstrap consistently approximates the null distribution of

T_{n}

. Specifically, if

T_{n}^{*}

denotes the bootstrap counterpart of

T_{n}

computed from a resample

X_{1}^{*}, \dots, X_{n}^{*} \overset{iid}{\sim} PTPL ({\hat{ϑ}}_{n})

, then

sup_{x} |P^{*} (T_{n}^{*} \leq x) - P (T_{n} \leq x)| \overset{p}{⟶} 0, n \to \infty .

Proof.

The statistic

T_{n}

can be written as

T_{n} = φ (\sqrt{n} (P_{n} - P_{ϑ_{0}}), \sqrt{n} ({\hat{ϑ}}_{n} - ϑ_{0}))

, where

φ

is a Hadamard-differentiable functional of the empirical measure and the normalised estimator: it is a composition of the bounded linear evaluation map

μ \mapsto \int μ d w

and a continuous quadratic functional on

ℓ^{\infty} ([0, 1])

, hence Hadamard-differentiable tangentially to

ℓ^{\infty} ([0, 1]) \times R^{3}

; see van der Vaart and Wellner [29], Lemma 3.9.27. Theorem 1 establishes the weak limit of this functional.

Under

H_{0}

, the bootstrap data-generating distribution

P_{{\hat{ϑ}}_{n}}

satisfies

∥ {\hat{ϑ}}_{n} - ϑ_{0} ∥ = O_{p} (n^{- 1 / 2})

by Proposition 4. Hence the bootstrap empirical process

\sqrt{n} (P_{n}^{*} - P_{{\hat{ϑ}}_{n}})

has the same weak limit as

\sqrt{n} (P_{n} - P_{ϑ_{0}})

by the parametric bootstrap theorem; see van der Vaart [21], Theorem 23.9, and Bickel and Freedman [32].

The Hadamard-differentiability of

φ

and the continuous mapping theorem then imply that the bootstrap distribution of

T_{n}^{*}

converges in probability to the same limit as

T_{n}

, yielding the stated result. Analogous arguments appear in Henze [8] and Novoa-Muñoz and Jiménez-Gamero [10]. □

Remark 7.

For polynomial weight functions of the form

w (t) = t^{a}

with

a > - 1

, the test statistic

T_{n}

defined in (3) admits an explicit closed-form representation that does not involve numerical integration. In particular,

T_{n}

can be expressed as a finite sum depending on the sample observations and on the model probabilities, together with absolutely convergent series involving the PTPL probability mass function evaluated at the estimated parameter vector.

Although this closed-form expression is theoretically appealing and useful for analytical purposes, its direct implementation requires the evaluation of double sums of order

O (n^{2})

as well as infinite series that must be truncated in practice. As a consequence, the resulting computational cost becomes prohibitive in Monte Carlo and bootstrap studies. For this reason, and in line with previous contributions on PGF-based goodness-of-fit tests (see, e.g., Henze [8] and Rueda, O’Reilly and Pérez-Abreu [6]), the statistic is computed numerically via quadrature of (3), which proves to be substantially more efficient while yielding stable and accurate results.

7. Bootstrap Implementation

As established in Section 6, the asymptotic null distribution of the test statistic

T_{n}

depends on unknown model parameters and involves a nonstandard limiting process. As a consequence, analytical critical values are not readily available. To overcome this difficulty, we adopt a parametric bootstrap procedure, which has proven effective for goodness-of-fit tests based on generating functions; see, for example, Henze [8] and Bickel and Freedman [32].

The bootstrap procedure is implemented under the null hypothesis

H_{0}

, using the fitted Poisson–Three-Parameter Lindley model. Let

{\hat{ϑ}}_{n} = {({\hat{θ}}_{n}, {\hat{α}}_{n}, {\hat{β}}_{n})}^{⊤}

denote a consistent estimator of the parameter vector, obtained for instance by maximum likelihood. The bootstrap algorithm proceeds as follows.

7.1. Maximum Likelihood Estimation for the PTPL Model

Let

X_{1}, \dots, X_{n}

be a random sample from a Poisson–Three-Parameter Lindley distribution with parameter vector

ϑ = {(θ, α, β)}^{⊤}

. The likelihood function is given by

L_{n} (ϑ) = \prod_{i = 1}^{n} p (X_{i}; ϑ),

where

p (\cdot; ϑ)

is defined in (1). The corresponding log-likelihood function is

ℓ_{n} (ϑ) = \sum_{i = 1}^{n} log p (X_{i}; ϑ) .

The maximum likelihood estimator

{\hat{ϑ}}_{n} = {({\hat{θ}}_{n}, {\hat{α}}_{n}, {\hat{β}}_{n})}^{⊤}

is defined as any solution of

{\hat{ϑ}}_{n} = arg max_{ϑ \in Θ} ℓ_{n} (ϑ),

provided that the maximizer exists and is unique.

Due to the nonlinear structure of the likelihood equations, closed-form expressions are not available, and numerical optimization methods are required. In practice, the estimator is obtained by maximizing

ℓ_{n} (ϑ)

using constrained numerical optimization routines.

Remark 8.

Two complete algorithmic implementations of the parametric bootstrap procedure described in this section are provided in Appendix A. Algorithm 1 details the Monte Carlo evaluation of the empiricalsize of the proposed GOF-PGF test: data are generated under the null PTPL model, and the outputs are the empirical rejection rates

{err}_{90}

and

{err}_{95}

at the

10 %

and

5 %

nominal levels, respectively. Algorithm 2 details the corresponding evaluation of the empirical power : data are generated from an alternative distribution

D (ψ)

under

H_{1}

, and the output is the empirical rejection rate

{\hat{β}}_{α_{0}}

at a prescribed nominal level

α_{0}

. Both algorithms share the same inner bootstrap loop; see Remark A1 in Appendix A.

8. Monte Carlo Study

All simulations were carried out using the R programming language [33]. Given the large number of bootstrap replications required (

M \times B

evaluations of

T_{n}

per experiment), computation time was reduced substantially by exploiting parallel computing. Specifically, the outer Monte Carlo loop was distributed across available processor cores using the parRapply parallelisation routine, which applies a function in parallel over the rows of a matrix via R’s built-in parallel package. This strategy, described in Novoa-Muñoz [34], proved particularly effective in the bootstrap-intensive inner loop of Algorithms 1 and 2, reducing wall-clock time by a factor proportional to the number of cores employed.

8.1. Empirical Size

Table 2 reports the empirical rejection rates

{\hat{α}}_{0.05}

and

{\hat{α}}_{0.10}

under

H_{0}

, based on

M = 1000

Monte Carlo replications and

B = 500

bootstrap replications, for

n \in {50, 100, 150, 200, 500}

,

a \in {0, 1, 2}

, and the six parameter configurations listed in Table 1.

Table 1 summarizes the six PTPL parameter configurations considered in the simulation study, together with the corresponding theoretical mean, variance, and dispersion index.

The results indicate that the bootstrap approximation is accurate even for the smaller sample sizes. In all cases, the empirical size is close to the nominal level, confirming the validity of the parametric bootstrap procedure established in Proposition 8. The configuration

T_{6}

(

θ = 3.0

,

α = 0.8

,

β = 0.5

) exhibits slightly elevated rejection rates for

n = 50

(up to

0.060

at the

5 %

nominal level), which is consistent with the known sensitivity of bootstrap-based tests to parameter estimation uncertainty in small samples; this inflation diminishes monotonically as n increases, confirming asymptotic size control.

8.2. Empirical Power

To assess the discriminatory ability of the proposed GOF-PGF test, we evaluate its empirical power against four families of count distributions that are commonly used as alternatives in the overdispersed count data literature. All experiments use the same settings as the size study: weight exponent

a = 1

, sample size

n = 250

,

M = 1, 000

Monte Carlo replications,

B = 500

bootstrap replications per sample, and nominal level

α_{0} = 0.05

. The empirical power at level

α_{0}

is estimated as

{\hat{β}}_{0.05} = \frac{1}{M} \sum_{m = 1}^{M} 1 \{{\bar{V P}}^{(m)} \leq 0.05\},

(4)

where

{\bar{V P}}^{(m)}

is the bootstrap p-value obtained from the m-th Monte Carlo replicate; see Algorithm 2 in Section 7.

The four alternative distributions considered are:

D1 — Poisson( $λ$ ), a strictly equidispersed model with dispersion index $DI = 1$ for all $λ > 0$ , representing the sharpest possible structural contrast with the PTPL family, which satisfies $DI > 1$ throughout its parameter space.
D2 — Negative Binomial( $r, p$ ), an overdispersed model with $DI = 1 / p$ . We fix $r = 0.5$ and vary $p \in {0.10, 0.15, 0.20, 0.25, 0.30, 0.35}$ , yielding $DI \in {10.00, 6.67, 5.00, 4.00, 3.33, 2.86}$ .
D3 — COM-Poisson( $λ, ν$ ), a highly flexible count model that reduces to the Poisson when $ν = 1$ and is strongly overdispersed when $ν ≪ 1$ [20]. We fix $ν = 0.1$ and vary $λ \in {1.47, 1.48, 1.49, 1.50, 1.51, 1.52}$ . At $ν = 0.1$ , the resulting means range from approximately $51.7$ to $70.4$ with $DI \in [9.07, 9.33]$ , far beyond the PTPL null range of $DI \in [1.37, 3.18]$ .
D4 — Zero-Inflated Poisson ZIP( $λ, π$ ), in which $DI = 1 + λ π \geq 1$ . Six configurations are evaluated, covering $DI \in [1.25, 2.40]$ and zero-inflation proportions $π \in [0.10, 0.70]$ .

The results are reported in Table 3.

D1 — Poisson alternatives.

All six Poisson configurations yield high empirical power, ranging from

0.967

at

λ = 9.75

to

0.999

at

λ = 10.0

, with a strictly monotone increasing pattern. Since the Poisson distribution is always equidispersed (

DI = 1

) while the PTPL null has

DI > 1

, the two families are structurally distinct regardless of

λ

. The high power at

n = 250

confirms that the PGF-based test effectively exploits this difference in dispersion structure, consistent with the theoretical consistency result of Proposition 6.

D2 — Negative Binomial alternatives.

The power decreases as p increases from

0.10

to

0.35

, corresponding to a decrease in

{DI}_{NB} = 1 / p

from

10.00

to

2.86

. This is statistically natural: as

{DI}_{NB}

approaches the PTPL null range

[1.37, 3.18]

, discrimination becomes harder. At

p = 0.30

(

DI = 3.33

) and

p = 0.35

(

DI = 2.86

), power drops to

0.867

and

0.823

, respectively. A minor non-monotonicity at

p = 0.25

(

0.993

) compared to

p = 0.20

(

0.990

) is attributable to Monte Carlo variability and does not affect the overall decreasing trend. Even the lowest value (

0.823

) exceeds the

0.70

threshold.

D3 — COM-Poisson alternatives.

At

ν = 0.1

, the COM-Poisson is strongly overdispersed (

DI \in [9.07, 9.33]

), far from the PTPL null. The power exhibits a unimodal pattern as a function of

λ

, peaking at

λ = 1.50

(

{\hat{β}}_{0.05} = 1.000

) and declining on both sides:

0.876

at

λ = 1.47

and

0.879

at

λ = 1.52

. This non-monotone pattern reflects the joint effect of changing mean and distribution shape: as

λ

increases, the empirical PGF shifts in a non-linear manner relative to the PTPL null, producing a unimodal power curve whose peak corresponds to the configuration where the PGF discrepancy

T_{n}

is maximised. Although the dispersion indices vary only slightly, the associated means range from

51.7

to

70.4

, producing differences in the shape of the empirical PGF and hence in

T_{n}

. For

λ \in {1.49, 1.50, 1.51}

, power exceeds

0.99

.

D4 — ZIP alternatives.

Power ranges from

0.901

to

1.000

. The configuration

ZIP (2.0, 0.70)

, with the highest zero-inflation proportion among the evaluated cases, achieves perfect power. For fixed

λ = 2.5

, power increases monotonically with

π

:

0.901

at

π = 0.10

,

0.934

at

π = 0.25

, and

0.979

at

π = 0.40

, reflecting that larger zero-inflation creates mass concentration at zero that is difficult to reconcile with any PTPL fit.

Overall assessment.

Across all 24 configurations, empirical power ranges from

0.823

to

1.000

with mean

0.960

. Twenty out of 24 configurations (

83.3 %

) yield power exceeding

0.90

, and 17 out of 24 (

70.8 %

) exceed

0.95

. Perfect power (

1.000

) is achieved for

NB (0.5, 0.10)

,

CMP (1.50, 0.10)

, and

ZIP (2.0, 0.70)

. The only two configurations below

0.90

are

NB (0.5, 0.30)

and

NB (0.5, 0.35)

, both with

{DI}_{NB} \in {3.33, 2.86}

overlapping with the upper end of the PTPL null dispersion range—precisely the region where any consistent GOF test faces inherently greater difficulty. These results confirm that the proposed test possesses strong discriminatory power against all four alternative families at

n = 250

.

9. Real Data Application

To assess the practical utility of the proposed goodness-of-fit test, we apply it to five overdispersed count datasets drawn from industrial safety, biology, reliability, and retail commerce. For each dataset, the PTPL model is fitted by maximum likelihood and the GOF-PGF test is performed with weight exponent

a = 1

and

B = 999

parametric bootstrap replications at the

5 %

nominal level. For comparison, the Poisson, Negative Binomial, COM-Poisson [20], and Zero-Inflated Poisson (ZIP) models are also fitted; the resulting log-likelihoods, AIC values, and Pearson chi-square goodness-of-fit statistics are reported in Table 4.

The five datasets are described as follows.

D1 — Industrial accidents [35]. Number of accidents suffered by $n = 645$ female munitions workers during a twelve-week period, originally analyzed by Greenwood and Yule [35] and subsequently used as a standard benchmark for overdispersed count models. The sample mean is $\bar{x} = 0.4512$ , the sample variance is $s^{2} = 0.6300$ , and the empirical dispersion index is $DI = s^{2} / \bar{x} = 1.3964$ , confirming mild overdispersion relative to the Poisson model.
D2 — European corn borer larvae [36]. Number of Pyrausta nubilalis (Hübner) larvae per maize plant recorded by McGuire, Brindley and Bancroft [36] ( $n = 80$ , $\bar{x} = 1.8375$ , $s^{2} = 4.7707$ , $DI = 2.5963$ ). This dataset exhibits moderate overdispersion and has been widely used to evaluate flexible count distributions; see, e.g., Das et al. [19].
D3 — European red mites on apple leaves [37]. Number of Panonychus ulmi (Koch) mites per apple leaf observed by Garman [37] ( $n = 150$ , $\bar{x} = 1.1467$ , $s^{2} = 2.2736$ , $DI = 1.9828$ ).
D4 — Deaths by horse kicks, Prussian army [38]. The classic dataset due to von Bortkiewicz [38], recording the number of soldiers killed by horse kicks per cavalry corps-year ( $n = 280$ corps-year units, $\bar{x} = 0.7000$ , $s^{2} = 0.7627$ , $DI = 1.0896$ ). Although this dataset is only marginally overdispersed, it is included as a near-equidispersed benchmark to examine the calibration of the test under near-Poisson conditions.
D5 — Products purchased per customer, Black Friday [39]. A contemporary retail dataset freely available on Kaggle [39], comprising $550, 069$ transactions recorded at an ABC Pvt Ltd retail store during a Black Friday sales event. The count variable of interest is the number of distinct products purchased per unique customer, derived by grouping the raw transaction file by User_ID and counting the number of distinct Product_ID values per user. This yields $n = 5, 891$ customer-level observations with $\bar{x} = 4.3852$ , $s^{2} = 8.3292$ , and $DI = 1.8994$ , confirming moderate overdispersion. To the best of our knowledge, this is the first application of a PTPL-based model to a Black Friday e-commerce count variable.

The MLE estimates, fit statistics, and test results for all five datasets are summarized in Table 4. The main findings are as follows.

For D1 (industrial accidents), the Poisson model is clearly rejected by the chi-square criterion (

χ^{2} = 59.127

,

p < 0.001

), reflecting the presence of overdispersion. The Negative Binomial, COM-Poisson, ZIP, and PTPL models all provide substantially better fits, with ZIP achieving the lowest AIC (

1162.34

). The proposed GOF-PGF test does not reject the PTPL model (

T_{n}^{obs} = 0.000520

,

p_{PGF} = 0.3013

), confirming that the PTPL distribution provides an adequate fit to this dataset.

For D2 (Pyrausta nubilalis larvae), the Poisson model is again strongly rejected (

χ^{2} = 65.187

,

p < 0.001

). The Negative Binomial achieves the best AIC (

298.10

) and is not rejected by the chi-square test (

p = 0.7243

). The PTPL model, with AIC

= 300.57

, provides a competitive fit and is not rejected by the proposed test (

T_{n}^{obs} = 0.015200

,

p_{PGF} = 0.1341

).

For D3 (European red mites), the COM-Poisson achieves the lowest AIC (

448.80

), closely followed by the Negative Binomial (

448.87

) and the PTPL (

450.77

). The Poisson model is strongly rejected. ZIP performs noticeably worse for this dataset (

AIC = 456.87

,

p_{χ^{2}} = 0.0155

). The proposed test does not reject the PTPL fit (

T_{n}^{obs} = 0.000560

,

p_{PGF} = 0.4785

), further supporting its adequacy as a flexible count model.

For D4 (horse-kick deaths), the dataset is nearly equidispersed (

DI = 1.09

) and all five models provide adequate fits. The Poisson model achieves the best AIC (

630.31

), and none of the competing models offer a meaningful improvement. The GOF-PGF test does not reject PTPL (

T_{n}^{obs} = 0.000040

,

p_{PGF} = 0.6176

). This result is consistent with the theoretical finding that the test maintains correct size under the null hypothesis even in near-equidispersed scenarios.

For D5 (Black Friday retail counts), the PTPL model achieves the best AIC among all five competing models (

27, 391.94

), outperforming the Negative Binomial (

27, 841.93

), COM-Poisson (

28, 009.34

), and Poisson (

29, 237.00

). The ZIP model degenerates to the Poisson in this case (

\hat{π} = 0

), as the dataset contains no structural zeros. The estimated parameters are

\hat{θ} = 0.5218

,

\hat{α} = - 1.1592

, and

\hat{β} = 1.8248

, all satisfying the admissibility condition

α \geq - β / (1 + θ)

(here,

- β / (1 + θ) = - 1.1991

, so

\hat{α} = - 1.1592 > - 1.1991

). Although the large sample size (

n = 5, 891

) leads to significant chi-square values for all models, the PTPL chi-square residuals are the smallest across all alternatives, and a visual inspection of the observed versus expected frequencies confirms a markedly better fit at all count values

x \geq 3

. The GOF-PGF test yields

T_{n}^{obs} = 0.770582

with a bootstrap p-value of

0.0202

(

B = 999

), suggesting that the PTPL model, while the best available among the candidates, is statistically rejected at the

5 %

level for this very large dataset, which is expected given the high sensitivity of any consistent test with

n \approx 6, 000

observations. This result illustrates that, in large samples, the GOF-PGF test retains discriminatory power, and that the choice of model should be informed by both the p-value and the AIC criterion.

Taken together, the results across five diverse datasets confirm that the PTPL distribution provides a flexible and competitive fit for overdispersed count data arising in different applied domains, and that the proposed GOF-PGF test correctly identifies adequate fits (D1–D4) while retaining power to detect departures from the null model in large samples (D5).

Remark 9.(Large-sample interpretation of p-values). For very large samples (

n \approx 6, 000

), any consistent goodness-of-fit test will eventually reject every parametric model, since no finite-parameter family fits arbitrarily many observations exactly. The rejection of the PTPL model for D5 (

p_{PGF} = 0.0202

) should therefore be interpreted in conjunction with the AIC ranking, which confirms the PTPL as the best-fitting model among all candidates. In practice, the choice of model in large samples should be guided primarily by information criteria and subject-matter considerations, with the p-value serving as a diagnostic tool rather than a definitive criterion [40].

10. Discussion

The proposed PGF-based goodness-of-fit test for the PTPL distribution fills a methodological gap identified in the current literature on flexible count models. The test has three notable strengths: (i) it exploits the closed-form PGF of the PTPL distribution, yielding a computationally efficient statistic evaluated by numerical quadrature; (ii) it is theoretically grounded, with consistency against fixed alternatives (Proposition 6 ) and a rigorous weak limit theorem (Theorem 1); and (iii) its null distribution is consistently approximated by the parametric bootstrap (Proposition 8), avoiding the need for analytically intractable critical values.

The simulation study confirms reliable size control across a wide range of sample sizes and parameter configurations. The choice of weight exponent

a = 1

is recommended as a practical default, as it consistently balances type I error control and power.

Future research directions include:

1.: Local power analysis. Deriving the asymptotic power of $T_{n}$ against Pitman sequences $G_{1, n} (t) = G (t; ϑ_{0}) + n^{- 1 / 2} h (t)$ would characterise the rate at which the test detects local alternatives and enable comparison with minimax-optimal procedures.
2.: Multivariate and zero-inflated extensions. The PGF approach extends naturally to multivariate PTPL models and to zero-inflated variants, following the framework of Novoa-Muñoz and Jiménez-Gamero [9].
3.: Model selection. Combining the proposed test with information criteria (AIC, BIC) and minimum-distance PGF estimators [41] may yield more robust model selection procedures for overdispersed count data.
4.: Truncated and censored data. Extensions to zero-truncated or right-censored count data, common in reliability and actuarial applications, constitute a natural avenue for future work.

11. Conclusions

This paper has proposed and studied a new goodness-of-fit test for the Poisson–Three-Parameter Lindley distribution based on a Cramér-von Mises type distance between the empirical and theoretical probability generating functions. The test statistic has a closed-form representation, is consistent against fixed alternatives, and its null distribution can be consistently approximated by the parametric bootstrap. Monte Carlo simulations confirm accurate size control and competitive power against several relevant alternatives. The real data application demonstrates the practical utility of the proposed procedure.

Funding

This paper is funded in part by the Universidad del Bío-Bío Project GI2611708, Dirección de Investigación y Creación Artística (Fondo de Apoyo a la Participación a Eventos Internacionales) and Vicerrectoría Académica.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The author thanks the editor of this journal.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Algorithmic Descriptions of the Bootstrap Procedure

This appendix provides the pseudocode for the two Monte Carlo procedures used in Section 8. Algorithm 1 evaluates the empirical size of the proposed GOF-PGF test by simulating under the null PTPL model. Algorithm 2 evaluates the empirical power by simulating under each of the four alternative distributions considered in Section 8.2. Both algorithms employ the parametric bootstrap procedure established in Proposition 8 and described in Section 7.

Remark A1. Algorithm 1 and Algorithm 2 share the same inner bootstrap loop (Steps 3–4 in both algorithms). The sole structural difference lies in Step 1: Algorithm 1 draws

X^{(m)}

from the null PTPL model, thereby evaluating the empiricalsize of the test via

{err}_{90}

and

{err}_{95}

; whereas Algorithm 2 draws

X^{(m)}

from the alternative distribution

D (ψ)

under

H_{1}

, evaluating the empirical power via

{\hat{β}}_{α_{0}}

. In both cases, the bootstrap resamples in Step 3 are drawn from

PTPL ({\hat{ϑ}}_{n}^{(m)})

, consistently with the parametric bootstrap framework of Proposition 8.

References

Fisher, R.A. On the Interpretation of χ² from Contingency Tables, and the Calculation of P. Journal of the Royal Statistical Society 1922, 85, 87–94.
Cochran, W.G. The χ² test of goodness of fit. Annals of Mathematical Statistics 1952, 23(3), 315–345.
Moore, D.S. Chi-square tests with random probabilities. Journal of the American Statistical Association 1986, 81, 372–377.
Read, T.R.C.; Cressie, N.A.C. Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer-Verlag: New York, NY, USA, 1988.
Cressie, N. Statistics for Spatial Data. Wiley, 1992.
Rueda, R., O’Reilly, F., Pérez- Abreu, V. (1991). Goodness of fit for the poisson distribution based on the probability generating function. Communications in Statistics - Theory and Methods, 20(10), 3093-3110. [CrossRef]
Henze, N. Empirical-distribution-function goodness-of-fit tests for discrete models. Journal of Multivariate Analysis 1996, 58, 145–168.
Henze, N. Goodness-of-fit tests for discrete distributions based on generating functions. Statistics & Probability Letters 2001, 54, 41–50.
Novoa-Muñoz, F.; Jiménez-Gamero, M.D. A goodness-of-fit test for the multivariate Poisson distribution. SORT 2016, 40, 113-138.
Novoa-Muñoz, F.; Jiménez-Gamero, M.D. Testing for the bivariate Poisson distribution. Metrika 2014, 77, 771-793.
Novoa-Muñoz, F. Goodness-of-fit tests for the bivariate Poisson distribution. Commun. Stat. Simul. Comput. 2019, 50, 1998-2014. [CrossRef]
Nikitin, Ya. Yu. Asymptotic Efficiency of Nonparametric Tests; Cambridge University Press: Cambridge, UK, 1995.
Baringhaus, L.; Henze, N. A consistent test for multivariate normality based on the empirical characteristic function. Journal of Multivariate Analysis 2004, 88, 152–170.
Novoa-Muñoz, F. Tests de Bondad de Ajuste para la Distribución Poisson Bivariante. Editorial Académica Española, 2018.
Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions, 3rd ed.; Wiley: Hoboken, NJ, USA, 2005.
Sankaran, M. The discrete Poisson–Lindley distribution. Biometrics 1970, 26(1), 145–149.
Ghitany, M.E.; Al-Mutairi, D.K.; Nadarajah, S. Zero-truncated Poisson-Lindley distribution and its application. Mathematics and Computers in Simulation 2008, 79, 279–287.
Mahmoudi, E.; Zakerzadeh, H. Generalized Poisson-Lindley distribution. Communications in Statistics - Theory and Methods 2010, 39(10), 1785–1798. [CrossRef]
Das, K.; Ahmed, I.; Bhattacharjee, S. A new three-parameter Poisson–Lindley distribution for modelling over-dispersed count data. International Journal of Applied Engineering Research 2018, 13, 16468–16477.
Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: revival of the COM-Poisson distribution. J. R. Stat. Soc. Ser. C. 2005, 54(1), 127-142. [CrossRef]
van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.
Shanker, R.; Shukla, K.; Shanker, R.; Tekie, A.L. A three-parameter Lindley distribution. American Journal of Mathematics and Statistics 2017, 7, 15–26.
Billingsley, P. Probability and Measure, 3rd ed.; Wiley: New York, NY, USA, 1995.
Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury Press: Pacific Grove, CA, USA, 2002.
Teicher, H. Identifiability of mixtures. Annals of Mathematical Statistics 1961, 32(1), 244–248.
Wald, A. Note on the consistency of the maximum likelihood estimate. Annals of Mathematical Statistics 1949, 20, 595–601.
Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.
Chow, Y.S.; Teicher, H. Probability Theory: Independence, Interchangeability, Martingales; Springer: New York, NY, USA, 1997.
van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996.
Royden, H.L.; Fitzpatrick, P.M. Real Analysis, 4th ed.; Pearson: Boston, MA, USA, 2010.
Baringhaus, L.; Henze, N. Limit distributions for test statistics based on the empirical moment generating function. Statistics & Probability Letters 2004, 69, 79–88.
Bickel, P.J.; Freedman, D.A. Some asymptotic theory for the bootstrap. Annals of Statistics 1981, 9, 1196–1217.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024. Available online: https://www.R-project.org/ (accessed on 28 January 2026).
Novoa-Muñoz, F. Parallelization strategies for bootstrap-intensive goodness-of-fit simulations in R. [Complete with journal/proceedings details as appropriate], 2024.
Greenwood, M.; Yule, G.U. An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J. R. Stat. Soc. 1920, 83(2), 255-279. [CrossRef]
McGuire, J.U; Brindley, T.A.; Bancroft, T.A. The distribution of European corn borer larvae Pyrausta nubilalis (Hüb.) in field corn. Biometrics. 1957, 13(1), 65-78. [CrossRef]
Garman, P. Mite species on apple trees in Connecticut; Bulletin 535. New Haven, CT, USA, 1951.
von Bortkiewicz, L. Das Gesetz der Kleinen Zahlen; Teubner, Leipzig, Germany, 1898.
ABC Pvt Ltd and Analytics Vidhya. Black Friday Sales Dataset. Kaggle, 2019. Available online: https://www.kaggle.com/datasets/mehdidag/black-friday (accessed on 10 May 2026).
Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2013.
Jiménez-Gamero, M.D.; Batsidis, A. Minimum distance estimators for count data based on the probability generating function with applications. Metrika 2017, 80(5), 503-545.

Figure 1. Probability mass functions of the Poisson–Three-Parameter Lindley (PTPL) distribution for nine parameter configurations covering mild (

DI \approx 1.55

–

3.09

, top row), moderate (

DI \approx 3.67

–

10.02

, middle row), and strong (

DI \approx 12.56

–

40.75

, bottom row) overdispersion. The dashed vertical line in each panel marks the mean

E (X)

; the badge in the upper right corner reports the dispersion index

DI = V (X) / E (X)

. Parameter space:

θ > 0

,

β > 0

,

α \geq - β / (1 + θ)

.

Figure 1. Probability mass functions of the Poisson–Three-Parameter Lindley (PTPL) distribution for nine parameter configurations covering mild (

DI \approx 1.55

–

3.09

, top row), moderate (

DI \approx 3.67

–

10.02

, middle row), and strong (

DI \approx 12.56

–

40.75

, bottom row) overdispersion. The dashed vertical line in each panel marks the mean

E (X)

; the badge in the upper right corner reports the dispersion index

DI = V (X) / E (X)

. Parameter space:

θ > 0

,

β > 0

,

α \geq - β / (1 + θ)

.

Table 1. Parameter configurations of the PTPL distribution used in the Monte Carlo study.

E (X)

,

V (X)

, and

DI = V (X) / E (X)

are computed from the exact PMF. Configurations are ordered by increasing dispersion index.

Table 1. Parameter configurations of the PTPL distribution used in the Monte Carlo study.

E (X)

,

V (X)

, and

DI = V (X) / E (X)

are computed from the exact PMF. Configurations are ordered by increasing dispersion index.

Triple	$θ$	$α$	$β$	$E (X)$	$V (X)$	$DI$
$T_{1}$	3.0	0.8	0.5	0.3908	0.5369	1.3739
$T_{2}$	2.0	0.3	0.8	0.7857	1.2398	1.5779
$T_{3}$	1.5	1.5	1.5	0.9333	1.6622	1.7810
$T_{4}$	1.0	0.5	1.0	1.6667	3.5556	2.1333
$T_{5}$	0.8	0.2	3.0	2.4367	5.5577	2.2808
$T_{6}$	0.5	1.0	2.0	3.6000	11.4400	3.1778

Note. All six configurations satisfy the admissibility condition:

α \geq - β / (1 + θ)

and yield

DI > 1

(overdispersion).

Table 2. Empirical size of the proposed GOF-PGF test for the PTPL distribution. Rejection rates

{\hat{α}}_{0.10}

(nominal 10%) and

{\hat{α}}_{0.05}

(nominal 5%) based on

M = 1000

Monte Carlo replications and

B = 500

bootstrap replications per sample. Entries close to the nominal level indicate accurate size control.

Table 2. Empirical size of the proposed GOF-PGF test for the PTPL distribution. Rejection rates

{\hat{α}}_{0.10}

(nominal 10%) and

{\hat{α}}_{0.05}

(nominal 5%) based on

M = 1000

Monte Carlo replications and

B = 500

bootstrap replications per sample. Entries close to the nominal level indicate accurate size control.

Table 3. Empirical power

{\hat{β}}_{0.05}

of the proposed GOF-PGF test against four alternative count distributions. Settings:

n = 250

,

a = 1

,

M = 1, 000

Monte Carlo replications,

B = 500

bootstrap replications, nominal level

α_{0} = 0.05

.

μ

,

σ^{2}

, and

DI = σ^{2} / μ

are exact theoretical moments. Values

{\hat{β}}_{0.05} \geq 0.95

are in bold; configurations with power

< 0.90

are marked †.

Table 3. Empirical power

{\hat{β}}_{0.05}

of the proposed GOF-PGF test against four alternative count distributions. Settings:

n = 250

,

a = 1

,

M = 1, 000

Monte Carlo replications,

B = 500

bootstrap replications, nominal level

α_{0} = 0.05

.

μ

,

σ^{2}

, and

DI = σ^{2} / μ

are exact theoretical moments. Values

{\hat{β}}_{0.05} \geq 0.95

are in bold; configurations with power

< 0.90

are marked †.

Table 4. Comparison of fitted count models for the five real datasets. For each model: log-likelihood (

log \hat{L}

), AIC, Pearson chi-square (

χ^{2}

) with degrees of freedom and associated p-value (

p_{χ^{2}}

). The last row per dataset reports the observed statistic

T_{n}^{obs}

and bootstrap p-value (

p_{PGF}

,

B = 999

) of the proposed GOF-PGF test applied to the PTPL model (

a = 1

). The best AIC per dataset appears in bold.

Table 4. Comparison of fitted count models for the five real datasets. For each model: log-likelihood (

log \hat{L}

), AIC, Pearson chi-square (

χ^{2}

) with degrees of freedom and associated p-value (

p_{χ^{2}}

). The last row per dataset reports the observed statistic

T_{n}^{obs}

and bootstrap p-value (

p_{PGF}

,

B = 999

) of the proposed GOF-PGF test applied to the PTPL model (

a = 1

). The best AIC per dataset appears in bold.

Dataset	Model	$log \hat{L}$	AIC	$χ^{2}$	df	$p_{χ^{2}}$
D1: Industrial accidents Greenwood & Yule [35] $n = 645$ , $\bar{x} = 0.4512$ , $DI = 1.40$	Poisson	$- 598.89$	$1199.78$	$59.127$	2	$< 0.0001$
	Neg. Binomial	$- 580.14$	$1164.29$	$6.097$	2	$0.0474$
	COM-Poisson	$- 580.14$	$1164.27$	$6.341$	2	$0.0420$
	ZIP	$- 579.17$	$1162.34$	$5.321$	2	$0.0699$
	PTPL	$- 580.11$	$1166.21$	$6.343$	1	$0.0118$
	GOF-PGF: $T_{n}^{obs} = 0.000520$ , $p_{PGF} = 0.3013$ [not rejected at 5%]
D2:Pyrausta nubilalis McGuire et al. [36] $n = 80$ , $\bar{x} = 1.8375$ , $DI = 2.60$	Poisson	$- 173.66$	$349.32$	$65.187$	4	$< 0.0001$
	Neg. Binomial	$- 147.05$	$298.10$	$3.647$	5	$0.7243$
	COM-Poisson	$- 147.31$	$298.62$	$4.491$	5	$0.6105$
	ZIP	$- 147.41$	$298.82$	$8.413$	5	$0.1349$
	PTPL	$- 147.29$	$300.57$	$4.553$	4	$0.4728$
	GOF-PGF: $T_{n}^{obs} = 0.015200$ , $p_{PGF} = 0.1341$ [not rejected at 5%]
D3: European red mites Garman [37] $n = 150$ , $\bar{x} = 1.1467$ , $DI = 1.98$	Poisson	$- 242.81$	$487.62$	$47.973$	3	$< 0.0001$
	Neg. Binomial	$- 222.44$	$448.87$	$2.920$	4	$0.7123$
	COM-Poisson	$- 222.40$	$448.80$	$2.925$	4	$0.7116$
	ZIP	$- 226.43$	$456.87$	$10.400$	4	$0.0155$
	PTPL	$- 222.38$	$450.77$	$2.887$	3	$0.5769$
	GOF-PGF: $T_{n}^{obs} = 0.000560$ , $p_{PGF} = 0.4785$ [not rejected at 5%]
D4: Deaths by horse kicks Bortkiewicz [38] $n = 280$ , $\bar{x} = 0.7000$ , $DI = 1.09$	Poisson	$- 314.15$	$630.31$	$1.979$	2	$0.5768$
	Neg. Binomial	$- 313.65$	$631.30$	$0.607$	1	$0.7381$
	COM-Poisson	$- 313.58$	$631.17$	$0.538$	1	$0.7642$
	ZIP	$- 313.59$	$631.18$	$0.672$	1	$0.7146$
	PTPL	$- 313.95$	$633.90$	$0.883$	1	$0.3474$
	GOF-PGF: $T_{n}^{obs} = 0.000040$ , $p_{PGF} = 0.6176$ [not rejected at 5%]
D5: Products per customer Black Friday [39] $n = 5, 891$ , $\bar{x} = 4.3852$ , $DI = 1.90$	Poisson	$- 14617.50$	$29237.00$	$3508.187$	13	$< 0.0001$
	Neg. Binomial	$- 13918.97$	$27841.93$	$495.486$	12	$< 0.0001$
	COM-Poisson	$- 14002.67$	$28009.34$	$621.870$	12	$< 0.0001$
	ZIP	$- 14617.50$	$29239.00$	$3508.187$	13	$< 0.0001$
	PTPL	$- 13692.97$	$27391.94$	$186.605$	11	$< 0.0001$
	GOF-PGF: $T_{n}^{obs} = 0.770582$ , $p_{PGF} = 0.0202$ [rejected at 5%; see Note]

Note. DI

= s^{2} / \bar{x}

(empirical dispersion index). ZIP degenerates to Poisson for D5 (

\hat{π} = 0

). For D5 the large chi-square values reflect the very large sample size (

n = 5, 891

) rather than poor fit; the PTPL residuals are the smallest among all models.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

A Probability Generating Function Based Goodness-of-Fit Test for the Poisson–Three-Parameter Lindley Distribution

Abstract

Keywords:

Subject:

1. Introduction

2. Notation and Preliminaries

3. The Poisson–Three-Parameter Lindley Distribution

Probability generating function.

4. Parameter Estimation for the PTPL Distribution

4.1. Method of Moments

4.2. Maximum Likelihood Estimation

4.3. Implications for the Goodness-of-Fit Test

5. The Proposed Goodness-of-Fit Test

6. Asymptotic Properties

7. Bootstrap Implementation

7.1. Maximum Likelihood Estimation for the PTPL Model

8. Monte Carlo Study

8.1. Empirical Size

8.2. Empirical Power

D1 — Poisson alternatives.

D2 — Negative Binomial alternatives.

D3 — COM-Poisson alternatives.

D4 — ZIP alternatives.

Overall assessment.

9. Real Data Application

10. Discussion

11. Conclusions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Algorithmic Descriptions of the Bootstrap Procedure

References

MDPI Initiatives

Important Links

Subscribe