Goodness-of-fit test for the bivariate Hermite distribution

This paper studies a goodness-of-fit test for the bivariate Hermite distribution. Specifically, we propose and study a Cram\'er-von Mises-type test based on the empirical probability generating function. The bootstrap can be used to consistently estimate the null distribution of the test statistic. A simulation study investigates the quality of the bootstrap approximation for finite sample sizes.


Introduction
Count data arise in many different circumstances. In the univariate setting, the Hermite distribution (HD) is the distribution of a linear combination of the form Y = X_1 + 2X_2, where X_1 and X_2 are independent Poisson random variables. The properties that distinguish the HD are its flexibility for modeling count data that present multimodality together with many zeros (so-called zero-inflation), and its ability to model data with moderate overdispersion, that is, data whose variance exceeds the expected value. It was McKendrick in [1] who modeled a phagocytic experiment (bacteria counts in leukocytes) through the HD, obtaining a more satisfactory model than with the Poisson distribution. In practice, however, bivariate count data arise in several different disciplines, and the bivariate Hermite distribution (BHD) plays an important role when the data are zero-inflated; an example is the number of accidents in two different periods [2].
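The construction Y = X_1 + 2X_2 above can be sketched directly. The following is a minimal simulation in Python (the paper's own computations use R; Python is used here only for illustration), showing that the resulting variable is overdispersed: E[Y] = λ_1 + 2λ_2 while Var[Y] = λ_1 + 4λ_2 > E[Y] whenever λ_2 > 0.

```python
import numpy as np

# Minimal sketch: simulate the univariate Hermite distribution via
# Y = X1 + 2*X2 with independent Poisson variables X1 ~ P(lam1), X2 ~ P(lam2).
rng = np.random.default_rng(0)

def rhermite(n, lam1, lam2, rng):
    """Draw n univariate Hermite variates as Y = X1 + 2*X2."""
    return rng.poisson(lam1, size=n) + 2 * rng.poisson(lam2, size=n)

y = rhermite(100_000, 1.0, 0.5, rng)
# Theory: E[Y] = lam1 + 2*lam2 = 2.0 and Var[Y] = lam1 + 4*lam2 = 3.0,
# so the variance exceeds the mean (moderate overdispersion).
```

With a sample this large, the empirical mean and variance land close to the theoretical values 2.0 and 3.0.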
Testing the goodness of fit (gof) of observations to a probabilistic model is a crucial aspect of data analysis. For the univariate case we have found only a single gof test, and it is designed for data coming from a generalized Hermite distribution (see Meintanis and Bassiakos [3]), not from an HD. For the BHD, we found no literature on gof tests at all.
The purpose of this paper is to propose and study a consistent goodness-of-fit test for the bivariate Hermite distribution.
According to Novoa-Muñoz in [4], the probability generating function (pgf) characterizes the distribution of a random vector and can be estimated consistently by the empirical probability generating function (epgf). The proposed test is a function of the epgf: it compares the epgf of the data with an estimator of the pgf of the BHD. As is well known, to establish the rejection region we need to know the distribution of the test statistic.
Since the resulting test statistic is of the Cramér-von Mises type, it was not possible to calculate its null distribution explicitly for finite sample sizes, and simulation techniques must be used. We therefore approximate the null distribution of the statistic by means of a parametric bootstrap.
Because the properties of the proposed test are asymptotic (see, for example, [5]), a simulation study was carried out with the purpose of evaluating the behavior of the test for samples of finite size.
The present work is organized as follows. In Section 2 we present some preliminary results that will be used in the following sections, together with the definition of the BHD and some of its properties. In Section 3 the proposed statistic is presented. Section 4 is devoted to the bootstrap estimator and its approximation to the null distribution of the statistic. Section 5 presents the results of a simulation study, the power of the test, and an application to a real data set.
Before ending this section, we introduce some notation: F_A ∧_δ F_B denotes a mixture (compounding) distribution, where F_A represents the original distribution and F_B the mixing distribution (i.e., the distribution of δ) [6]; all vectors are row vectors and x^⊤ is the transpose of the row vector x; for any vector x, x_k denotes its kth coordinate and ‖x‖ its Euclidean norm; N_0 = {0, 1, 2, 3, ...}; I{A} denotes the indicator function of the set A; P_θ denotes the probability law of the BHD with parameter θ; E_θ denotes expectation with respect to P_θ; P_* and E_* denote the conditional probability law and expectation, respectively, given the data (X_1, Y_1), ..., (X_n, Y_n); all limits in this work are taken as n → ∞; →_L denotes convergence in distribution; →_a.s. denotes almost sure convergence; finally, for a sequence of random variables or random elements {C_n} and ε ∈ R, C_n = O_P(n^{−ε}) means that n^ε C_n is bounded in probability, C_n = o_P(n^{−ε}) means that n^ε C_n →_P 0, and C_n = o(n^{−ε}) means that n^ε C_n → 0 a.s.

Preliminaries
Several definitions of the BHD have been given (see, for example, Kocherlakota and Kocherlakota in [7]). In this paper we work with the following one, which has received more attention in the statistical literature (see, for example, Papageorgiou et al. in [8]; Kemp et al. in [9]). Let X = (X_1, X_2) have the bivariate Poisson distribution with parameters δλ_1, δλ_2 and δλ_3 (for more details on this distribution see, for example, Johnson et al. in [10]); then X ∧_δ N(µ, σ^2) has the BHD. Kocherlakota in [11] obtained its pgf, which is given by
v(t_1, t_2; θ) = M(λ_1(t_1 − 1) + λ_2(t_2 − 1) + λ_3(t_1 t_2 − 1)),
where M(x) is the moment-generating function of the N(µ, σ^2) distribution. From the pgf of the BHD, Kocherlakota and Kocherlakota [7] obtained the probability mass function of the BHD, which is expressed in terms of M(x), polynomials P_r(x) of degree r in x, γ = −(λ_1 + λ_2 + λ_3) and ξ = λ_3/(λ_1 λ_2).
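The pgf of the mixture construction above is straightforward to evaluate numerically. The sketch below (in Python; the paper's computations use R) assumes the reconstructed form v(t_1, t_2; θ) = M(u) with u = λ_1(t_1 − 1) + λ_2(t_2 − 1) + λ_3(t_1 t_2 − 1) and M the moment-generating function of N(µ, σ²).

```python
import numpy as np

# Sketch of the BHD pgf under the mixture construction: the bivariate Poisson
# pgf given delta is exp(delta * u); mixing over delta ~ N(mu, sigma2) yields
# v(t1, t2) = M(u), with M the normal mgf M(u) = exp(mu*u + sigma2*u^2/2).
def bhd_pgf(t1, t2, lam1, lam2, lam3, mu, sigma2):
    u = lam1 * (t1 - 1.0) + lam2 * (t2 - 1.0) + lam3 * (t1 * t2 - 1.0)
    return np.exp(mu * u + 0.5 * sigma2 * u**2)

# Sanity check: any pgf equals 1 at t = (1, 1), since then u = 0.
val = bhd_pgf(1.0, 1.0, 0.5, 0.7, 0.2, 1.0, 0.25)
```

The parameter values used in the call are arbitrary illustrations, not values taken from the paper.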
Remark 1. If λ_3 = 0, the probability mass function reduces to a simpler expression.
Remark 2. If X is a random vector with the bivariate Hermite distribution with parameter θ, we write X ∼ BH(θ), where θ ∈ Θ and Θ denotes the parameter space.
Let X_1 = (X_11, X_12), X_2 = (X_21, X_22), ..., X_n = (X_n1, X_n2) be independent and identically distributed (iid) random vectors defined on a probability space (Ω, A, P) and taking values in N_0^2, and let
v_n(t) = (1/n) Σ_{j=1}^n t_1^{X_j1} t_2^{X_j2}, t = (t_1, t_2) ∈ W,
denote the epgf of X_1, X_2, ..., X_n, for some appropriate W ⊆ R^2.
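The epgf is a simple sample average. A minimal sketch (Python, for illustration; the paper's experiments use R):

```python
import numpy as np

# The empirical pgf of a bivariate sample X (an array of shape (n, 2)):
# v_n(t1, t2) = (1/n) * sum_j t1**X[j,0] * t2**X[j,1].
def epgf(t1, t2, X):
    return np.mean(t1 ** X[:, 0] * t2 ** X[:, 1])

rng = np.random.default_rng(1)
X = rng.poisson(1.0, size=(500, 2))   # any N_0^2-valued sample will do here
one = epgf(1.0, 1.0, X)               # the epgf is exactly 1 at t = (1, 1)
```

Evaluating at t = (1, 1) gives exactly 1, as any (empirical) pgf must.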
In the next section we develop our test statistic; to this end, the result given below, whose proof was presented in [5], will be fundamental.
Proposition 1. Let X_1, ..., X_n be iid from a random vector X = (X_1, X_2) ∈ N_0^2 with pgf v(t). Then sup_{t ∈ [0,1]^2} |v_n(t) − v(t)| → 0 a.s.

The test statistic and its asymptotic null distribution
Let X_1 = (X_11, X_12), X_2 = (X_21, X_22), ..., X_n = (X_n1, X_n2) be iid from a random vector X = (X_1, X_2) ∈ N_0^2. Based on the sample X_1, X_2, ..., X_n, the objective is to test the hypothesis
H_0: (X_1, X_2) ∼ BH(θ), for some θ ∈ Θ,
against the alternative that (X_1, X_2) does not follow a BHD. With this purpose, we resort to some properties of the pgf that allow us to propose the following test statistic.
According to Proposition 1, a consistent estimator of the pgf is the epgf. If H_0 is true and θ̂_n is a consistent estimator of θ, then v(t; θ̂_n) consistently estimates the population pgf. Since the distribution of X = (X_1, X_2) is uniquely determined by its pgf, v(t), t = (t_1, t_2) ∈ [0, 1]^2, a reasonable test of H_0 should reject the null hypothesis for large values of V_{n,w}(θ̂_n), defined by
V_{n,w}(θ̂_n) = n ∫_{[0,1]^2} {v_n(t) − v(t; θ̂_n)}^2 w(t) dt,    (2)
where w(t) is a nonnegative weight function. The assumption (3) on w ensures that the double integral in (2) is finite for each fixed n. Now, to determine what are large values of V_{n,w}(θ̂_n), we must calculate its null distribution, or at least an approximation to it. Since the null distribution of V_{n,w}(θ̂_n) is unknown, we first try to estimate it by means of its asymptotic null distribution. In order to derive it, we assume that the estimator θ̂_n satisfies the following regularity condition.
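The statistic (2) can be approximated numerically. The sketch below (Python, for illustration; the paper itself uses R with a numerical-integration package) approximates the double integral with a midpoint rule on a grid, with w(t) = t_1^{a_1} t_2^{a_2}. The fitted pgf is passed in as a callable; the product-of-Poissons pgf used in the example is only a stand-in, not the BHD fit of the paper.

```python
import numpy as np

# Sketch of V_{n,w}(theta_hat) = n * int_{[0,1]^2} (v_n(t) - v(t; theta_hat))^2 w(t) dt,
# approximated by the midpoint rule on an m x m grid.
def cvm_statistic(X, pgf_fitted, a1=1.0, a2=1.0, m=64):
    n = X.shape[0]
    g = (np.arange(m) + 0.5) / m                  # midpoints of [0,1]
    T1, T2 = np.meshgrid(g, g, indexing="ij")
    # empirical pgf on the grid: mean over the sample axis
    Vn = np.mean(T1[..., None] ** X[:, 0] * T2[..., None] ** X[:, 1], axis=-1)
    integrand = (Vn - pgf_fitted(T1, T2)) ** 2 * T1**a1 * T2**a2
    return n * integrand.mean()                   # cell area is 1/m^2

rng = np.random.default_rng(2)
X = rng.poisson(0.8, size=(200, 2))
# Stand-in "fitted" pgf: product of two Poisson pgfs at the sample means.
l1, l2 = X.mean(axis=0)
V = cvm_statistic(X, lambda t1, t2: np.exp(l1 * (t1 - 1)) * np.exp(l2 * (t2 - 1)))
```

By construction the statistic is nonnegative, and small values indicate agreement between the epgf and the fitted pgf.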
The next result gives the asymptotic null distribution of V_{n,w}(θ̂_n).
V_{n,w}(θ̂_n) →_L Σ_{j≥1} λ_j χ²_{1j},
where χ²_{11}, χ²_{12}, ... are independent χ² variates with one degree of freedom and {λ_j} are the non-null eigenvalues of the operator C(θ) defined on the function space {τ : ...}. The proof proceeds by a Taylor expansion of V(X_i; θ̂_n; t) around θ̂_n = θ, where Q^(1)(x; ϑ; t) is the vector of first derivatives and Q^(2)(x; ϑ; t) is the matrix of second derivatives of V(x; ϑ; t) with respect to ϑ.
The asymptotic null distribution of V_{n,w}(θ̂_n) depends on the unknown true value of the parameter θ; therefore, in practice, it does not provide a useful solution to the problem of estimating the null distribution of the test statistic. This could be solved by replacing θ with θ̂_n.
But a greater difficulty is to determine the set {λ_j}_{j≥1}: in most cases, calculating the eigenvalues of an operator is not a simple task, and in our case we must also obtain the expression h(x, y; θ), which is not easy to find, since it depends on the function ℓ, which usually does not have a simple expression.
Thus, in the next section we consider another way to approximate the null distribution of the test statistic: the parametric bootstrap method.

The bootstrap estimator
An alternative way to estimate the null distribution is through the parametric bootstrap method.
As stated after Assumption 1, Assumption 2 is not restrictive since it is fulfilled by commonly used estimators.
The next theorem shows that the bootstrap distribution of V_{n,w}(θ̂_n) consistently estimates its null distribution.
Following steps similar to those in the proof of Theorem 1, it can be seen that V*_{n,w}(θ̂*_n) = ‖W*_n‖²_H + o_{P*}(1), where W*_n(t) is defined as W_n(t) with X_i and θ replaced by X*_i and θ̂_n, respectively.
To derive the result, we first check that assumptions (i)-(iii) in Theorem 1.1 of Kundu et al. [14] hold.
Let K_n be the covariance kernel of Y*_n, which by the SLLN converges almost surely. Moreover, let Z be a zero-mean Gaussian process on H whose covariance operator C is characterized by the limit kernel; the central limit theorem in Hilbert spaces applies (see, for example, van der Vaart and Wellner [15]). Setting a_{kl} = ⟨Ce_k, e_l⟩_H in the aforementioned Theorem 1.1 proves that condition (i) holds. To verify condition (ii), we use the monotone convergence theorem, Parseval's relation and the dominated convergence theorem. To prove condition (iii), we start from an inequality which implies that, for each fixed ε > 0 and sufficiently large n, the required tail bound holds. This proves condition (iii). Therefore, Y*_n →_L Z in H, a.s. The result now follows from the continuous mapping theorem.

From Theorem 2, the test function Ψ*_V that rejects H_0 when V_{n,w}(θ̂_n) ≥ v*_{n,w,α}, or, equivalently, the test that rejects H_0 when p* = P*{V*_{n,w}(θ̂*_n) ≥ V_obs} ≤ α, is asymptotically correct in the sense that, when H_0 is true, lim P_θ(Ψ*_V = 1) = α, where v*_{n,w,α} = inf{x : P*(V*_{n,w}(θ̂*_n) ≥ x) ≤ α} is the upper α-percentile of the bootstrap distribution of V_{n,w}(θ̂_n) and V_obs is the observed value of the test statistic.
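The bootstrap scheme above has a simple generic skeleton: estimate θ̂_n, draw B resamples under the fitted null model, and report the fraction of bootstrap statistics at least as large as the observed one. The sketch below (Python, for illustration) keeps the estimator, sampler and statistic as placeholders; the Poisson stand-ins at the bottom merely exercise the skeleton and are not the BHD fitting of the paper.

```python
import numpy as np

# Generic parametric-bootstrap p-value. In the paper's setting, `estimate`
# would fit theta_hat for the BHD, `sampler` would draw from BH(theta_hat),
# and `statistic` would be V_{n,w}; here they are injected as callables.
def bootstrap_pvalue(X, statistic, estimate, sampler, B=200, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    theta_hat = estimate(X)
    v_obs = statistic(X, theta_hat)
    count = 0
    for _ in range(B):
        Xb = sampler(theta_hat, X.shape[0], rng)   # resample under H0
        if statistic(Xb, estimate(Xb)) >= v_obs:
            count += 1
    return count / B                               # estimates P*(V* >= V_obs)

# Toy stand-ins (independent Poissons, dispersion-style statistic), purely
# to exercise the skeleton:
est = lambda X: X.mean(axis=0)
samp = lambda th, n, rng: rng.poisson(th, size=(n, 2))
stat = lambda X, th: float(np.sum((X.var(axis=0) - th) ** 2))
rng = np.random.default_rng(3)
p = bootstrap_pvalue(rng.poisson(1.0, size=(100, 2)), stat, est, samp, B=99, rng=rng)
```

The resulting p is compared with the nominal level α to decide whether H_0 is rejected.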

Numerical results
According to Novoa-Muñoz and Jiménez-Gamero in [5], the properties of the statistic V_{n,w}(θ̂_n) are asymptotic; that is, they describe the behavior of the proposed test for large samples. To study the quality of the bootstrap approximation for samples of finite size, a simulation experiment was carried out. In this section we describe this experiment and summarize the results obtained.
All computer calculations made in this paper were carried out through the use of programs written in the R language [16].
To calculate V_{n,w}(θ̂_n) it is necessary to give an explicit form to the weight function w. Here the following family is considered:
w(t; a_1, a_2) = t_1^{a_1} t_2^{a_2}.    (9)
Observe that the only restrictions imposed on the weight function are that w be positive almost everywhere in [0, 1]^2 and that it satisfy (3). The function w(t; a_1, a_2) given in (9) meets these conditions whenever a_i > −1, i = 1, 2. It was not possible to find an explicit form of the statistic V_{n,w}(θ̂_n), so the cubature package of R [16] was used to compute it numerically.
The selected values of λ 1 and λ 2 were not greater than 1 since the Hermite distribution is characterized as being zero-inflated.
The above procedure was repeated 1000 times, and the fraction of estimated p-values less than or equal to 0.05 and to 0.10 was computed; these fractions estimate the type I error probabilities for α = 0.05 and α = 0.10.
The results obtained are presented in Tables 1-7 for the different pairs (a_1, a_2). In each table the rows are ordered by increasing µ and σ^2, with increasing values of λ_1 for each new µ, and increasing values of λ_2 for each new λ_1. From these results we can conclude that the parametric bootstrap provides a good approximation to the null distribution. It is also seen that the values a_1 and a_2 of the weight function affect the bootstrap estimates of the p-values.
From the tables it is clear that the bootstrap p-values approach the nominal value as n increases. These approximations are better when a_1 = a_2. In particular, when a_1 = a_2 are small (less than 5), the bootstrap p-values approach the nominal value from below; the opposite happens when a_1 = a_2 are fairly large (greater than or equal to 5). Table 4 shows the best results, the weight function with a_1 = a_2 = 1 giving the best p-value estimates.
Unfortunately, we could not find a closed form for our statistic V_{n,w}(θ̂_n), so we used the cubature package of the software R [16] to calculate it. This had a serious impact on computation time, since the execution time of the simulations increased by at least 30%.

Real data set
Now the proposed test is applied to a real data set. The data set, presented in [7], comprises the numbers of accidents in two different years, where X is the number of accidents in the first period and Y the number in the second period. Table 9 shows the data.
The p-value obtained from the statistic V_{n,w}(θ̂_n) of the proposed test with a_1 = 1 and a_2 = 0, applied to these data, is 0.838; therefore, we do not reject the null hypothesis, that is, the data appear to come from a bivariate Hermite distribution.