A dispersion test for the modified Borel-Tanner distribution

Dispersion tests based on the second order component of smooth test statistics are related to Fisher’s Index of Dispersion test, used for testing for the Poisson distribution when there are no covariates present. Such tests have been recommended in [1] to test for the Poisson distribution when covariates are present. The modified Borel-Tanner (MBT) distribution seems suited to data with extra zeroes, a monotonic decline in counts and longer tails. Here we recommend a dispersion test for the MBT distribution both when covariates are absent and when they are present.

AMS Subject Classification: 62G; 62F


Introduction
Dispersion tests for count data have long been used in statistical analysis. As the Poisson is the best-known count data model, we begin by discussing a dispersion test for this model. Suppose we wish to see if the counts $y_1, y_2, \ldots, y_n$ are Poisson distributed with mean $\mu$: that is, whether or not the random variable $Y$ has probability function $f(y, \mu) = e^{-\mu} \mu^{y} / y!$, $y = 0, 1, \ldots$. A dispersion test of whether or not this applies is based on the classic Index of Dispersion statistic $D = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / \bar{Y}$. The classic Poisson Index of Dispersion statistic can then be linked to the statistic $V_2$, the second component of the smooth test of fit statistic for the Poisson. See [3].
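To make the link concrete, the following is a minimal sketch of the Index of Dispersion and of a standardised version of it. It assumes the textbook relation $V_2 = (D - n)/\sqrt{2n}$, which follows from summing the second Poisson orthonormal polynomial with the mean replaced by the sample mean; the exact expression used in [3] is not reproduced here. The function names and the sample counts are hypothetical and serve only as illustration.

```python
import math

def index_of_dispersion(counts):
    """Index of Dispersion D = sum((y - ybar)^2) / ybar."""
    n = len(counts)
    ybar = sum(counts) / n
    if ybar == 0:
        raise ValueError("all counts are zero, so D is undefined")
    return sum((y - ybar) ** 2 for y in counts) / ybar

def poisson_v2(counts):
    """Standardised dispersion statistic, assuming V2 = (D - n) / sqrt(2n)."""
    n = len(counts)
    return (index_of_dispersion(counts) - n) / math.sqrt(2 * n)

# Hypothetical counts, not taken from the paper.
sample = [0, 1, 1, 2, 0, 3, 1, 0, 2, 4]
d = index_of_dispersion(sample)
v2 = poisson_v2(sample)
print(f"D = {d:.3f}, V2 = {v2:.3f}")
```

Under the Poisson hypothesis D is close to n, so values of V2 well above zero point to overdispersion and values well below zero to underdispersion.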
The discussion of the previous paragraph applies to Poisson tests of fit without covariates. For the case where there are covariates to consider, see [4]. Second order component smooth tests of fit for both the no covariates and the with covariates cases might be usefully generalized to a new one-parameter count distribution: the modified Borel-Tanner (MBT) distribution recently proposed in [5].
Section 2 defines the MBT distribution and gives examples where it is a good model for real count data. Section 3 considers the no covariates case and Section 4 the with covariates case. Section 5 gives some concluding comments.

The modified Borel-Tanner distribution
The modified Borel-Tanner (MBT) distribution introduced in [5] has probability function [...]. The first four cumulants are [...]. We suppose n observations are available.
The first three orthonormal polynomials on the MBT distribution are [...]. To demonstrate that the MBT is an important model for real data we observe that data sets well described by the MBT include
* the accident counts in [6, p.115], namely 296 zeros, 74 ones, 26 twos, 8 threes, 4 fours, 4 fives, and one observation each of 6 and 8;
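As a rough check of why a plain Poisson model struggles with the accident counts, the sketch below summarises the frequency table quoted from [6, p.115]. It assumes that "6 and 8" denote single observations at those values; the variable names are ours. No MBT quantities are computed, since the MBT probability function is not reproduced in this text.

```python
import math

# Frequency table: count value -> number of observations.
# Assumption: the values 6 and 8 each occur once.
accident_freq = {0: 296, 1: 74, 2: 26, 3: 8, 4: 4, 5: 4, 6: 1, 8: 1}

n = sum(accident_freq.values())
total = sum(y * f for y, f in accident_freq.items())
mean = total / n
# Unbiased sample variance computed from the grouped data.
variance = sum(f * (y - mean) ** 2 for y, f in accident_freq.items()) / (n - 1)

zero_fraction = accident_freq[0] / n
poisson_zero = math.exp(-mean)  # P(Y = 0) for a Poisson with the same mean

print(f"n = {n}, mean = {mean:.3f}, variance = {variance:.3f}")
print(f"observed zero fraction = {zero_fraction:.3f}, Poisson = {poisson_zero:.3f}")
```

The sample variance is roughly twice the mean and the observed proportion of zeros exceeds the Poisson value, which matches the description of data with extra zeroes and a longer tail.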

The no covariates case
As with the Poisson we use the MBT second order orthonormal polynomial to give a dispersion test for the MBT. This uses the test statistic [...] where, as before, $T_i = Y_i - \hat{\mu}$ and in the cumulants $\kappa_r$ the maximum likelihood estimators are used. This dispersion test statistic has an approximate $\chi^2_1$ distribution. We find $V_2^2 = 1.08$, which is not significant at any of the usual levels of significance (p-value 0.30), and so we might conclude the MBT describes the data well. It is shown in [3, p.237] that the zero inflated Poisson fits the data well if the observation at y = 7 is removed. The one parameter MBT fits the data well without this data removal. The MBT seems suited to data with extra zeroes, a monotonic decline in counts and longer tails.
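The reported p-value can be checked under the assumption that the squared statistic is referred to a chi-squared distribution with one degree of freedom, which is consistent with the quoted values. The sketch uses only the standard library: for one degree of freedom the upper tail probability equals erfc(sqrt(x/2)). The function name is ours.

```python
import math

def chi2_1_upper_tail(x):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_1_upper_tail(1.08)
print(f"{p_value:.2f}")  # 0.30
```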

The with covariates case
Agresti [10, p.123] considers $Y_i$ to be the count of male satellite crabs attracted to the $i$th female crab during the mating season. In the following the width of the shell of the female, $X_{1i}$, is taken as the covariate for the $i$th female. There were n = 173 female crabs considered and so 173 $(x_{1i}, y_i)$ observations were available. These are listed in the Appendix. Using the GLM command in R and the disptest routine in the countreg library, namely [...]. To calculate an MBT regression with fitted values [...] we follow [5] and find the two regression parameters from the nonlinear equations [...]. These estimates are MLEs. We solved these equations for the crab data using IMSL routine NEQNF. Put [...].
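For orientation, here is a minimal sketch of the baseline that a Poisson GLM with log link fits: the likelihood equations are solved by Newton-Raphson for an intercept and a slope. This is the ordinary Poisson regression, not the MBT regression of [5], whose estimating equations are not reproduced in this text. The function name is hypothetical, and the widths and counts are invented for illustration; they are not the crab data from the Appendix.

```python
import math

def fit_poisson_loglink(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of log(mean_i) = b0 + b1 * x_i under a Poisson model."""
    n = len(x)
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]      # centre the covariate for numerical stability
    a = math.log(sum(y) / n)          # intercept on the centred scale
    b = 0.0                           # slope
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in xc]
        # Score vector.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(xc, y, mu))
        # Fisher information matrix entries.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(xc, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(xc, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a - b * xbar, b            # intercept on the original scale, slope

# Invented illustration data (shell widths and satellite counts).
widths = [22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5]
satellites = [1, 1, 2, 3, 3, 4, 5, 5]

b0, b1 = fit_poisson_loglink(widths, satellites)
fitted = [math.exp(b0 + b1 * w) for w in widths]
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

A general nonlinear equation solver, as used in the paper, would serve the same purpose; Newton-Raphson is chosen here only because the Poisson score equations have a simple closed-form information matrix.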

Comment
Although we have referred to $V_2^2$ as a dispersion test statistic, we observe that this statistic can be large because higher order effects than the dispersion effect can differ from what is expected. However, it could be said that $V_2$ is a dispersion test statistic in that it compares sample and population dispersions or variances.

Figure 1. Mean Count vs Shell Width

The second, third and fourth MBT cumulants are as in Section 2 with the parameter replaced by its value for the $i$th observation. We find [...]. The Poisson regression does not take into account the overdispersion in the data. The MBT regression, like the negative binomial regression, does take into account the overdispersion. In Figure 1 we have plotted fitted values for the regression equation at the shell widths shown (closed circles) and means of satellite counts in the range shell width +/- 0.5 (open circles). Figure 1 suggests the linear model for the covariate is reasonable. Agresti [10, p.126] gives a similar figure to our Figure 1, which compares link functions, whereas our figure compares the log link MBT model fit at mid ranges of shell widths with the mean of satellite counts for the same ranges of shell widths. Our figure attempts to relate the MBT regression model to the data.
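The open circles described for Figure 1 are local means of the response within a window of shell width +/- 0.5. A minimal sketch of that computation follows; the function name, the window centres and the data are hypothetical, not the crab data from the Appendix.

```python
def binned_means(x, y, centres, half_width=0.5):
    """Mean of y over observations whose x lies within centre +/- half_width."""
    means = {}
    for c in centres:
        values = [yi for xi, yi in zip(x, y) if abs(xi - c) <= half_width]
        # None marks a window that contains no observations.
        means[c] = sum(values) / len(values) if values else None
    return means

# Invented illustration data (shell widths and satellite counts).
widths = [22.1, 22.4, 22.9, 24.0, 24.3, 26.2]
counts = [0, 2, 1, 4, 0, 5]

means = binned_means(widths, counts, centres=[22.5, 24.0, 26.0])
for centre, m in means.items():
    print(centre, m)
```

Plotting these window means against the fitted values at the same centres gives the kind of visual comparison between the regression model and the data that the figure is meant to provide.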