1. Introduction and Summary
Suppose that we have an estimate
of an unknown parameter
of a statistical model, and that
the multivariate normal on
, with density and distribution
Here we assume that
is
a standard estimate. That is,
as
, and for
, its
rth order cumulants have magnitude
and can be expanded in powers of
. The class of standard estimates includes smooth functions of sample means or empirical distributions, based on one or more random samples, or on samples from a stationary time series. So it has a huge range of potential applications.
For non-lattice, the density and distribution of of (1) can be expanded in powers of about those of X of (1). These are the Edgeworth expansions. To be self-contained, Section 2 summarises these expansions.
How fast can increase with n, for these expansions to hold? Section 3 shows that they hold if .
Section 4 gives a theorem that reduces the number of terms needed for 2nd and 3rd order Edgeworth expansions. For example, if q = 3, it reduces the number of terms by 57% for the 2nd order expansions and by 94% for the 3rd order expansions. This percentage reduction increases with q. If q = 4 or 5, it reduces the number of terms by 65% or 69% for the 2nd order Edgeworth expansions, and by 98% for the 3rd order Edgeworth expansions.
Section 5 considers the case when for a smooth function of a sample mean from a distribution on with finite moments. When as , it shows that the Edgeworth expansions for remain valid if . For example, this holds for fixed if . These may be the first CLTs or Edgeworth expansions in which two parameters, rather than one, are allowed to increase to ∞ with n.
Earlier work in which the dimension increases with the sample size n has been mainly for sample means, including some CLTs and a 2nd order Edgeworth-type expansion. [10,11] showed asymptotic normality for M-estimators with regression parameters when is large. [3] gave a CLT for M-estimates in a linear regression model of dimension when . Remarkably, (8) of [2] gave a CLT for the sample mean of bounded random vectors that holds for if . (Substitute into their condition to confirm.) Quoting [4], (1.4) of [7] appears to allow for if to suffice for a CLT for the sample mean, when are log-concave. It will be interesting to see if this bound can be extended to a broader class of estimates than sample means, and if the log-concave condition can be removed.
Section 2.1 of the very recent paper [7] considers the 2nd order Edgeworth expansion for the distribution of a standardized sample mean. Its Theorem 2.1 gives conditions for this which hold when for any c, or even when and . The bounds (1.4) and (2.4) of [7] give a 2nd order Edgeworth expansion for a sample mean that allows for of magnitude if , a remarkable result.
[8] considered sampling from vectors, and investigated the simultaneous estimation of the marginal distributions for large . [5] considered the asymptotic distributions of the canonical correlations between and with . They derived asymptotic distributions of the canonical correlations when p is fixed, , and , as , assuming that and have a joint normal distribution. [9] gave a CLT for the sample mean for large . [6] gave a number of results for large dimensions. [1] and [7] considered the validity and accuracy of Edgeworth expansions of for large q when is a standardized sample mean.
2. Multivariate Edgeworth Expansions
Suppose that
is a
standard estimate of
with respect to
n. (
n is typically the sample size.) That is,
as
, where we use
for expected value, and for
and
the
rth order cumulants of
can be expanded as
≈ indicates an asymptotic expansion, and the
cumulant coefficients may depend on
n but are bounded as
. So
the bar replaces each by k. For example,
and
I reserve
for this bar notation to avoid double subscripts. So, (
1) holds with
.
V may depend on
n, but I assume that
is bounded away from 0.
These are Bell polynomials in the cumulant coefficients,
of (
3), defined and given in [14] for
. Their importance lies in their central role in the Edgeworth expansions of
of (
1): see (8) and (14) below.
The
needed for
are given in (19)–(21) of [14]:
where
is the operator
that symmetrizes
over
.
Set
Probability that
A is true. By [15], or [14], for
non-lattice, the density and distribution of
can be expanded as
and
are
the multivariate Hermite polynomial, and
the integrated multivariate Hermite polynomial. By [12],
I use the tensor summation convention: repetition of
in
of (7) and
of (8) implies their implicit summation over their range,
. [14] gave
explicitly for
, and for
when
.
where
sums
over all
N permutations of
giving distinct values. For example,
for
So the repeated
in (10) implies their repeated summation over
. As
x lies in
,
in (9) and (11), stands for
(6) with the
of [14], give the Edgeworth expansions for the density and distribution of
of (
1) to
.
and
each have
terms, but many are duplicates as
is symmetric in
. This is exploited in
Section 4 to greatly reduce the number of terms in (8) and (14) below.
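To make the multivariate Hermite polynomials concrete, here is a minimal Python sketch. It assumes the usual definition, in which H^{i_1...i_k}(x, V) is obtained by applying (−∂/∂x_{i_1})...(−∂/∂x_{i_k}) to the normal density φ_V(x) and dividing by φ_V(x). Under that assumption the polynomial is a signed sum over all ways of pairing off some of the k index positions. The function names `pairings` and `hermite`, and the argument `W` (the inverse of V), are illustrative choices and are not notation from [12] or [14].

```python
from math import prod


def pairings(positions):
    """Yield (pairs, singles) for every way to pair off some of the positions."""
    if not positions:
        yield [], []
        return
    first, rest = positions[0], positions[1:]
    # Case 1: `first` stays unpaired.
    for pairs, singles in pairings(rest):
        yield pairs, [first] + singles
    # Case 2: `first` is paired with one of the remaining positions.
    for j, partner in enumerate(rest):
        remaining = rest[:j] + rest[j + 1:]
        for pairs, singles in pairings(remaining):
            yield [(first, partner)] + pairs, singles


def hermite(indices, x, W):
    """Evaluate H^{indices}(x, V) where W is the inverse of V (0-based indices).

    Each pair (a, b) contributes -W[i_a][i_b]; each unpaired position c
    contributes y[i_c], where y = W x.
    """
    q = len(x)
    y = [sum(W[i][j] * x[j] for j in range(q)) for i in range(q)]
    total = 0.0
    for pairs, singles in pairings(list(range(len(indices)))):
        term = (-1.0) ** len(pairs)
        term *= prod(W[indices[a]][indices[b]] for a, b in pairs)
        term *= prod(y[indices[c]] for c in singles)
        total += term
    return total
```

For q = 1 and V = 1 this reduces to the univariate Hermite polynomials, for example x^3 − 3x for three equal indices. A single evaluation with k indices already visits every partial pairing, which is one reason why reducing the number of index tuples that must be evaluated matters.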
By (7), the density of
relative to its asymptotic value is
for
For measurable
,
This paper focuses on the three Edgeworth expansions, (6) and (12), when
as
.
If
, then for
r odd,
so that
Examples 3 and 4 of [14] gave
for
, and .
The main take-away here is that for
,
of (7), (8) and (12), and
,
where, for example, by (
4),
These asymptotic expansions generally diverge, as normal moments and Hermite polynomials increase very rapidly with their degree.
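As a univariate (q = 1) illustration of what the first correction term buys, the following Python sketch compares the normal approximation and the one-term Edgeworth approximation with the exact distribution of a standardized sample mean. The setting is an assumption chosen for convenience and does not come from the paper: the sample consists of n independent Exp(1) variables, so the sum is Gamma(n, 1), the variance is 1 and the third cumulant is 2. The function names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def exact_cdf(x, n):
    """P(sqrt(n) * (mean - 1) <= x) for the mean of n Exp(1) variables."""
    s = n + sqrt(n) * x          # threshold for the sum, which is Gamma(n, 1)
    if s <= 0.0:
        return 0.0
    term, total = 1.0, 1.0       # running s^k / k! and its partial sum over k
    for k in range(1, n):
        term *= s / k
        total += term
    return 1.0 - exp(-s) * total


def edgeworth_cdf(x, n, kappa3=2.0):
    """Normal CDF plus the first (order n^{-1/2}) Edgeworth correction."""
    correction = normal_pdf(x) * kappa3 * (x * x - 1.0) / (6.0 * sqrt(n))
    return normal_cdf(x) - correction
```

With n = 10, comparing `edgeworth_cdf(0.0, 10)` and `normal_cdf(0.0)` against `exact_cdf(0.0, 10)` shows the skewness correction removing most of the error of the CLT approximation at the centre of the distribution. The correction vanishes at x = ±1, where x^2 − 1 = 0, so the two approximations coincide there.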
3. The Case as
Theorem 1.
Let be a non-lattice estimate of , satisfying , and (3). Set . Take . Suppose that as
Then, for of (7), of (8), and of (13),
PROOF Set
.
and
of (8), and
of (14), each have
terms for
. So
of (7),
of (8), and
of (12), each have
terms, where
So
and
each have magnitude
as
So has magnitude . The theorem follows. □
Example 1.
Let be the sample mean from a distribution on with finite cross cumulants . Then , and only the leading coefficient in (3) is non-zero. As in Example 2 of [14], by (4)–(5), the non-zero Edgeworth coefficients needed for the three 4th order Edgeworth expansions of with , are
Substitution gives for For example, as the coefficients of in the 2nd terms in the Edgeworth expansions for , and , are
Example 2.
Suppose that are independent random vectors in , with mean , and that has finite cross-cumulants
Then for
Suppose also that are bounded in n, and that for , is bounded away from 0 as n increases. Then the Edgeworth coefficients needed for the three 4th order Edgeworth expansions for are given by Example 1.
4. Further Reduction of Terms
Our next theorem gives a way to reduce the number of terms in
of (7),
of (8), and
of (14), from
, to
as
where
As we do not use
for
q in this section, there is no ambiguity with this different use of
. k
and have k summations over , and so have terms. But many are duplicates as is symmetric in . Set , the multinomial coefficient. For example .
Theorem 2.
Let be symmetric in
where the sums are for distinct . This reduces the number of terms in from to where
For example, that is, Theorem 2 reduces the number of terms to 28%.
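The idea behind this kind of reduction can be sketched in Python. The sketch assumes that the saving comes from grouping index tuples that are permutations of one another: a symmetric array takes the same value on each such tuple, so one evaluation per multiset of indices, weighted by a multinomial coefficient, reproduces the full k-fold sum. The theorem itself is stated in terms of sums over distinct indices and may organise the terms differently. The helper names and the toy array `symmetric_example` are hypothetical.

```python
from itertools import combinations_with_replacement, product
from math import comb, factorial, prod


def full_sum(a, q, k):
    """Sum a(i_1, ..., i_k) over all q**k index tuples."""
    return sum(a(idx) for idx in product(range(q), repeat=k))


def reduced_sum(a, q, k):
    """Same sum for symmetric a, using one evaluation per index multiset."""
    total = 0
    for idx in combinations_with_replacement(range(q), k):
        # Multinomial coefficient: number of distinct orderings of this multiset.
        weight = factorial(k)
        for i in set(idx):
            weight //= factorial(idx.count(i))
        total += weight * a(idx)
    return total


def symmetric_example(idx):
    """Toy symmetric array: depends on the indices only through their multiset."""
    return prod(i + 2 for i in idx) + max(idx)


def multiset_count(q, k):
    """Number of evaluations used by reduced_sum."""
    return comb(q + k - 1, k)
```

Under this assumption, q = 5 and k = 3 need 35 evaluations instead of 125, that is 28% of the original count, which agrees with the percentage quoted above.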
As a check, where , and is the Stirling number of the 2nd kind tabulated on p. 310 of Comtet (1974). For example, counting the number of terms to calculate in above gives .
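The formula for this check did not survive in the text. A natural candidate, offered here only as an assumption, is the classical identity q^k = Σ_j S(k, j) q(q−1)...(q−j+1), which counts the q^k index tuples by the number j of distinct values they contain. The following Python sketch verifies that identity; the function names are hypothetical.

```python
def stirling2(k, j):
    """Stirling number of the 2nd kind via S(n, m) = m S(n-1, m) + S(n-1, m-1)."""
    table = [[0] * (j + 1) for _ in range(k + 1)]
    table[0][0] = 1
    for n in range(1, k + 1):
        for m in range(1, min(n, j) + 1):
            table[n][m] = m * table[n - 1][m] + table[n - 1][m - 1]
    return table[k][j]


def falling_factorial(q, j):
    """q (q - 1) ... (q - j + 1), the number of ordered choices of j distinct indices."""
    result = 1
    for t in range(j):
        result *= q - t
    return result


def power_via_stirling(q, k):
    """Recover q**k by grouping index tuples by their number of distinct values."""
    return sum(stirling2(k, j) * falling_factorial(q, j) for j in range(k + 1))
```

Each term S(k, j) q(q−1)...(q−j+1) is the number of k-tuples from {1, ..., q} with exactly j distinct entries, so the sum over j must return q^k.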
Taking
and tensor summation is
not used, gives
of (7), or
of (8), or
of (14). For example,
This shows that we can replace
in the derivation of Theorem 1 by
. So for
, the number of terms,
of (18) and (19), in
of (7),
of (8), and
of (12), can be reduced to
where
So,
where the values of and are approximate.
For example, if q = 3, Theorem 2 reduces the number of terms needed for the 2nd and 3rd order Edgeworth expansions by 57% and 94%. If q = 4 or 5, Theorem 2 reduces the number of terms to calculate by 65% or 69% for the 2nd order Edgeworth expansions, and by 98% for the 3rd order Edgeworth expansions.
When , this reduction of terms was used in Section 4 of [14].
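The percentages quoted above are consistent with a simple count, sketched below in Python under two assumptions that are not stated explicitly in the surviving text. First, a k-fold tensor sum over q indices has q^k terms before the reduction and C(q+k−1, k) terms after it. Second, the 2nd order correction involves sums of orders k = 1 and 3, and the 3rd order correction involves orders k = 2, 4 and 6. The names `term_counts`, `reduction_percent`, `SECOND_ORDER` and `THIRD_ORDER` are hypothetical.

```python
from math import comb

# Assumed tensor orders entering each correction term (see the lead-in).
SECOND_ORDER = (1, 3)
THIRD_ORDER = (2, 4, 6)


def term_counts(q, orders):
    """Return (full, reduced) term counts summed over the given tensor orders."""
    full = sum(q ** k for k in orders)
    reduced = sum(comb(q + k - 1, k) for k in orders)
    return full, reduced


def reduction_percent(q, orders):
    """Percentage of terms saved by using the reduced count."""
    full, reduced = term_counts(q, orders)
    return 100.0 * (1.0 - reduced / full)
```

Under these assumptions, q = 3 gives reductions that round to 57% and 94%, and q = 4 and q = 5 give 65% and 69% for the 2nd order correction. For the 3rd order correction the sketch gives about 98% at q = 5 and about 97% at q = 4, so the quoted 98% is matched only approximately at q = 4.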
Example 3.
In Example 1, using the expression for in Theorem 2,
As , in (21), and in (22) need to be deleted.
5. Functions of a Vector Sample Mean
Let
be the sample mean from a non-lattice distribution on
with mean
, and finite cross cumulants
. Let
be a function with
th component
having finite derivatives at
,
So we expand the bar convention used earlier using
in
as well as
in
as before. Here we have a notation dilemma. We chose
, as this allows us to keep the notation
and
, as used earlier. However, now
not
, so that implicit summation in, say
, as used earlier. So now, implicit summation in
is over
in
, not in
, and this
has
terms, reducible to
using Theorem 2.
If I had chosen , then and would have had to be reinterpreted as and , which would likely be confusing.
Let us use
to mean
has magnitude
from summing
over
.
More generally, for
any partition of
, let
denote the sum of
over all
N permutations
of
, giving distinct values. For example,
I now give the cumulant coefficients,
of (
3), for
, and track their magnitude in
p from summing
over
.
Theorem 3.
For (3) holds, where the cumulant coefficients, the , needed for the 3rd order Edgeworth expansions of order , are given by
PROOF This is a special case of Theorem 2 of [13] with replaced by □
Lemma 1.
For , and
PROOF Use Theorem 3 to check that for each of the Edgeworth coefficients given by (
4)–(5),
The dominant term in
is for
. So, (23) holds. □
Theorem 4.
Set Suppose that is non-lattice, that , and that (3) holds. Suppose that as and that
Then (15)–(17) hold for with of (24).
PROOF For of (23), . Now take of (20). □
For example, (24) holds for fixed if , and for fixed if .
We now apply Theorem 2 to the components of Theorem 3.
Let us write these as
For example,
as
, and for
, the number of calculations needed for
is reduced by the factor
One can now work out similar results for
of (20), and so for the terms of the Edgeworth expansions,
and
. For example,
and
without implicit summation,
6. Conclusions
Let be a standard estimate of an unknown . That is, is a consistent estimate, and for , its rth order cumulants have magnitude , and can be expanded in powers of . This is a very large class of estimates. It includes functions of sample moments and empirical distributions, samples of independent but not identically distributed random vectors, and samples from stationary series. Then by Section 2, for fixed q and non-lattice estimates, Edgeworth-type expansions hold for the density and distribution of , in terms of the Edgeworth coefficients given in [14] for . For , the first s terms of the three Edgeworth expansions give the density and distribution of , and , to as . Theorem 1 shows that this remains true when , if the remainder, , is replaced by , where . So the three Edgeworth expansions hold when , that is, when .
Theorem 2 gives formulas that dramatically reduce the number of terms needed by 2nd and 3rd order Edgeworth-type expansions, that is, for 1st and 2nd order corrections to the CLT.
When , and , Theorem 4 shows that the three 4th order Edgeworth expansions hold if . For example, this holds for fixed if .
7. Discussion
Theorem 1 showed that the three Edgeworth expansions considered here hold for standard estimates when . Is this result optimal? It is certainly not optimal for a sample mean: as noted in Section 1, Theorem 2.1 of the recent paper [7] gives conditions for a 2nd order Edgeworth expansion for the distribution of a sample mean, when for any c, or even when and
It will be interesting to see if such weak conditions on can be extended to a wide class of estimates, such as standard estimates.
Theorem 4 showed that the three 4th order Edgeworth expansions hold for when . It will be interesting to see how much this condition can be weakened. It should not be hard to extend Theorem 4 to where is the empirical distribution of a random sample of size n from a distribution on .
One can also extend this to a function of K independent sample means,
The method employed here should also be able to extend many of the results in the references from a sample mean to a standard estimate.
References
1. Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist., 41 (6), 2786–2819.
2. Chernozhukov, V., Chetverikov, D., and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. Ann. Probab., 45 (4), 2309–2352. arXiv:1412.3661.
3. Donoho, D. and Montanari, A. (2016). High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probability Theory and Related Fields, Springer.
4. Fang, X. and Koike, Y. (2021). High-dimensional central limit theorems by Stein’s method. Ann. Appl. Probab., 31, 1660–1686.
5. Fujikoshi, Y. and Sakurai, T. (2009). High-dimensional asymptotic expansions for the distributions of canonical correlations. Journal of Multivariate Analysis, 100 (1), 231–242.
6. Fujikoshi, Y., Ulyanov, V. V., and Shimizu, R. (2011). Multivariate statistics: High-dimensional and large-sample approximations. John Wiley and Sons.
7. Koike, Y. (2025). High-dimensional bootstrap and asymptotic expansion. arXiv preprint arXiv:2404.05006.
8. Kosorok, M. and Ma, S. (2007). Marginal asymptotics for the large p, small n paradigm: With application to microarray data. Ann. Statist., 35, 1456–1486. MR2351093.
9. Kuelbs, J. and Vidyashankar, A.N. (2010). Asymptotic inference for high-dimensional data. Ann. Statist., 38 (2), 836–869.
10. Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large. I. Consistency. Ann. Statist., 12, 1298–1309. MR0760690.
11. Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large. II. Normal approximation. Ann. Statist., 13, 1403–1417.
12. Withers, C.S. (2000). A simple expression for the multivariate Hermite polynomials. Stat. Prob. Lett., 47, 165–169.
13. Withers, C.S. (2024). 5th order multivariate Edgeworth expansions for parametric estimates. Mathematics, 12 (6), 905, Advances in Applied Probability and Statistical Inference. https://www.mdpi.com/2227-7390/12/6/905/pdf.
14. Withers, C.S. (2025). Edgeworth coefficients for standard multivariate estimates. New Perspectives in Mathematical Statistics, 2nd Edition. Axioms. Page 5 has 2 typos. in (19) should be . Delete /12 is S3, 2 lines before (21).
15. Withers, C.S. and Nadarajah, S. (2010). Tilted Edgeworth expansions for asymptotically normal vectors. Annals of the Institute of Statistical Mathematics, 62 (6), 1113–1142.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).