Robust Estimations from Distribution Structures: II. Central Moments

Tuobang Li

doi:10.20944/preprints202402.0843.v1

Submitted: 13 February 2024

Posted: 15 February 2024


Abstract

In descriptive statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators. In 1984, Serfling considered the distribution formed by evaluating the kernel of the U-statistics and proposed generalized L-statistics, which include the Hodges-Lehmann estimator and the Bickel-Lehmann spread as special cases. However, the structures of the kernel distributions remain unclear. In 1954, Hodges and Lehmann demonstrated that if X and Y are independently sampled from the same unimodal distribution, X − Y will exhibit symmetrical unimodality with its peak centered at zero. Building upon this foundational work, the current study examines the structure of the kernel distribution. It is shown that the kth central moment kernel distributions (k > 2) derived from a unimodal distribution exhibit location invariance and are also nearly unimodal, with the mode and median close to zero. This article provides an approach to studying the general structure of kernel distributions.

Keywords: moments; invariant; unimodal; U-statistics; generalized L-statistics

Subject: Computer Science and Mathematics - Probability and Statistics

Significance Statement

In nonparametric statistics, the focus is on the relative differences of robust estimators, which are considered more crucial than their precise values. This principle implies that if the parameters of the underlying distribution shift, then all corresponding nonparametric estimates are expected to adjust uniformly and asymptotically in a consistent direction, provided they target the same characteristic of the distribution. This article discusses the validity of this fundamental principle of nonparametrics in various scenarios. It is found that, for the $k$th central moment, kernel distributions generally follow this principle.

The most popular robust scale estimator currently, the median absolute deviation, was popularized by Hampel (1974) [1], who credits the idea to Gauss in 1816 [2]. It can be seen as evaluating the median of a pseudo-sample formed by the absolute deviations of all values from the sample median; the pseudo-sample size is $n$. Indeed, most scale estimators can be transformed in this way. For example, the range or the interquartile range can be seen as evaluating the mean of a pseudo-sample with two values, and they belong to the class of scale estimators called quantile differences. In 1976, in their landmark series Descriptive Statistics for Nonparametric Models, Bickel and Lehmann [3] generalized a class of estimators as measures of the dispersion of a symmetric distribution around its center of symmetry. The median absolute deviation, the sample variance, and the average absolute deviation all belong to this class. In 1979, in the same series, they [4] proposed a class of estimators referred to as measures of spread, which consider the pairwise differences of a random variable throughout its distribution, irrespective of its symmetry, rather than focusing on dispersion relative to a fixed point. In the final section of [4], they explored a version of the trimmed standard deviation based on $n^2$ pairwise differences, which is modified here for comparison,

$$[n(\tfrac{1}{2} - ϵ)]^{-\frac{1}{2}}\left[\sum_{i=\frac{n}{2}}^{n(1-ϵ)}(X_i - X_{n-i+1})^2\right]^{\frac{1}{2}}, \qquad (1)$$

and

$$\left[\binom{n}{2}(1 - ϵ_0 - γϵ_0)\right]^{-\frac{1}{2}}\left[\sum_{i=\binom{n}{2}γϵ_0}^{\binom{n}{2}(1-ϵ_0)}(X_{i_1} - X_{i_2})_i^2\right]^{\frac{1}{2}}, \qquad (2)$$

where $(X_{i_1} - X_{i_2})_1 \le \dots \le (X_{i_1} - X_{i_2})_{\binom{n}{2}}$ are the order statistics of the pseudo-sample $X_{i_1} - X_{i_2}$, $i_1 < i_2$, provided that $\binom{n}{2}γϵ_0 \in \mathbb{N}$ and $\binom{n}{2}(1-ϵ_0) \in \mathbb{N}$. They showed that, when $ϵ_0 = 0$, the result obtained using (2) is equal to $\sqrt{2}$ times the sample standard deviation. The paper ended with, “We do not know a fortiori which of the measures is preferable and leave these interesting questions open.”
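As a quick numerical check of the statement about estimator (2), the following Python sketch computes (2) from the sorted pairwise differences of a small sample. The function name `trimmed_pairwise_sd` and the rounding of the trimming indices to integers are our assumptions. The text requires $\binom{n}{2}γϵ_0$ and $\binom{n}{2}(1-ϵ_0)$ to be integers, and the sketch keeps $\binom{n}{2}(1-ϵ_0-γϵ_0)$ ordered differences so that the count matches the normalizing factor.

```python
import itertools
import math
import statistics


def trimmed_pairwise_sd(sample, eps0=0.0, gamma=1.0):
    """Sketch of estimator (2) built from the pairwise-difference pseudo-sample."""
    x = sorted(sample)
    # Pseudo-sample X_{i1} - X_{i2}, i1 < i2 (all values are <= 0 for a sorted sample).
    diffs = sorted(x[a] - x[b] for a, b in itertools.combinations(range(len(x)), 2))
    n_pairs = len(diffs)  # binom(n, 2)
    # Assumption: trimming indices are rounded; the text requires them to be integers.
    lower = round(n_pairs * gamma * eps0)
    upper = round(n_pairs * (1 - eps0))
    kept = diffs[lower:upper]
    scale = n_pairs * (1 - eps0 - gamma * eps0)
    return math.sqrt(sum(d * d for d in kept) / scale)


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8]
ratio = trimmed_pairwise_sd(data) / statistics.stdev(data)
print(ratio)  # close to sqrt(2) when eps0 = 0
```

With $ϵ_0 = 0$ nothing is trimmed. The identity $\sum_{i_1 < i_2}(X_{i_1} - X_{i_2})^2 = n\sum_i (X_i - \bar{X})^2$ then gives exactly $\sqrt{2}$ times the sample standard deviation.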

Two examples of the impact of that series are as follows. Oja (1981, 1983) [5,6] provided a more comprehensive and generalized examination of these concepts. He integrated the measures of location, dispersion, and spread proposed by Bickel and Lehmann [3,4,7] with van Zwet’s convex transformation order of skewness and kurtosis (1964) [8] for univariate and multivariate distributions, resulting in a greater degree of generality and a broader perspective on these statistical constructs. In 1993, Rousseeuw and Croux [9] proposed a popular efficient scale estimator based on separate medians of pairwise differences taken over $i_1$ and $i_2$. However, the importance of tackling the symmetry assumption has been greatly underestimated, as will be discussed later.

Here, their open question [4] is addressed from two aspects. First, the estimation of scale can be transformed into the location estimation of a pseudo-sample. By the principle of the central limit theorem, the standard errors of such scale estimators should therefore be proportional to the standard deviation of the pseudo-sample and inversely proportional to the square root of the pseudo-sample size. Then (2), based on $n^2$ pairwise differences, is clearly better than (1), since the ratio of its pseudo-sample size to that of (1) is approximately $n$. So, considering the size alone, the standard error of (2) is $\frac{1}{\sqrt{n}}$ of that of (1). Another factor that needs to be considered is the standard deviation of the pseudo-sample. However, the standard deviation of the pseudo-sample generally does not depend on the pseudo-sample size. So, no matter how different the standard deviations of the pseudo-samples of (1) and (2) are, the standard error of (1) will always dominate that of (2) as the sample size increases. Second, the nomenclature used in this series is introduced as follows:

Nomenclature. Given a robust estimator $\hat{θ}$ with an adjustable breakdown point $ϵ$ that can approach zero asymptotically, the name of $\hat{θ}$ comprises two parts. The first part denotes the type of estimator, and the second part represents the population parameter $θ$ such that $\hat{θ} \to θ$ as $ϵ \to 0$. The abbreviation of the estimator combines the initial letters of the first part and the second part. If the estimator is symmetric, the upper asymptotic breakdown point $ϵ$ is indicated in the subscript of the abbreviation of the estimator, with the exception of the median. For an asymmetric estimator based on the quantile average, the associated $γ$ follows $ϵ$.

In REDS I [10], it was shown that the bias of a robust estimator with an adjustable breakdown point is often monotonic with respect to the breakdown point in a semiparametric distribution. Naturally, the estimator’s name should reflect the population parameter that it approaches as $ϵ \to 0$. If all pseudo-sample values are multiplied by $\frac{1}{\sqrt{2}}$, then (2) is the trimmed standard deviation adhering to this nomenclature, since $ψ_2(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$ is the kernel function of the unbiased U-statistic estimator of the second central moment [11]. This definition should be preferable for two reasons. First, it is the square root of a trimmed U-statistic, which is closely related to the minimum-variance unbiased estimator (MVUE). Second, the second $γ$-orderliness of the second central moment kernel distribution is ensured by the following theorem.

Theorem 1.

The second central moment kernel distribution generated from any unimodal distribution is second $γ$-ordered, provided that $γ \ge 0$.

Proof.

In 1954, Hodges and Lehmann established that if $X$ and $Y$ are independently drawn from the same unimodal distribution, $X - Y$ follows a symmetric unimodal distribution peaking at zero [12]. The pairwise differences are constrained by $X_{i_1} < X_{i_2}$ for $i_1 < i_2$. Given this constraint, it follows directly from Theorem 1 in [12] that the pairwise difference distribution ($Ξ_Δ$) generated from any unimodal distribution is always monotonically increasing with a mode at zero. Since $X - X'$ is a non-positive variable whose density is monotonically increasing, consider the squaring transformation. For non-positive differences,

$$X - X' < Y - Y' \Rightarrow (X - X')^2 > (Y - Y')^2.$$

In other words, as the values of $X - X'$ become more negative, their squares $(X - X')^2$ become larger, so squaring reverses the order. The density of $(X - X')^2$ at $y > 0$ is $f_{Ξ_Δ}(-\sqrt{y})/(2\sqrt{y})$, a product of two positive non-increasing functions of $y$. It is therefore monotonically decreasing with a mode at zero. Further multiplication by $\frac{1}{2}$ does not change the monotonicity or the mode, since the mode is zero. Therefore, the transformed pdf is monotonically decreasing with a mode at zero. In REDS I [10], it was proven that a right-skewed distribution with a monotonically decreasing pdf is always second $γ$-ordered, which gives the desired result. □
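Theorem 1 can be illustrated by simulation. The Python sketch below draws a sample from an exponential distribution, evaluates $ψ_2(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$ over all pairs, and bins the resulting pseudo-sample. The exponential parent is only an illustrative unimodal choice, and the seed, sample size, and number of bins are arbitrary assumptions. A monotonically decreasing density with its mode at zero should appear as bin counts that are largest in the first bin and fall off to the right, up to sampling noise.

```python
import random
from itertools import combinations


def psi2_values(sample):
    """Pseudo-sample of psi_2(x1, x2) = (x1 - x2)^2 / 2 over all pairs."""
    return [0.5 * (a - b) ** 2 for a, b in combinations(sample, 2)]


def histogram(values, upper, bins):
    """Counts of values in [0, upper) split into equal-width bins."""
    counts = [0] * bins
    width = upper / bins
    for v in values:
        if v < upper:
            counts[min(int(v / width), bins - 1)] += 1
    return counts


rng = random.Random(2024)
sample = [rng.expovariate(1.0) for _ in range(300)]  # unimodal parent (assumption)
kernel_values = sorted(psi2_values(sample))
upper = kernel_values[int(0.9 * len(kernel_values))]  # empirical 90th percentile
counts = histogram(kernel_values, upper, bins=10)
print(counts)  # expected to be largest in the first bin and to fall off to the right
```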

In REDS I [10], it was shown that any symmetric distribution is $ν$th U-ordered. This suggests that $ν$th U-orderliness does not require unimodality; for example, a symmetric bimodal distribution is also $ν$th U-ordered. In the SI Text of REDS I [10], an analysis of the Weibull distribution showed that unimodality does not ensure orderliness. Theorem 1 uncovers a profound relationship between unimodality, monotonicity, and second $γ$-orderliness, which is sufficient for the $γ$-trimming inequality and $γ$-orderliness.

On the other hand, robust estimation of scale has been intensively studied with established methods [3,4]. In contrast, the development of robust measures of asymmetry and kurtosis lags behind, despite the availability of several approaches [13,14,15,16,17]. The purpose of this paper is to demonstrate two things in light of previous works: the estimation of all central moments can be transformed into a location estimation problem by using U-statistics, and the central moment kernel distributions possess desirable properties.

Robust Estimations of the Central Moments

In 1928, Fisher constructed $k$-statistics as unbiased estimators of cumulants [18]. Halmos (1946) proved that a functional $θ$ admits an unbiased estimator if and only if it is a regular statistical functional of degree $k$, and showed a relation among symmetry, unbiasedness, and minimum variance [19]. In 1948, Hoeffding generalized U-statistics [20], which enable the derivation of a minimum-variance unbiased estimator from each unbiased estimator of an estimable parameter. In 1984, Serfling pointed out the special nature of the Hodges-Lehmann estimator, which is neither a simple L-statistic nor a U-statistic, and considered generalized L-statistics and trimmed U-statistics [21]. Given a kernel function $h_k$, which is a symmetric function of $k$ variables, the $LU$-statistic is defined as:

$$LU_{h_k, k, k, ϵ, γ, n} := LL_{k, ϵ_0, γ, n}\left(\operatorname{sort}\left((h_k(X_{N_1}, \dots, X_{N_k}))_{N=1}^{\binom{n}{k}}\right)\right),$$

where $ϵ = 1 - (1 - ϵ_0)^{\frac{1}{k}}$ (proven in REDS III [22]). Here $(X_{N_1}, \dots, X_{N_k})$, $N = 1, \dots, \binom{n}{k}$, are the $\binom{n}{k}$ $k$-element subsets of the sample. $LL_{k, ϵ_0, γ, n}(Y)$ denotes the $LL$-statistic with the sorted sequence $\operatorname{sort}((h_k(X_{N_1}, \dots, X_{N_k}))_{N=1}^{\binom{n}{k}})$ as input. In the context of Serfling’s work, the term ‘trimmed U-statistic’ is used when $LL_{k, ϵ_0, γ, n}$ is $\mathrm{TM}_{ϵ_0, γ, n}$ [21].
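A minimal Python sketch of the trimmed U-statistic case ($LL = \mathrm{TM}$) follows. It assumes that $\mathrm{TM}_{ϵ_0, γ, n}$ removes a fraction $γϵ_0$ of the sorted kernel values from the bottom and $ϵ_0$ from the top. This reading of the nomenclature (upper breakdown point $ϵ_0$, lower $γϵ_0$) is an assumption, the helper names are hypothetical, and the weighted variants used later are not reproduced here.

```python
from itertools import combinations


def trimmed_mean(sorted_values, eps0, gamma):
    """Assumed TM: drop gamma*eps0 of the values from the bottom and eps0 from the top."""
    n = len(sorted_values)
    lower = round(n * gamma * eps0)
    upper = n - round(n * eps0)
    kept = sorted_values[lower:upper]
    return sum(kept) / len(kept)


def trimmed_u_statistic(sample, kernel, k, eps0=0.0, gamma=1.0):
    """Evaluate the kernel on every k-subset, sort the values, then apply the trimmed mean."""
    values = sorted(kernel(*subset) for subset in combinations(sample, k))
    return trimmed_mean(values, eps0, gamma)


def sample_breakdown(eps0, k):
    """Relation quoted in the text: eps = 1 - (1 - eps0)^(1/k)."""
    return 1 - (1 - eps0) ** (1 / k)


def hodges_lehmann_kernel(x1, x2):
    return (x1 + x2) / 2


data = [1.0, 2.0, 2.5, 3.0, 100.0]
print(trimmed_u_statistic(data, hodges_lehmann_kernel, 2))            # untrimmed: the sample mean
print(trimmed_u_statistic(data, hodges_lehmann_kernel, 2, eps0=0.4))  # pairs with the outlier removed
print(sample_breakdown(0.4, 2))
```

The last line shows the conversion from the kernel-level trimming fraction to the sample-level breakdown point. Trimming 40% of the pairwise kernel values corresponds to roughly 0.225 for $k = 2$.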

In 1997, Heffernan [11] obtained an unbiased estimator of the $k$th central moment by using U-statistics. He demonstrated that it is the minimum-variance unbiased estimator for distributions with finite first $k$ moments. The weighted H-L $k$th central moment ($2 \le k \le n$) is thus defined as

$$\mathrm{WHL}k\mathrm{m}_{k, ϵ, γ, n} := LU_{h_k = ψ_k, k, k, ϵ, γ, n},$$

where $\mathrm{WHLM}_{k, ϵ_0, γ, n}$ is used as the $LL_{k, ϵ_0, γ, n}$ in $LU$, and

$$ψ_k(x_1, \dots, x_k) = \sum_{j=0}^{k-2}(-1)^j\left(\frac{1}{k-j}\right)\sum\left(x_{i_1}^{k-j}x_{i_2}\cdots x_{i_{j+1}}\right) + (-1)^{k-1}(k-1)x_1\cdots x_k.$$

The second summation is over $i_1, \dots, i_{j+1} = 1$ to $k$, with $i_1, \dots, i_{j+1}$ pairwise distinct and $i_2 < i_3 < \dots < i_{j+1}$ [11]. Despite the complexity, Theorem 2 below offers an approach to infer the general structure of such kernel distributions.
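The kernel $ψ_k$ can be transcribed directly into code. The Python sketch below follows the summation rule just stated, using exact rational arithmetic; the function names are ours. On a small sample, it checks that averaging $ψ_k$ over all $k$-subsets reproduces the familiar unbiased estimators. For $k = 2$ this is the sample variance, and for $k = 3$ it is $\frac{n}{(n-1)(n-2)}\sum_i (X_i - \bar{X})^3$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def psi(xs):
    """Kernel psi_k of the k-th central moment (k = len(xs) >= 2), term by term."""
    k = len(xs)
    total = Fraction(0)
    for j in range(k - 1):  # j = 0, ..., k - 2
        inner = 0
        for i1 in range(k):
            others = [i for i in range(k) if i != i1]
            # i_2 < ... < i_{j+1}: an unordered j-subset of the remaining indices
            for rest in combinations(others, j):
                inner += xs[i1] ** (k - j) * prod(xs[i] for i in rest)
        total += Fraction((-1) ** j, k - j) * inner
    return total + (-1) ** (k - 1) * (k - 1) * prod(xs)


def u_statistic(sample, k):
    """Average of psi_k over all k-element subsets of the sample."""
    values = [psi(list(subset)) for subset in combinations(sample, k)]
    return sum(values, Fraction(0)) / comb(len(sample), k)


def classical_unbiased(sample):
    """Unbiased sample variance and unbiased third central moment (k-statistic k_3)."""
    n = len(sample)
    mean = sum(sample, Fraction(0)) / n
    s2 = sum((x - mean) ** 2 for x in sample)
    s3 = sum((x - mean) ** 3 for x in sample)
    return s2 / (n - 1), n * s3 / ((n - 1) * (n - 2))


sample = [Fraction(v) for v in (3, -1, 4, 1, 5, 9, 2, 6)]
var_u, third_u = u_statistic(sample, 2), u_statistic(sample, 3)
reference = classical_unbiased(sample)
print(var_u, third_u, reference)
```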

Theorem 2.

Define a set $T$ comprising all pairs $(ψ_k(v), f_{X,\dots,X}(v))$ such that $ψ_k(v) = ψ_k(Q(p_1), \dots, Q(p_k))$ with $Q(p_1) < \dots < Q(p_k)$. Here $f_{X,\dots,X}(v) = k!\,f(Q(p_1))\cdots f(Q(p_k))$ is the probability density of the ordered $k$-tuple $v = (Q(p_1), \dots, Q(p_k))$, a formula obtained from a modification of the Jacobian density theorem. $T_Δ$ is the subset of $T$ consisting of all pairs for which the corresponding $k$-tuples satisfy $Q(p_1) - Q(p_k) = Δ$. The component quasi-distribution, denoted by $ξ_Δ$, has the quasi-pdf

$$f_{ξ_Δ}(\bar{Δ}) = \sum_{\substack{(ψ_k(v), f_{X,\dots,X}(v)) \in T_Δ \\ \bar{Δ} = ψ_k(v)}} f_{X,\dots,X}(v),$$

that is, the sum of all $f_{X,\dots,X}(v)$ such that the pair $(ψ_k(v), f_{X,\dots,X}(v))$ is in $T_Δ$ and its first element, $ψ_k(v)$, equals $\bar{Δ}$. The $k$th central moment kernel distribution ($k > 2$), labeled $Ξ_k$, can be seen as a quasi-mixture distribution comprising an infinite number of component quasi-distributions $ξ_Δ$. Each component corresponds to a different value of $Δ$, which ranges from $Q(0) - Q(1)$ to 0, and has support

$$\left(-\binom{k}{\frac{3+(-1)^k}{2}}^{-1}(-Δ)^k, \frac{1}{k}(-Δ)^k\right).$$

Proof.

The support of $ξ_Δ$ is given by the extrema of the function $ψ_k(Q(p_1), \dots, Q(p_k))$ subject to the constraints $Q(p_1) < \dots < Q(p_k)$ and $Δ = Q(p_1) - Q(p_k)$. Using Lagrange multipliers, the only critical point is found at $Q(p_1) = \dots = Q(p_k) = 0$, where $ψ_k = 0$. The other candidates lie on the boundaries, i.e.,

$$ψ_k(x_1 = Q(p_1), x_2 = Q(p_k), \dots, x_k = Q(p_k)), \ \dots, \ ψ_k(x_1 = Q(p_1), \dots, x_i = Q(p_1), x_{i+1} = Q(p_k), \dots, x_k = Q(p_k)), \ \dots, \ ψ_k(x_1 = Q(p_1), \dots, x_{k-1} = Q(p_1), x_k = Q(p_k)).$$

$ψ_k(x_1 = Q(p_1), \dots, x_i = Q(p_1), x_{i+1} = Q(p_k), \dots, x_k = Q(p_k))$ can be divided into $k$ groups. For $1 \le g \le k-1$, the $g$th group has the common factor $(-1)^{g+1}\frac{1}{k-g+1}$. The final $k$th group is the term $(-1)^{k-1}(k-1)Q(p_1)^iQ(p_k)^{k-i}$. The numbers of terms of the form $(-1)^{g+1}\frac{1}{k-g+1}Q(p_1)^{k-j}Q(p_k)^j$ in the $g$th group are as follows:

- If $\frac{k+1-i}{2} \le j \le \frac{k-1}{2}$ and $j+1 \le g \le k-j$, there are $i\binom{i-1}{g-j-1}\binom{k-i}{j}$ such terms.
- If $\frac{k+1-i}{2} \le j \le \frac{k-1}{2}$ and $k-j+1 \le g \le i+j$, there are $i\binom{i-1}{g-j-1}\binom{k-i}{j} + (k-i)\binom{k-i-1}{j-k+g-1}\binom{i}{k-j}$ such terms.
- If $0 \le j < \frac{k+1-i}{2}$ and $j+1 \le g \le i+j$, there are $i\binom{i-1}{g-j-1}\binom{k-i}{j}$ such terms.
- If $\frac{k}{2} \le j \le k$ and $k-j+1 \le g \le j$, there are $(k-i)\binom{k-i-1}{j-k+g-1}\binom{i}{k-j}$ such terms.
- If $\frac{k}{2} \le j \le k$ and $j+1 \le g \le j+i < k$, there are $i\binom{i-1}{g-j-1}\binom{k-i}{j} + (k-i)\binom{k-i-1}{j-k+g-1}\binom{i}{k-j}$ such terms.

So, if $i + j = k$, $\frac{k}{2} \le j \le k$, and $0 \le i \le \frac{k}{2}$, the summed coefficient of $Q(p_1)^iQ(p_k)^{k-i}$ is

$$\begin{aligned} &(-1)^{k-1}(k-1) + \sum_{g=i+1}^{k-1}(-1)^{g+1}\frac{1}{k-g+1}(k-i)\binom{k-i-1}{g-i-1} + \sum_{g=k-i+1}^{k-1}(-1)^{g+1}\frac{1}{k-g+1}\,i\binom{i-1}{g-k+i-1} \\ &= (-1)^{k-1}(k-1) + (-1)^{k+1} + (k-i)(-1)^k + (-1)^k(i-1) = (-1)^{k+1}. \end{aligned}$$

The summation identities are

$$\begin{aligned} \sum_{g=i+1}^{k-1}(-1)^{g+1}\frac{1}{k-g+1}(k-i)\binom{k-i-1}{g-i-1} &= (k-i)\int_0^1\sum_{g=i+1}^{k-1}(-1)^{g+1}\binom{k-i-1}{g-i-1}t^{k-g}\,dt \\ &= (k-i)\int_0^1\left((-1)^i(t-1)^{k-i-1} - (-1)^{k+1}\right)dt \\ &= (k-i)\left(\frac{(-1)^k}{i-k} + (-1)^k\right) = (-1)^{k+1} + (k-i)(-1)^k \end{aligned}$$

and

$$\begin{aligned} \sum_{g=k-i+1}^{k-1}(-1)^{g+1}\frac{1}{k-g+1}\,i\binom{i-1}{g-k+i-1} &= \int_0^1\sum_{g=k-i+1}^{k-1}(-1)^{g+1}\,i\binom{i-1}{g-k+i-1}t^{k-g}\,dt \\ &= \int_0^1\left(i(-1)^{k-i}(t-1)^{i-1} - i(-1)^{k+1}\right)dt = (-1)^k(i-1). \end{aligned}$$

If $0 \le j < \frac{k+1-i}{2}$ and $i = k$, then $ψ_k = 0$. If $\frac{k+1-i}{2} \le j \le \frac{k-1}{2}$ and $\frac{k+1}{2} \le i \le k-1$, the summed coefficient of $Q(p_1)^iQ(p_k)^{k-i}$ is

$$(-1)^{k-1}(k-1) + \sum_{g=k-i+1}^{k-1}(-1)^{g+1}\frac{1}{k-g+1}\,i\binom{i-1}{g-k+i-1} + \sum_{g=i+1}^{k-1}(-1)^{g+1}\frac{1}{k-g+1}(k-i)\binom{k-i-1}{g-i-1},$$

the same as above. If $i + j < k$, then $\binom{i}{k-j} = 0$, so the related terms can be ignored. Using the binomial theorem and the beta function, the summed coefficient of $Q(p_1)^{k-j}Q(p_k)^j$ is

$$\begin{aligned} &\sum_{g=j+1}^{i+j}(-1)^{g+1}\frac{1}{k-g+1}\,i\binom{i-1}{g-j-1}\binom{k-i}{j} = i\binom{k-i}{j}\int_0^1\sum_{g=j+1}^{i+j}(-1)^{g+1}\binom{i-1}{g-j-1}t^{k-g}\,dt \\ &= \binom{k-i}{j}\,i\int_0^1(-1)^j t^{k-j-1}\left(\frac{t}{t-1}\right)^{1-i}dt = \binom{k-i}{j}\,i\,\frac{(-1)^{j+i+1}Γ(i)Γ(k-j-i+1)}{Γ(k-j+1)} \\ &= \frac{(-1)^{j+i+1}\,i!\,(k-j-i)!\,(k-i)!}{(k-j)!\,j!\,(k-j-i)!} = (-1)^{j+i+1}\frac{i!\,(k-i)!}{k!}\frac{k!}{(k-j)!\,j!} = \binom{k}{i}^{-1}(-1)^{1+i}\binom{k}{j}(-1)^j. \end{aligned}$$

According to the binomial theorem, the coefficient of $Q(p_1)^iQ(p_k)^{k-i}$ in $\binom{k}{i}^{-1}(-1)^{1+i}(Q(p_1) - Q(p_k))^k$ is $\binom{k}{i}^{-1}(-1)^{1+i}\binom{k}{i}(-1)^{k-i} = (-1)^{k+1}$. This is the same as the summed coefficient of $Q(p_1)^iQ(p_k)^{k-i}$ above when $i + j = k$. If $i + j < k$, the coefficient of $Q(p_1)^{k-j}Q(p_k)^j$ is $\binom{k}{i}^{-1}(-1)^{1+i}\binom{k}{j}(-1)^j$, the same as the corresponding summed coefficient of $Q(p_1)^{k-j}Q(p_k)^j$. Therefore,

$$ψ_k(x_1 = Q(p_1), \dots, x_i = Q(p_1), x_{i+1} = Q(p_k), \dots, x_k = Q(p_k)) = \binom{k}{i}^{-1}(-1)^{1+i}(Q(p_1) - Q(p_k))^k.$$

Since $Δ = Q(p_1) - Q(p_k) \le 0$, this value equals $(-1)^{k+1+i}\binom{k}{i}^{-1}(-Δ)^k$ for $1 \le i \le k-1$. The maximum and minimum of $ψ_k$ then follow directly from the properties of the binomial coefficient. The maximum, $\frac{1}{k}(-Δ)^k$, is attained at $i = k-1$. The minimum is attained at $i = 1$ for odd $k$ and at $i = 2$ for even $k$, giving $-\binom{k}{\frac{3+(-1)^k}{2}}^{-1}(-Δ)^k$. □
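Several pieces of this proof can be spot-checked with exact rational arithmetic: the closed form for the boundary values, the resulting support endpoints, and the three summation identities. The Python sketch below repeats the transcription of $ψ_k$ from the kernel definition; the function names are ours. It uses the arbitrary test values $Q(p_1) = -\frac{3}{2}$ and $Q(p_k) = 2$ and checks $k = 3, \dots, 7$. Passing these checks is a sanity test, not a proof. It checks only the boundary candidates, not the claim that interior points cannot exceed them.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def psi(xs):
    """Kernel psi_k of the k-th central moment, transcribed term by term."""
    k = len(xs)
    total = Fraction(0)
    for j in range(k - 1):
        inner = 0
        for i1 in range(k):
            others = [i for i in range(k) if i != i1]
            for rest in combinations(others, j):
                inner += xs[i1] ** (k - j) * prod(xs[i] for i in rest)
        total += Fraction((-1) ** j, k - j) * inner
    return total + (-1) ** (k - 1) * (k - 1) * prod(xs)


def closed_form(k, i, a, b):
    """Claimed value of psi_k at x_1 = ... = x_i = a, x_{i+1} = ... = x_k = b."""
    return Fraction((-1) ** (1 + i), comb(k, i)) * (a - b) ** k


def support(k, delta):
    """Support endpoints of xi_Delta stated in Theorem 2 (delta <= 0)."""
    lower = -Fraction(1, comb(k, (3 + (-1) ** k) // 2)) * (-delta) ** k
    return lower, Fraction(1, k) * (-delta) ** k


def identity_1(k, i):
    lhs = sum(Fraction((-1) ** (g + 1) * (k - i) * comb(k - i - 1, g - i - 1), k - g + 1)
              for g in range(i + 1, k))
    return lhs == (-1) ** (k + 1) + (k - i) * (-1) ** k


def identity_2(k, i):
    lhs = sum(Fraction((-1) ** (g + 1) * i * comb(i - 1, g - k + i - 1), k - g + 1)
              for g in range(k - i + 1, k))
    return lhs == (-1) ** k * (i - 1)


def identity_3(k, i, j):
    """Beta-function identity, valid for i >= 1 and i + j < k."""
    lhs = sum(Fraction((-1) ** (g + 1) * i * comb(i - 1, g - j - 1) * comb(k - i, j), k - g + 1)
              for g in range(j + 1, i + j + 1))
    return lhs == Fraction((-1) ** (1 + i + j) * comb(k, j), comb(k, i))


a, b = Fraction(-3, 2), Fraction(2)  # Q(p_1) < Q(p_k), so Delta = a - b < 0
for k in range(3, 8):
    boundary = [psi([a] * i + [b] * (k - i)) for i in range(1, k)]
    assert boundary == [closed_form(k, i, a, b) for i in range(1, k)]
    assert (min(boundary), max(boundary)) == support(k, a - b)
    for i in range(1, k):
        assert identity_1(k, i) and identity_2(k, i)
        assert all(identity_3(k, i, j) for j in range(k - i))
print("boundary values and summation identities agree for k = 3, ..., 7")
```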

The component quasi-distribution $ξ_Δ$ is closely related to $Ξ_Δ$, the pairwise difference distribution, since

$$\sum_{\bar{Δ} = -\binom{k}{\frac{3+(-1)^k}{2}}^{-1}(-Δ)^k}^{\frac{1}{k}(-Δ)^k} f_{ξ_Δ}(\bar{Δ}) = f_{Ξ_Δ}(Δ).$$

Recall that the proof of Theorem 1 established that $f_{Ξ_Δ}(Δ)$ is monotonically increasing with a mode at zero if the original distribution is unimodal. $f_{Ξ_{-Δ}}(-Δ)$ is thus monotonically decreasing with a mode at zero. In general, if the shape of each $ξ_Δ$ is assumed to be uniform, $Ξ_k$ is monotonic on each side of zero, peaking at zero.

The median of $Ξ_k$ also exhibits a strong tendency to be close to zero, as it can be cast as a weighted mean of the medians of the $ξ_Δ$s. When $-Δ$ is small, all values of $ξ_Δ$ are close to zero, so the median of $ξ_Δ$ is also close to zero. When $-Δ$ is large, the median of $ξ_Δ$ depends on its skewness, but the corresponding weight is much smaller. Thus, even if $ξ_Δ$ is highly skewed, the median of $Ξ_k$ is only slightly shifted from zero. Denote the median of $Ξ_k$ as $\mathrm{m}k\mathrm{m}$. For the five parametric distributions considered here, $|\mathrm{m}k\mathrm{m}| \le 0.1σ$ for both $Ξ_3$ and $Ξ_4$, where $σ$ is the standard deviation of $Ξ_k$ (SI Dataset S1).

Assume $\mathrm{m}k\mathrm{m} = 0$. For the even-order central moment kernel distribution, the average probability density on the left side of zero is greater than that on the right side, since

$$\frac{\frac{1}{2}}{\binom{k}{2}^{-1}(Q(0) - Q(1))^k} > \frac{\frac{1}{2}}{\frac{1}{k}(Q(0) - Q(1))^k}.$$

This means that, on average, the inequality $f(Q(ϵ)) \ge f(Q(1-ϵ))$ holds. For the odd-order distribution, the discussion is more challenging, since the support of each $ξ_Δ$ is then symmetric about zero. Consider $Ξ_3$ only, and let $x_1 = Q(p_i)$ and $x_3 = Q(p_j)$. Changing the value of $x_2$ from $Q(p_i)$ to $Q(p_j)$ changes $ψ_3(x_1, x_2, x_3)$ monotonically, since

$$\frac{\partial ψ_3(x_1, x_2, x_3)}{\partial x_2} = -\frac{x_1^2}{2} - x_1x_2 + 2x_1x_3 + x_2^2 - x_2x_3 - \frac{x_3^2}{2}$$

and, for $x_1 \le x_2 \le x_3$,

$$-\frac{3}{4}(x_1 - x_3)^2 \le \frac{\partial ψ_3(x_1, x_2, x_3)}{\partial x_2} \le -\frac{1}{2}(x_1 - x_3)^2 \le 0.$$
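The derivative and its bounds can be checked mechanically. In the Python sketch below, `psi3` is $ψ_3$ written out from the general kernel; the names are ours. The stated partial derivative is compared with a Richardson-extrapolated central difference, which is exact for a cubic in $x_2$. Both bounds are then verified with exact fractions on a grid of ordered triples $x_1 \le x_2 \le x_3$. The grid is an arbitrary choice, so passing it illustrates rather than proves the inequality. Completing the square in $x_2$ shows where the extremes occur: the minimum, $-\frac{3}{4}(x_1 - x_3)^2$, is at $x_2 = \frac{x_1 + x_3}{2}$, and the maximum, $-\frac{1}{2}(x_1 - x_3)^2$, is at the endpoints.

```python
from fractions import Fraction


def psi3(x1, x2, x3):
    """Third central moment kernel psi_3, expanded from the general formula."""
    xs = (x1, x2, x3)
    cubes = sum(x ** 3 for x in xs)
    mixed = sum(xs[a] ** 2 * xs[b] for a in range(3) for b in range(3) if a != b)
    return Fraction(1, 3) * cubes - Fraction(1, 2) * mixed + 2 * x1 * x2 * x3


def dpsi3_dx2(x1, x2, x3):
    """Partial derivative with respect to x2, as given in the text."""
    return -x1 ** 2 / 2 - x1 * x2 + 2 * x1 * x3 + x2 ** 2 - x2 * x3 - x3 ** 2 / 2


def richardson_derivative(f, x, h):
    """Central difference plus one Richardson step; exact for polynomials of degree <= 4."""
    def central(step):
        return (f(x + step) - f(x - step)) / (2 * step)
    return (4 * central(h / 2) - central(h)) / 3


points = [Fraction(v, 4) for v in range(-8, 9)]
for x1 in points:
    for x3 in points:
        if x1 >= x3:
            continue
        for t in range(11):
            x2 = x1 + Fraction(t, 10) * (x3 - x1)
            d = dpsi3_dx2(x1, x2, x3)
            assert d == richardson_derivative(lambda z: psi3(x1, z, x3), x2, Fraction(1))
            assert -Fraction(3, 4) * (x1 - x3) ** 2 <= d <= -Fraction(1, 2) * (x1 - x3) ** 2
print("derivative formula and bounds hold on the grid")
```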

If the original distribution is right-skewed, $ξ_Δ$ will be left-skewed. For $Ξ_3$, the average probability density on the right side of zero will then be greater than that on the left side, which means that, on average, the inequality $f(Q(ϵ)) \le f(Q(1-ϵ))$ holds. Overall, the monotonically decreasing density of the negated pairwise difference distribution guides the general shape of the $k$th central moment kernel distribution, $k > 2$. It forces that distribution to be unimodal-like, with the mode and median close to zero, so the inequality $f(Q(ϵ)) \le f(Q(1-ϵ))$ or $f(Q(ϵ)) \ge f(Q(1-ϵ))$ holds in general. If a distribution is $ν$th $γ$-ordered and all of its central moment kernel distributions are also $ν$th $γ$-ordered, it is called completely $ν$th $γ$-ordered.
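The claim that $\mathrm{m}k\mathrm{m}$ lies close to zero can be explored by simulation. The Python sketch below approximates $Ξ_3$ and $Ξ_4$ by evaluating $ψ_k$ on random $k$-subsets of an exponential sample, then reports the ratio of the median to the standard deviation of those kernel values. The exponential parent, sample size, seed, and number of draws are assumptions. The output is a Monte Carlo estimate, not the SI Dataset S1 values, and because $Ξ_4$ is heavy-tailed its standard deviation estimate can be noisy.

```python
import random
import statistics
from fractions import Fraction
from itertools import combinations
from math import prod


def psi(xs):
    """Kernel psi_k of the k-th central moment (works for float inputs too)."""
    k = len(xs)
    total = Fraction(0)
    for j in range(k - 1):
        inner = 0
        for i1 in range(k):
            others = [i for i in range(k) if i != i1]
            for rest in combinations(others, j):
                inner += xs[i1] ** (k - j) * prod(xs[i] for i in rest)
        total += Fraction((-1) ** j, k - j) * inner
    return total + (-1) ** (k - 1) * (k - 1) * prod(xs)


def kernel_median_over_sd(sample, k, draws, rng):
    """Monte Carlo approximation: psi_k on random k-subsets, then median / SD."""
    values = [psi(rng.sample(sample, k)) for _ in range(draws)]
    return statistics.median(values) / statistics.pstdev(values)


rng = random.Random(11)
sample = [rng.expovariate(1.0) for _ in range(200)]  # right-skewed unimodal parent (assumption)
ratios = {k: kernel_median_over_sd(sample, k, draws=20000, rng=rng) for k in (3, 4)}
print(ratios)
```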

Another crucial property of the central moment kernel distribution, location invariance, is introduced in the next theorem.

Theorem 3.

$ψ_k(λx_1 + μ, \dots, λx_k + μ) = λ^kψ_k(x_1, \dots, x_k)$.

Proof.

Recall that for the $k$th central moment, the kernel is

$$ψ_k(x_1, \dots, x_k) = \sum_{j=0}^{k-2}(-1)^j\left(\frac{1}{k-j}\right)\sum\left(x_{i_1}^{k-j}x_{i_2}\cdots x_{i_{j+1}}\right) + (-1)^{k-1}(k-1)x_1\cdots x_k,$$

where the second summation is over $i_1, \dots, i_{j+1} = 1$ to $k$, with $i_1, \dots, i_{j+1}$ pairwise distinct and $i_2 < i_3 < \dots < i_{j+1}$ [11].

$ψ_k$ consists of two parts. The first part, $\sum_{j=0}^{k-2}(-1)^j\left(\frac{1}{k-j}\right)\sum\left(x_{i_1}^{k-j}x_{i_2}\cdots x_{i_{j+1}}\right)$, involves a double summation over certain terms. The second part, $(-1)^{k-1}(k-1)x_1\cdots x_k$, carries the alternating sign $(-1)^{k-1}$ and multiplies the constant $k-1$ by the product of all the variables, $x_1x_2\cdots x_k$. In the first part, consider each cluster $(-1)^j\left(\frac{1}{k-j}\right)\sum\left(x_{i_1}^{k-j}x_{i_2}\cdots x_{i_{j+1}}\right)$ for $j$ ranging from 0 to $k-2$. Let each cluster form a single group, so the first part is divided into $k-1$ groups. Together with the second part, the terms of $ψ_k$ can be divided into a total of $k$ groups. For $1 \le g \le k-1$, the $g$th group has $\binom{k}{g}\binom{g}{1}$ terms of the form $(-1)^{g+1}\frac{1}{k-g+1}x_{i_1}^{k-g+1}x_{i_2}\cdots x_{i_g}$. The final $k$th group is the term $(-1)^{k-1}(k-1)x_1\cdots x_k$.

There are two ways to divide $ψ_k$ into $k$ groups according to the form of each term. In the first, for $g \ne k$, the $g$th group of $ψ_k$ has $\binom{k-l}{g-l}$ terms of the form $(-1)^{g+1}\frac{1}{k-g+1}x_{i_1}^{k-g+1}x_{i_2}\cdots x_{i_l}x_{i_{l+1}}\cdots x_{i_g}$. Here $x_{i_1}, x_{i_2}, \dots, x_{i_l}$ are fixed, and $x_{i_{l+1}}, \dots, x_{i_g}$ are selected such that $i_{l+1}, \dots, i_g \notin \{i_1, i_2, \dots, i_l\}$ and $i_{l+1}, \dots, i_g$ are pairwise distinct. Define another function

$$Ψ_k(x_{i_1}, \dots, x_{i_l}, x_{i_{l+1}}, \dots, x_{i_g}) = (λx_{i_1} + μ)^{k-g+1}(λx_{i_2} + μ)\cdots(λx_{i_l} + μ)(λx_{i_{l+1}} + μ)\cdots(λx_{i_g} + μ).$$

The first group of $Ψ_k$ is $λ^k x_{i_1}^{k-g+1}x_{i_2}\cdots x_{i_l}x_{i_{l+1}}\cdots x_{i_g}$. The $h$th group of $Ψ_k$, $h > 1$, has $\binom{k-g+1}{k-h-l+2}$ terms of the form $λ^{k-h+1}μ^{h-1}x_{i_1}^{k-h-l+2}x_{i_2}\cdots x_{i_l}$. Transform $ψ_k$ by $Ψ_k$ and combine all terms with $λ^{k-h+1}μ^{h-1}x_{i_1}^{k-h-l+2}x_{i_2}\cdots x_{i_l}$, where $k-h-l+2 > 1$. The summed coefficient is

$$S1_l = \sum_{g=l}^{h+l-1}(-1)^{g+1}\frac{1}{k-g+1}\binom{k-g+1}{k-h-l+2}\binom{k-l}{g-l} = \sum_{g=l}^{h+l-1}(-1)^{g+1}\frac{(k-l)!}{(h+l-g-1)!\,(k-h-l+2)!\,(g-l)!} = 0.$$

Indeed, substituting $m = g - l$, the sum equals

$$\frac{(-1)^{l+1}(k-l)!}{(h-1)!\,(k-h-l+2)!}\sum_{m=0}^{h-1}(-1)^m\binom{h-1}{m} = \frac{(-1)^{l+1}(k-l)!}{(h-1)!\,(k-h-l+2)!}(1-1)^{h-1} = 0,$$

since $h > 1$.

Another possible choice is that the $g$th group of $ψ_k$ has $(k-h)\binom{h-1}{g-k+h-1}$ terms of the form $(-1)^{g+1}\frac{1}{k-g+1}x_{i_1}x_{i_2}\cdots x_{i_j}^{k-g+1}\cdots x_{i_{k-h+1}}x_{i_{k-h+2}}\cdots x_{i_g}$, provided that $k \ne g$ and $2 \le j \le k-h+1$. Here $x_{i_1}, \dots, x_{i_{k-h+1}}$ are fixed. The factors $x_{i_j}^{k-g+1}$ and $x_{i_{k-h+2}}, \dots, x_{i_g}$ are selected such that $i_{k-h+2}, \dots, i_g \notin \{i_1, i_2, \dots, i_{k-h+1}\}$ and $i_{k-h+2}, \dots, i_g$ are pairwise distinct. Transform these terms by

$$Ψ_k(x_{i_1}, \dots, x_{i_j}, \dots, x_{i_{k-h+1}}, x_{i_{k-h+2}}, \dots, x_{i_g}) = (λx_{i_1}+μ)(λx_{i_2}+μ)\cdots(λx_{i_j}+μ)^{k-g+1}\cdots(λx_{i_{k-h+1}}+μ)(λx_{i_{k-h+2}}+μ)\cdots(λx_{i_g}+μ).$$

This yields $k-g+1$ terms of the form $λ^{k-h+1}μ^{h-1}x_{i_1}x_{i_2}\cdots x_{i_{k-h+1}}$. Transforming the final $k$th group of $ψ_k$ by $Ψ_k(x_1, \dots, x_k) = (λx_1+μ)\cdots(λx_k+μ)$ yields one term of the form $(-1)^{k-1}(k-1)λ^{k-h+1}μ^{h-1}x_1x_2\cdots x_{k-h+1}$. Another possible combination is that the $g$th group of $ψ_k$ contains $(g-k+h-1)\binom{h-1}{g-k+h-1}$ terms of the form $(-1)^{g+1}\frac{1}{k-g+1}x_{i_1}x_{i_2}\cdots x_{i_{k-h+1}}x_{i_{k-h+2}}\cdots x_{i_j}^{k-g+1}\cdots x_{i_g}$. Transform these terms by

$$Ψ_k(x_{i_1}, \dots, x_{i_{k-h+1}}, x_{i_{k-h+2}}, \dots, x_{i_j}, \dots, x_{i_g}) = (λx_{i_1}+μ)\cdots(λx_{i_{k-h+1}}+μ)(λx_{i_{k-h+2}}+μ)\cdots(λx_{i_j}+μ)^{k-g+1}\cdots(λx_{i_g}+μ).$$

This yields only one term of the form $λ^{k-h+1}μ^{h-1}x_{i_1}x_{i_2}\cdots x_{i_{k-h+1}}$. The summation $S1_l$ above should also be included with $x_{i_1}^{k-h-l+2} = x_{i_1}$, i.e., $k = h+l-1$. Combining all terms with $λ^{k-h+1}μ^{h-1}x_{i_1}x_{i_2}\cdots x_{i_{k-h+1}}$, the binomial theorem gives the summed coefficient

$$\begin{aligned} S2_l &= \sum_{g=k-h+1}^{k-1}(-1)^{g+1}\binom{h-1}{g-k+h-1}\left(k-h+1+\frac{g-k+h-1}{k-g+1}\right) + (-1)^{k-1}(k-1) \\ &= (k-h+1)\sum_{g=k-h+1}^{k-1}(-1)^{g+1}\binom{h-1}{g-k+h-1} + \sum_{g=k-h+1}^{k-1}(-1)^{g+1}\binom{h-1}{g-k+h-1}\frac{g-k+h-1}{k-g+1} + (-1)^{k-1}(k-1) \\ &= (-1)^k(k-h+1) + (h-2)(-1)^k + (-1)^{k-1}(k-1) = 0. \end{aligned}$$

The summation identities required are

$$\sum_{g=k-h+1}^{k-1}(-1)^{g+1}\binom{h-1}{g-k+h-1} = (-1)^k$$

and

$$\sum_{g=k-h+1}^{k-1}(-1)^{g+1}\binom{h-1}{g-k+h-1}\frac{g-k+h-1}{k-g+1} = (h-2)(-1)^k.$$

These two summation identities are proven in Lemmas 4 and 5 in the SI Text.
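The two identities attributed to Lemmas 4 and 5, together with $S1_l = 0$, can be spot-checked with exact integer and rational arithmetic. The Python sketch below evaluates each left-hand side over a finite range of $k$, $h$, and $l$; the function names are ours. This is a sanity check, not a substitute for the proofs in the SI Text.

```python
from fractions import Fraction
from math import comb


def lemma_sum_1(k, h):
    """Left-hand side of the first identity; should equal (-1)^k."""
    return sum((-1) ** (g + 1) * comb(h - 1, g - k + h - 1) for g in range(k - h + 1, k))


def lemma_sum_2(k, h):
    """Left-hand side of the second identity; should equal (h - 2)(-1)^k."""
    return sum(Fraction((-1) ** (g + 1) * comb(h - 1, g - k + h - 1) * (g - k + h - 1), k - g + 1)
               for g in range(k - h + 1, k))


def s1(k, h, l):
    """S1_l for h > 1 and k - h - l + 2 > 1; should vanish."""
    r = k - h - l + 2
    return sum(Fraction((-1) ** (g + 1) * comb(k - g + 1, r) * comb(k - l, g - l), k - g + 1)
               for g in range(l, h + l))


for k in range(2, 11):
    for h in range(2, k + 1):
        assert lemma_sum_1(k, h) == (-1) ** k
        assert lemma_sum_2(k, h) == (h - 2) * (-1) ** k
        for l in range(1, k - h + 1):  # keeps k - h - l + 2 > 1
            assert s1(k, h, l) == 0
print("identities hold for k = 2, ..., 10")
```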

Thus, whichever grouping is used, all terms involving $μ$ cancel out. The proof is complete by noting that the remaining part is $λ^kψ_k(x_1, \dots, x_k)$. □
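Theorem 3 can also be checked exactly on random rational inputs. The Python sketch below repeats a direct transcription of $ψ_k$ from the kernel definition; the names are ours. It compares $ψ_k(λx_1 + μ, \dots, λx_k + μ)$ with $λ^kψ_k(x_1, \dots, x_k)$ for $k = 2, \dots, 6$, using an arbitrary seed and arbitrary value ranges.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import prod


def psi(xs):
    """Kernel psi_k of the k-th central moment, transcribed term by term."""
    k = len(xs)
    total = Fraction(0)
    for j in range(k - 1):
        inner = 0
        for i1 in range(k):
            others = [i for i in range(k) if i != i1]
            for rest in combinations(others, j):
                inner += xs[i1] ** (k - j) * prod(xs[i] for i in rest)
        total += Fraction((-1) ** j, k - j) * inner
    return total + (-1) ** (k - 1) * (k - 1) * prod(xs)


rng = random.Random(7)
checked = 0
for k in range(2, 7):
    for _ in range(20):
        xs = [Fraction(rng.randint(-9, 9), rng.randint(1, 5)) for _ in range(k)]
        lam = Fraction(rng.randint(-7, 7), 3)
        mu = Fraction(rng.randint(-20, 20), 7)
        shifted = [lam * x + mu for x in xs]
        assert psi(shifted) == lam ** k * psi(xs)  # location invariance, scale equivariance
        checked += 1
print(f"psi_k(lam*x + mu) == lam**k * psi_k(x) in {checked} random cases")
```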

A direct result of Theorem 3 is that $\mathrm{WHL}k\mathrm{m}$, after standardization, is invariant to location and to positive rescaling. A negative scale factor only flips the sign when $k$ is odd. So, the weighted H-L standardized $k$th moment is defined to be

$$\mathrm{WHLs}k\mathrm{m}_{ϵ = \min(ϵ_1, ϵ_2), k_1, k_2, γ_1, γ_2, n} := \frac{\mathrm{WHL}k\mathrm{m}_{k_1, ϵ_1, γ_1, n}}{(\mathrm{WHLvar}_{k_2, ϵ_2, γ_2, n})^{k/2}}.$$
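By Theorem 3, this ratio is unaffected by shifting the data and by positive rescaling. The Python sketch below checks this using the untrimmed case as a stand-in: plain U-statistics with kernels $ψ_k$ and $ψ_2$ replace $\mathrm{WHL}k\mathrm{m}$ and $\mathrm{WHLvar}$. The weighting and trimming of the actual estimators are not reproduced, and the names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def psi(xs):
    """Kernel psi_k of the k-th central moment (float inputs give float outputs)."""
    k = len(xs)
    total = Fraction(0)
    for j in range(k - 1):
        inner = 0
        for i1 in range(k):
            others = [i for i in range(k) if i != i1]
            for rest in combinations(others, j):
                inner += xs[i1] ** (k - j) * prod(xs[i] for i in rest)
        total += Fraction((-1) ** j, k - j) * inner
    return total + (-1) ** (k - 1) * (k - 1) * prod(xs)


def u_stat(sample, k):
    return sum(psi(list(s)) for s in combinations(sample, k)) / comb(len(sample), k)


def standardized_moment(sample, k):
    """Untrimmed stand-in for the standardized k-th moment: U_k / U_2^(k/2)."""
    return u_stat(sample, k) / u_stat(sample, 2) ** (k / 2)


data = [1.0, 2.5, 2.0, 4.5, 3.0, 7.0]
moved = [3.0 * x - 10.0 for x in data]  # location-scale change with a positive scale factor
for k in (3, 4):
    print(k, math.isclose(standardized_moment(data, k), standardized_moment(moved, k)))
```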

To avoid confusion, note that the robust location estimations of the kernel distributions discussed in this paper differ from the approach taken by Joly and Lugosi (2016) [23], which computes the median of the U-statistics obtained from different disjoint blocks. Compared to bootstrap median U-statistics, this approach can produce two additional kinds of finite-sample bias. One arises from the limited number of blocks. The other is due to the size of the U-statistics: the mean of the U-statistics from different disjoint blocks is, in general, not identical to the original U-statistic, except when the kernel is the Hodges-Lehmann kernel. The median of randomized U-statistics of Laforgue, Clémençon, and Bertail (2019) [24] is more sophisticated and can overcome the limitation on the number of blocks, but the second kind of bias remains unsolved.
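The second kind of bias can be seen in a small exact example. The Python sketch below splits a sample into equal disjoint blocks, computes a pairwise U-statistic within each block, and averages the results; the names are ours. With the Hodges-Lehmann kernel $\frac{x_1 + x_2}{2}$, the block U-statistics are block means, so their average equals the full-sample U-statistic (the sample mean). With $ψ_2$, the within-block averages ignore between-block pairs and differ from the full-sample unbiased variance.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(sample, kernel):
    pairs = list(combinations(sample, 2))
    return sum((kernel(a, b) for a, b in pairs), Fraction(0)) / len(pairs)


def mean_of_block_u_statistics(sample, kernel, blocks):
    size = len(sample) // blocks  # assumes equal-sized blocks
    chunks = [sample[b * size:(b + 1) * size] for b in range(blocks)]
    return sum((u_statistic(c, kernel) for c in chunks), Fraction(0)) / blocks


def hodges_lehmann(a, b):
    return (a + b) / 2


def psi2(a, b):
    return (a - b) ** 2 / 2


sample = [Fraction(v) for v in (4, 8, 15, 16, 23, 42)]
for name, kernel in (("Hodges-Lehmann", hodges_lehmann), ("psi_2", psi2)):
    print(name, u_statistic(sample, kernel), mean_of_block_u_statistics(sample, kernel, blocks=2))
```

Here the Hodges-Lehmann line prints 18 twice. The $ψ_2$ line prints 182 for the full sample and 106 for the average over the two blocks.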

Congruent Distribution

In the realm of nonparametric statistics, the relative differences, or orders, of robust estimators are of primary importance. A key implication of this principle is that when the parameters of the underlying distribution shift, all nonparametric estimates of the same attribute of the distribution should asymptotically change in the same direction. If, on the other hand, the mean suggested an increase in the location of the distribution while the median indicated a decrease, a contradiction would arise. Such a contradiction is impossible for any $L$-statistic in a location-scale distribution, as explained in Theorems 2 and 18 in REDS I. However, counterexamples to this implication can be constructed in a shape-scale distribution. In the case of the Weibull distribution, the quantile function is

$$Q_{Wei}(p) = \lambda \left(-\ln(1 - p)\right)^{1/\alpha},$$

where $0 \leq p \leq 1$, $\alpha > 0$, $\lambda > 0$, $\lambda$ is a scale parameter, $\alpha$ is a shape parameter, and $\ln$ is the natural logarithm. Then $m = \lambda \sqrt[\alpha]{\ln(2)}$ and $\mu = \lambda \Gamma\left(1 + \frac{1}{\alpha}\right)$, where $\Gamma$ is the gamma function. When $\alpha = 1$, $m = \lambda \ln(2) \approx 0.693\lambda$ and $\mu = \lambda$; when $\alpha = \frac{1}{2}$, $m = \lambda \ln^{2}(2) \approx 0.480\lambda$ and $\mu = 2\lambda$. Thus, as $\alpha$ changes from 1 to $\frac{1}{2}$, the mean increases but the median decreases.

In the last section, the fundamental role of the quantile average was demonstrated by classifying distributions through the signs of derivatives. The same method can be used to rule out such scenarios. Let the quantile average function of a parametric distribution be denoted as $QA(\epsilon, \gamma, \alpha_{1}, \dots, \alpha_{i}, \dots, \alpha_{k})$, where the $\alpha_{i}$ are the parameters of the distribution. A distribution is $\gamma$-congruent if and only if, for each parameter $\alpha_{i}$, the sign of $\frac{\partial QA}{\partial \alpha_{i}}$ remains the same for all $0 \leq \epsilon \leq \frac{1}{1 + \gamma}$. If $\frac{\partial QA}{\partial \alpha_{i}}$ is zero or undefined, it can be considered both positive and negative, and thus does not affect the analysis. A distribution is completely $\gamma$-congruent if and only if it is $\gamma$-congruent and all of its central moment kernel distributions are also $\gamma$-congruent. Setting $\gamma = 1$ gives the definitions of congruence and complete congruence. Replacing $QA$ with the $\gamma m$HLM gives the definition of $\gamma$-U-congruence. Chebyshev's inequality implies that, for any probability distribution with a finite second moment, even if some $L$-statistics change in a direction different from that of the population mean as the parameters change, their distance from the population mean remains bounded in units of the standard deviation, so the magnitude of such discordant changes is limited. Furthermore, distributions with infinite moments can be $\gamma$-congruent, since the definition is based on the quantile average, not on the population mean.
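The Weibull counterexample can be checked numerically. The Python sketch below assumes, as in REDS I, the quantile average $QA(\epsilon, \gamma) = \frac{1}{2}\left[Q(\gamma\epsilon) + Q(1 - \epsilon)\right]$ (the form consistent with the lognormal derivative given later in this section), and uses $\lambda = 1$ and $\gamma = 1$ for illustration; the function names are hypothetical. It evaluates the median, the mean and the quantile average at $\epsilon = 0.01$ for $\alpha = 1$ and $\alpha = \frac{1}{2}$. Because the median ($\epsilon = \frac{1}{2}$) and $QA$ at $\epsilon = 0.01$ move in opposite directions, the sign of $\partial QA / \partial \alpha$ cannot be the same for all $\epsilon$.

```python
import functools
import math

LAM = 1.0  # illustrative scale parameter lambda


def weibull_quantile(p, alpha, lam=LAM):
    """Q_Wei(p) = lam * (-ln(1 - p))**(1 / alpha), valid for 0 <= p < 1."""
    return lam * (-math.log(1.0 - p)) ** (1.0 / alpha)


def quantile_average(q, eps, gamma=1.0):
    """Assumed REDS I form: QA(eps, gamma) = [Q(gamma * eps) + Q(1 - eps)] / 2."""
    return 0.5 * (q(gamma * eps) + q(1.0 - eps))


for alpha in (1.0, 0.5):
    q = functools.partial(weibull_quantile, alpha=alpha, lam=LAM)
    median = q(0.5)                               # lam * ln(2)**(1/alpha)
    mean = LAM * math.gamma(1.0 + 1.0 / alpha)    # lam * Gamma(1 + 1/alpha)
    qa_tail = quantile_average(q, eps=0.01)       # gamma = 1, eps = 0.01
    print(f"alpha={alpha}: median={median:.3f}, mean={mean:.3f}, QA(0.01)={qa_tail:.3f}")
```

As $\alpha$ goes from 1 to $\frac{1}{2}$, the printed median falls from 0.693 to 0.480, while the mean rises from 1.000 to 2.000 and $QA(0.01)$ rises from 2.308 to 10.604.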

The following theorems give conditions under which a distribution is congruent or $\gamma$-congruent.

Theorem 4.

A symmetric distribution is always congruent and U-congruent.

Proof.

As shown in Theorems 2 and 18 in REDS I, for any symmetric distribution, all quantile averages and all $\gamma m$HLMs coincide with the center of symmetry. The conclusion follows immediately. □

Theorem 5.

A positive definite location-scale distribution is always γ-congruent.

Proof.

As shown in Theorem 2, for a location-scale distribution, any quantile average can be expressed as $\lambda QA_{0}(\epsilon, \gamma) + \mu$. Therefore, the derivatives with respect to the parameters $\lambda$ and $\mu$ are always positive. Applying the definition yields the desired result. □

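For the Pareto distribution with scale $x_{m} > 0$ and shape $\alpha > 0$, the quantile function is $Q(p) = x_{m}(1 - p)^{-1/\alpha}$, so

$$\frac{\partial Q}{\partial \alpha} = \frac{x_{m}(1 - p)^{-1/\alpha}\ln(1 - p)}{\alpha^{2}}.$$

Since $\ln(1 - p) < 0$ and $(1 - p)^{-1/\alpha} > 0$ for all $0 < p < 1$ and $\alpha > 0$, $\frac{\partial Q}{\partial \alpha} < 0$, and therefore $\frac{\partial QA}{\partial \alpha} < 0$; the Pareto distribution is $\gamma$-congruent. It is also $\gamma$-U-congruent, since the $\gamma m$HLM can also be expressed as a function of $Q(p)$.

For the lognormal distribution,

$$\frac{\partial QA}{\partial \sigma} = \frac{1}{2}\left(-\sqrt{2}\,\mathrm{erfc}^{-1}(2\gamma\epsilon)\, e^{\mu - \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\gamma\epsilon)} - \sqrt{2}\,\mathrm{erfc}^{-1}(2(1 - \epsilon))\, e^{\mu - \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2(1 - \epsilon))}\right).$$

The inverse complementary error function is positive when its argument is smaller than 1, negative when its argument is larger than 1, and antisymmetric about 1, i.e., $\mathrm{erfc}^{-1}(2 - x) = -\mathrm{erfc}^{-1}(x)$. Let $a = \sqrt{2}\,\mathrm{erfc}^{-1}(2\gamma\epsilon)$ and $b = -\sqrt{2}\,\mathrm{erfc}^{-1}(2 - 2\epsilon)$, so that $\frac{\partial QA}{\partial \sigma} = \frac{1}{2}\left(b\,e^{\mu + \sigma b} - a\,e^{\mu - \sigma a}\right)$. If $0 \leq \gamma \leq 1$, then $a \geq b$ and $e^{\mu + \sigma b} \geq e^{\mu - \sigma a}$. When $\gamma = 1$, $a = b \geq 0$ for $0 \leq \epsilon \leq \frac{1}{2}$, hence $\frac{\partial QA}{\partial \sigma} = \frac{a}{2}\left(e^{\mu + \sigma a} - e^{\mu - \sigma a}\right) \geq 0$, and the lognormal distribution is congruent. For $0 \leq \gamma < 1$, however, these two inequalities alone do not determine the sign, because the larger factor $a$ multiplies the smaller exponential; for example, with $\mu = 0$, $\sigma = 1$ and $\gamma = \frac{1}{2}$, the derivative is negative at $\epsilon = 0.49$ but positive at $\epsilon = 0.01$, so $\gamma$-congruence for $\gamma < 1$ has to be checked for the parameter values of interest.

Since the generalized Gaussian distribution is symmetric, Theorem 4 implies that it is congruent and U-congruent. For the Weibull distribution, when $\alpha$ changes from 1 to $\frac{1}{2}$, the average probability density on the left side of the median, $\frac{1/2}{m}$, increases, since $\frac{\frac{1}{2}}{\lambda\ln(2)} < \frac{\frac{1}{2}}{\lambda\ln^{2}(2)}$; at the same time, the mean increases, indicating that the distribution becomes more heavy-tailed and the probability density of large values also increases. Thus, the non-congruence of the Weibull distribution stems from the simultaneous increase of the probability densities on two opposite sides as the shape parameter changes: one near the bound zero and the other toward infinity. The gamma distribution does not appear to have this issue; numerical results indicate that it is likely to be congruent.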
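The sign conditions discussed above can also be probed numerically. The Python sketch below is a heuristic check on a finite grid, not a proof: it approximates $\partial QA / \partial \theta$ by central differences at interior points $0 < \epsilon < \frac{1}{1+\gamma}$, again assuming $QA(\epsilon, \gamma) = \frac{1}{2}\left[Q(\gamma\epsilon) + Q(1 - \epsilon)\right]$, and collects the signs it observes. The parameter values ($x_m = 1$, $\alpha = 2$ for the Pareto distribution; $\mu = 0$, $\sigma = 1$, $\gamma = 1$ for the lognormal distribution; $\lambda = 1$, $\alpha = 1$ for the Weibull distribution), the step size, the tolerance and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def quantile_average(q, eps, gamma):
    """Assumed form: QA(eps, gamma) = [Q(gamma * eps) + Q(1 - eps)] / 2."""
    return 0.5 * (q(gamma * eps) + q(1.0 - eps))


def derivative_signs(make_quantile, theta, gamma=1.0, h=1e-6, n_grid=200, tol=1e-9):
    """Signs of dQA/dtheta on the interior grid eps = k * upper / n_grid, k = 1..n_grid-1."""
    upper = 1.0 / (1.0 + gamma)
    q_plus, q_minus = make_quantile(theta + h), make_quantile(theta - h)
    signs = set()
    for k in range(1, n_grid):
        eps = upper * k / n_grid
        d = (quantile_average(q_plus, eps, gamma)
             - quantile_average(q_minus, eps, gamma)) / (2.0 * h)
        if abs(d) > tol:  # numerically zero derivatives are treated as neutral
            signs.add(1 if d > 0 else -1)
    return signs


def pareto(alpha, x_m=1.0):
    # Q(p) = x_m * (1 - p)**(-1/alpha), varying the shape alpha
    return lambda p: x_m * (1.0 - p) ** (-1.0 / alpha)


def lognormal(sigma, mu=0.0):
    # Q(p) = exp(mu + sigma * Phi^{-1}(p)), varying sigma
    z = NormalDist().inv_cdf
    return lambda p: math.exp(mu + sigma * z(p))


def weibull(alpha, lam=1.0):
    # Q(p) = lam * (-ln(1 - p))**(1/alpha), varying the shape alpha
    return lambda p: lam * (-math.log(1.0 - p)) ** (1.0 / alpha)


print("Pareto, d/d alpha:   ", derivative_signs(pareto, 2.0))
print("lognormal, d/d sigma:", derivative_signs(lognormal, 1.0))
print("Weibull, d/d alpha:  ", derivative_signs(weibull, 1.0))
```

With these settings the sketch reports only negative signs for the Pareto distribution, only positive signs for the lognormal distribution with $\gamma = 1$, and both signs for the Weibull distribution. A finite grid can miss sign changes between grid points, so this is a sanity check rather than a substitute for the analytic arguments.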

The next theorem shows an interesting relation between congruence and the central moment kernel distribution.

Theorem 6.

The second central moment kernel distribution derived from a continuous location-scale unimodal distribution is always γ-congruent.

Proof.

Theorem 3 shows that the central moment kernel distribution generated from a location-scale distribution is also a location-scale distribution. Theorem 1 shows that it is positive definite. Applying Theorem 12 in REDS I yields the desired result. □
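The scaling behavior used in the proof of Theorem 6 can be illustrated numerically for the second central moment kernel $h(x_1, x_2) = (x_1 - x_2)^2/2$. In the Python sketch below, the simulated standard normal sample, the floor-index empirical quantile and the helper names are assumptions made for illustration. Transforming the data by $x \mapsto \lambda x + \mu$ leaves the kernel values unaffected by $\mu$ and multiplies them by $\lambda^2$; since the kernel values are nonnegative, any quantile average of the kernel distribution is nonnegative and therefore increases with $\lambda > 0$.

```python
import itertools
import random


def second_moment_kernel_values(xs):
    """Sorted pseudosample of (x_i - x_j)^2 / 2 over all unordered pairs."""
    return sorted((a - b) ** 2 / 2 for a, b in itertools.combinations(xs, 2))


def empirical_quantile_average(values, eps):
    """Average of floor-index empirical eps- and (1 - eps)-quantiles (illustrative)."""
    n = len(values)
    lo = values[int(eps * (n - 1))]
    hi = values[int((1 - eps) * (n - 1))]
    return (lo + hi) / 2


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
base = empirical_quantile_average(second_moment_kernel_values(sample), eps=0.1)

for lam, mu in [(1.0, 5.0), (2.0, 0.0), (3.0, -7.0)]:
    transformed = [lam * x + mu for x in sample]
    qa = empirical_quantile_average(second_moment_kernel_values(transformed), eps=0.1)
    print(f"lam={lam}, mu={mu}: QA={qa:.6f}, lam^2 * base={lam ** 2 * base:.6f}")
```

In each printed line the two values agree up to floating-point rounding, and the location shift $\mu$ has no effect.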

Although some parametric distributions are not congruent, as shown in REDS I, Theorem 12 in REDS I establishes that $\gamma$-congruence always holds for a positive definite location-scale distribution, and thus, as shown in Theorem 6, for the second central moment kernel distribution generated from a location-scale unimodal distribution. Theorem 2 demonstrates that all central moment kernel distributions are unimodal-like, with mode and median close to zero, as long as they are generated from unimodal distributions. Assuming finite moments and a constant range $Q(1) - Q(0)$, increasing the mean of a distribution generally results in a more heavy-tailed distribution, i.e., the probability density of values close to $Q(1)$ increases, since the total probability is 1. In the case of the $k$th central moment kernel distribution with $k > 2$, the total probability on either side of zero remains roughly constant, because the median stays close to zero and is much less affected by an increase in the mean; meanwhile, the probability density of values close to zero decreases as the mean increases. In general, this transformation increases nearly all symmetric weighted averages. Therefore, except for the median, which is assumed to be zero, nearly all symmetric weighted averages of the central moment kernel distributions derived from unimodal distributions should change in the same direction when the parameters change.

Discussion

Moments, including raw moments, central moments, and standardized moments, are the most common parameters used to describe probability distributions. Central moments are preferred over raw moments because they are invariant to translation. In 1947, Hsu and Robbins proved that the arithmetic mean converges completely to the population mean provided that the second moment is finite [25]. The strong law of large numbers (proven by Kolmogorov in 1933) [26] implies that the $k$th sample central moment converges almost surely to the population $k$th central moment whenever the $k$th moment is finite. Recently, notable statistical phenomena regarding Taylor's law for distributions with infinite moments have been discovered by Drton and Xiao (2016) [27], Pillai and Meng (2016) [28], Cohen, Davis, and Samorodnitsky (2020) [29], and Brown, Cohen, Tang, and Yam (2021) [30]. In their inspiring comment on the paper by Brown et al. [30], Lindquist and Rachev (2021) raised a critical question: "What are the proper measures for the location, spread, asymmetry, and dependence (association) for random samples with infinite mean?" [31]. From a different perspective, this question closely aligns with the essence of Bickel and Lehmann's open question of 1979 [4]. Lindquist and Rachev suggested using the median, the interquartile range, and the medcouple [32] as robust versions of the first three moments. While answering this question is not the focus of this paper, the estimators proposed here are very likely to have a place in the answer. Since the estimation of central moments can be transformed into the location estimation of a pseudosample, the general principle of the central limit theorem suggests that the optimal estimator should always have a combinatorial pseudosample size; this explains, from another angle, why the theory of U-statistics allows a minimum-variance unbiased estimator to be derived from each unbiased estimator of an estimable parameter. Just as trimmed L-moments [17] are a robust version of L-moments [33], central moments now also have a robust nonparametric version, the weighted Hodges-Lehmann central moments, based on the complete U-congruence of the underlying distribution.
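As a small illustration of the pseudosample view, the Python sketch below uses the second central moment ($k = 2$, chosen for brevity, since higher-order kernels are longer) and a hypothetical data set with one outlier. The pseudosample of kernel values has the combinatorial size $\binom{n}{2}$, its mean is the unbiased sample variance (the U-statistic), and a location estimator applied to it gives a robust alternative. The median is used here only as the simplest robust location estimate; it is not the weighted H-L estimator proposed in this paper.

```python
import itertools
import math
import statistics

data = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 15.0]  # hypothetical sample with one outlier

# Pseudosample: the second central moment kernel evaluated on every unordered pair.
pseudo = [(a - b) ** 2 / 2 for a, b in itertools.combinations(data, 2)]

print(len(pseudo), math.comb(len(data), 2))                # combinatorial size, C(n, 2)
print(statistics.mean(pseudo), statistics.variance(data))  # mean of pseudosample = unbiased variance
print(statistics.median(pseudo))                           # a robust location of the pseudosample
```

The first line prints `21 21`; the second shows the same value twice up to floating-point rounding; the median is far smaller than the mean because only the six pairs involving the outlier produce large kernel values.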

Author Contributions

T.L. designed research, performed research, analyzed data, and wrote the paper.

Conflicts of Interest

The author declares no competing interest.

Software Availability

The code used to compute the weighted H-L $k$th central moment has been deposited on GitHub.

References

[1] Hampel, F.R. The influence curve and its role in robust estimation. Journal of the American Statistical Association 1974, 69, 383–393.
[2] Gauss, C.F. Bestimmung der Genauigkeit der Beobachtungen. Ibidem 1816, 129–138.
[3] Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models. III. Dispersion. In Selected Works of E.L. Lehmann; Springer, 2012; pp. 499–518.
[4] Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models. IV. Spread. In Selected Works of E.L. Lehmann; Springer, 2012; pp. 519–526.
[5] Oja, H. On location, scale, skewness and kurtosis of univariate distributions. Scandinavian Journal of Statistics 1981, 154–168.
[6] Oja, H. Descriptive statistics for multivariate distributions. Statistics & Probability Letters 1983, 1, 327–332.
[7] Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models. II. Location. In Selected Works of E.L. Lehmann; Springer, 2012; pp. 473–497.
[8] van Zwet, W. Convex transformations: A new approach to skewness and kurtosis. In Selected Works of Willem van Zwet; Springer, 2012; pp. 3–11.
[9] Rousseeuw, P.J.; Croux, C. Alternatives to the median absolute deviation. Journal of the American Statistical Association 1993, 88, 1273–1283.
[10] Li, T. Robust estimations from distribution structures: Mean. 2023.
[11] Heffernan, P.M. Unbiased estimation of central moments by using U-statistics. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1997, 59, 861–863.
[12] Hodges, J.; Lehmann, E. Matching in paired comparisons. The Annals of Mathematical Statistics 1954, 25, 787–791.
[13] Bowley, A.L. Elements of Statistics; King, 1926; No. 8.
[14] van Zwet, W.R. Convex Transformations of Random Variables: Nebst Stellingen. 1964.
[15] Groeneveld, R.A.; Meeden, G. Measuring skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician) 1984, 33, 391–399.
[16] Saw, J. Moments of sample moments of censored samples from a normal population. Biometrika 1958, 45, 211–221.
[17] Elamir, E.A.; Seheult, A.H. Trimmed L-moments. Computational Statistics & Data Analysis 2003, 43, 299–314.
[18] Fisher, R.A. Moments and product moments of sampling distributions. Proceedings of the London Mathematical Society 1930, 2, 199–238.
[19] Halmos, P.R. The theory of unbiased estimation. The Annals of Mathematical Statistics 1946, 17, 34–43.
[20] Hoeffding, W. A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics 1948, 19, 293–325.
[21] Serfling, R.J. Generalized L-, M-, and R-statistics. The Annals of Statistics 1984, 12, 76–86.
[22] Li, T. Robust estimations from distribution structures: Invariant moments. Zenodo 2023.
[23] Joly, E.; Lugosi, G. Robust estimation of U-statistics. Stochastic Processes and their Applications 2016, 126, 3760–3773.
[24] Laforgue, P.; Clémençon, S.; Bertail, P. On medians of (randomized) pairwise means. In International Conference on Machine Learning; PMLR, 2019; pp. 1272–1281.
[25] Hsu, P.L.; Robbins, H. Complete convergence and the law of large numbers. Proceedings of the National Academy of Sciences 1947, 33, 25–31.
[26] Kolmogorov, A. Sulla determinazione empirica di una legge di distribuzione. Inst. Ital. Attuari, Giorn. 1933, 4, 83–91.
[27] Drton, M.; Xiao, H. Wald tests of singular hypotheses. Bernoulli 2016, 22, 38–59.
[28] Pillai, N.S.; Meng, X.L. An unexpected encounter with Cauchy and Lévy. The Annals of Statistics 2016, 44, 2089–2097.
[29] Cohen, J.E.; Davis, R.A.; Samorodnitsky, G. Heavy-tailed distributions, correlations, kurtosis and Taylor's law of fluctuation scaling. Proceedings of the Royal Society A 2020, 476, 20200610.
[30] Brown, M.; Cohen, J.E.; Tang, C.F.; Yam, S.C.P. Taylor's law of fluctuation scaling for semivariances and higher moments of heavy-tailed data. Proceedings of the National Academy of Sciences 2021, 118, e2108031118.
[31] Lindquist, W.B.; Rachev, S.T. Taylor's law and heavy-tailed distributions. Proceedings of the National Academy of Sciences 2021, 118, e2118893118.
[32] Brys, G.; Hubert, M.; Struyf, A. A robust measure of skewness. Journal of Computational and Graphical Statistics 2004, 13, 996–1017.
[33] Hosking, J.R. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society: Series B (Methodological) 1990, 52, 105–124.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
