Intrinsically Optimal Equivariant Estimator for the Univariate Linear Normal Models

Gloria García; Marta Cubedo; Josep Maria Oller

doi:10.20944/preprints202510.0742.v1

Submitted:

09 October 2025

Posted:

13 October 2025

You are already at the latest version

Abstract

In this paper we prove the existence and unicity of the minimum risk equivariant estimator (MIRE) for the univariate linear normal models, under the action of the subgroup of the affine group which leaves the column space of the design matrix invariant and using the framework of the intrinsic analysis of statistical estimation which uses the square of the Rao distance as the loss function, which expectation behaves very different to the corresponding of the square error loss when we work with small samples, and supplying also an intrinsic bias measure for any equivariant estimator. This estimator is studied and compared with the standard maximum likelihood estimator (MLE) in terms of its intrinsic risk and bias, showing explicitly its differences, apparent with small samples, fortunately small for large sample sizes. Moreover, an approximate and very simple estimator is also suggested, which reduces the majority of bad behaviour of MLE, of intrinsic risk and bias, for small samples.

Keywords:

Equivariant estimator

;

Information metric

;

Riemannian risk

;

Intrinsic bias

Subject:

Computer Science and Mathematics - Probability and Statistics

1. Introduction

The Fisher information matrix of a regular parametric family of probability distributions induces a natural Riemannian structure on the parameter space [1,2,3]. This structure provides the foundation for the intrinsic analysis of statistical estimation, see [4,5] and, beyond statistics, it has also been shown to reveal connections with physical laws [6].

In this intrinsic framework, estimators are assessed through tools that do not depend on any particular parametrization of the model. Consequently, non-intrinsic criteria such as the squared error loss should be replaced by intrinsic measures. A natural choice is the squared Riemannian (Rao) distance, which serves as an intrinsic loss function. The associated risk can behave very differently from that based on squared error loss, especially in small-sample regimes. Similarly, the classical definition of bias must be reformulated in terms of a vector field determined by the geometry of the model, whose squared norm provides a natural intrinsic bias measure.

The estimator itself should also be intrinsic, i.e., independent of parametrization. Consider, for instance, the exponential distribution under two parametrizations,

f (x, θ) = θ \exp (- θ x) 1_{R^{+}} (x) and f (x, λ) = \frac{1}{λ} \exp (- x / λ) 1_{R^{+}} (x)

where

1_{R^{+}} (x)

is the positive real numbers indicator. The UMVU estimator computed under each parametrization

\hat{θ}

and

\hat{λ}

are not related by

\hat{θ} = 1 / \hat{λ}

. This apparent inconsistency arises because UMVU estimators rely on non-intrinsic notions such as unbiasedness and variance. As a result, the UMVU estimator is parametrization-dependent and cannot be regarded as intrinsically defined.

When the statistical model is invariant under the action of a transformation group on the sample space, it is natural to restrict attention to equivariant estimators, i.e., estimators U satisfying

U (g (x)) = \bar{g} (U (x)) for all g \in G_{n}

where

G_{n}

is the transformation group acting on the space of all samples of size n and

\bar{g} \in \bar{G}

is the induced transformation on the parameter space

Θ

. This restriction ensures logical consistency: only equivariant estimators remain coherent under data transformations. It is therefore a requirement that should be imposed prior to any attempt to minimize an intrinsic risk function.

As an illustration, consider the estimation of the unconstrained mean

μ

of a p–variate normal distribution. The MLE of

μ

(which really defines and intrinsic estimator) based on a sample of size n, is just the sample mean vector

{\bar{X}}_{n}

. However, for

p \geq 3

, the James–Stein estimator is known to have smaller quadratic risk [7]. In our view, this does not undermine the MLE. Although the James–Stein estimator enjoys a lower risk under the non-intrinsic quadratic loss, it lacks the essential property of equivariance, which is crucial from the intrinsic perspective.

In this work we focus on classical statistical estimation, as is standard in the absence of prior knowledge. Nonetheless, intrinsic Bayesian approaches based on non-informative priors are also possible [5,8], and represent an interesting avenue for future research. Here, we adopt the squared Rao distance as a natural loss function, since it is the conceptually simplest intrinsic analogue of the quadratic loss, despite potential computational challenges. For a broader background on statistical estimation and its intrinsic developments, see [9,10,11,12].

Specifically, we consider the univariate linear normal model with a fixed design matrix. First, we explicitly characterize the class of equivariant estimators under the action of a suitable subgroup of the affine group–namely, those affine transformations of the data that leave the column space of the design matrix unchanged. This subgroup is the largest one that preserves the model’s structure. Next, we prove the existence and uniqueness of the estimator within this class that minimizes intrinsic risk, i.e., the equivariant estimator that minimizes the mean squared Rao distance. We also derive an explicit expression for the intrinsic bias of any equivariant estimator.

Furthermore, we compare the intrinsic bias and risk of the proposed estimator with those of the MLE, highlighting their differences in small samples and showing how these differences diminish as the sample size grows. Finally, we propose a computable approximation of the optimal estimator, which mitigates most of the intrinsic bias and risk issues that affect the MLE in finite samples.

2. Equivariant estimators for linear models

Let us consider the univariate linear normal model,

y = X β + e

(1)

where

y

is a

n \times 1

random vector,

X

is a

n \times m

matrix of known constants with

0 < rank (X) = m \leq n

,

β

is a

m \times 1

vector of unknown parameters to be estimated and

e

is the fluctuation or error of

y

about

X β

. We assume that the errors are unbiased, independent, with the same variance

σ^{2}

and following a n–variate normal distribution, that is

e \sim N_{n} (0, σ^{2} I_{n})

, where

I_{n}

is the

n \times n

identity matrix. Therefore

y

distributes according to an element of the parametric family of probability distributions

P = {N_{n} (X β, σ^{2} I_{n}) | (β, σ) \in Θ}

with parameter space

Θ = R^{m} \times R_{+}

, a

(m + 1)

– dimensional simply connected real manifold. Hereafter, we shall identify the elements of

R^{m}

with

m \times 1

column vectors, when necessary.

Denote by

O_{E} (n) = {H \in O (n) | y \in E \Rightarrow Hy \in E}

where E is a subspace of

R^{n}

and

O (n)

is the group of

n \times n

orthogonal matrices with entries from

R

. Define F as the subspace of

R^{n}

spanned by the columns of

X

, that is

F = Col (X)

. Observe that

O_{F} (n) = O_{F^{⊥}} (n)

,

I_{n} \in O_{F} (n)

,

Col (X) = Col (HX) \forall H \in O_{F} (n)

and if

H \in O_{F} (n)

then

H^{t} \in O_{F} (n)

. Moreover, every

H \in O_{F} (n)

induces, in F and

F^{⊥}

, two isomorphisms isomorphisms preserving the Euclidean norm in each subspace.

P

is invariant under the action of the subgroup of the affine group in

R^{n}

given by the family of transformations

g_{(a, H, c)} (y) = a H y + c, y \in R^{n}

(2)

where

a > 0, H \in O_{F} (n), c \in F

.

Observe that

G = {g_{(a, H, c)} | a > 0, H \in O_{F} (n), c \in F}

induces an action on the parameter space

θ

given by

\begin{matrix} \bar{g_{(a, H, c)}} (β, σ) & = & (a {(X^{t} X)}^{- 1} X^{t} H X β + {(X^{t} X)}^{- 1} X^{t} c, a σ) \end{matrix}

(3)

where this result is obtained taking into account that

X {(X^{t} X)}^{- 1} X^{t}

is the projection matrix into F and thus, if

w \in F

there exists a unique

η

such that

w = X η

and

X {(X^{t} X)}^{- 1} X^{t} w = w

.

Since the family

P

is invariant under the action of

G

, it is natural to restrict our attention to the class of equivariant estimators

U = (U_{1}, U_{2})

of

(β, σ)

i.e. an estimator satisfying

U (g_{(a, H, c)} (y)) = \bar{g_{(a, H, c)}} (U (y))

for all

g_{(a, H, c)} \in G

.

Proposition 1.

Let

U

be an equivariant estimator of

(β, σ)

. Then

U

belongs to the family

{U_{λ}, λ \in R_{+}}

where,

U_{λ} (y) = ({(X^{t} X)}^{- 1} X^{t} y, λ {\{y^{t} (I_{n} - X {(X^{t} X)}^{- 1} X^{t}) y\}}^{1 / 2})

Proof. Let

U = (U_{1}, U_{2})

. The equivariance condition for

U

involves,

\begin{matrix} U_{1} (a Hy + c) & = & a {(X^{t} X)}^{- 1} X^{t} {H X U}_{1} (y) + {(X^{t} X)}^{- 1} X^{t} c \\ U_{2} (a Hy + c) & = & a U_{2} (y) \end{matrix}

for any

a > 0, H \in O_{F} (n), c \in F

.

Any

y \in R^{n}

can be written in a unique form as

y = y_{F} + y_{F^{⊥}}

where

y_{F} \in F

and

y_{F^{⊥}} \in F^{⊥}

. Specifically

y_{F} = X {(X^{t} X)}^{- 1} X^{t} y a n d y_{F^{⊥}} = (I_{n} - X {(X^{t} X)}^{- 1} X^{t}) y

(4)

If we choose

c = - a {Hy}_{F}

in the previous expressions, we obtain

\begin{matrix} U_{1} (a {Hy}_{F^{⊥}}) & = & a {(X^{t} X)}^{- 1} X^{t} {H X U}_{1} (y) - a {(X^{t} X)}^{- 1} X^{t} {H y}_{F} \end{matrix}

(5)

\begin{matrix} U_{2} (a {Hy}_{F^{⊥}}) & = & a U_{2} (y) \end{matrix}

(6)

First we focus on

U_{1}

. If we let

H^{*} = (I_{n} - 2 X {(X^{t} X)}^{- 1} X^{t})

, we have

H^{*} \in O_{F} (n)

with

H^{*} v = - v \forall v \in F a n d H^{*} v = v \forall v \in F^{⊥}

Now observe that (5) is satisfied for any

a > 0

and

H

in

O_{F} (n)

, in particular for

I_{n}

and

H^{*}

. Therefore

\begin{matrix} U_{1} (a y_{F^{⊥}}) & = & a U_{1} (y) - a {(X^{t} X)}^{- 1} X^{t} y_{F} \\ U_{1} (a H^{*} y_{F^{⊥}}) & = & - a U_{1} (y) + a {(X^{t} X)}^{- 1} X^{t} y_{F} \end{matrix}

But

a H^{*} y_{F^{⊥}} = a y_{F^{⊥}}

which leads to

0 = U_{1} (y) - {(X^{t} X)}^{- 1} X^{t} y_{F}

From (4),

U_{1} (y) = {(X^{t} X)}^{- 1} X^{t} y, y \in R^{n}

Next, we consider

U_{2}

. Let

a = 1

and

H = I_{n}

in (), it follows

U_{2} (y_{F^{⊥}}) = U_{2} (y)

(7)

Accordingly, it is enough to determine

U_{2}

on

F^{⊥}

. Let us take a unit vector

z \in F^{⊥}

such that

U_{2} (z) = λ > 0

. Then ().

U_{2} (Hz) = U_{2} (z)

for any

H \in O_{F} (n)

. Observe that any arbitrary unit vector in

F^{⊥}

can be written as

Hz

for a proper

H \in O_{F} (n)

. Therefore, for any

y_{F^{⊥}} \in F^{⊥}

,

y_{F^{⊥}} \neq 0

, we have

U_{2} (y_{F^{⊥}}) = ∥ y_{F^{⊥}} ∥ U_{2} (\frac{y_{F^{⊥}}}{∥ y_{F^{⊥}} ∥}) = ∥ y_{F^{⊥}} ∥ U_{2} (Hz) = ∥ y_{F^{⊥}} ∥ U_{2} (z) = λ ∥ y_{F^{⊥}} ∥

Observe also that if

y = 0

then

y_{F^{⊥}} = 0

and, choosing

c = 0

, we have that

U_{2} (0) = a U_{2} (0), \forall a \in R_{+}

. This implies

U_{2} (0) = 0

.

Finally, from (7) and (4) we have

U_{2} (y) = λ {\{y^{t} (I_{n} - X {(X^{t} X)}^{- 1} X^{t}) y\}}^{1 / 2}

□

Observe that the standard maximum likelihood estimator, MLE, for the present model is an equivariant estimator, with

λ = \frac{1}{\sqrt{n}}

.

3. Minimum Riemannian risk estimators

In the framework of intrinsic analysis, where the loss function is the square of the Rao distance, the Riemannian distance induced by the information metric in the parameter space

Θ

, once the class of the equivariant estimators has been determined a natural question arises: which is the equivariant estimator that minimizes the risk?

First of all, we summarize the basic geometric results corresponding to the model (1) which are going to be used hereafter. We are going to use a standardized version of the information metric, given by the usual information metric corresponding to this linear model divided by a constant factor n, i.e. the number of rows of matrix

X

. This metric is given by

d s^{2} = \frac{1}{n σ^{2}} (d β^{t} X^{t} X d β + 2 n {(d σ)}^{2}),

(8)

which is, up to a linear coordinate change, the Poincaré hyperbolic metric of the upper half space

R^{m} \times R^{+}

, see [13]. The Riemannian curvature

κ = - \frac{1}{2}

is constant and negative and the unique geodesic, parameterized by the arc–length, which connects two points

θ_{1} = (β_{1}, σ_{1})

and

θ_{2} = (β_{2}, σ_{2}) \in Θ

, when

β_{1} \neq β_{2}

, is given by:

\begin{matrix} β (s) & = & 2 n K^{- 2} tanh (\frac{s}{\sqrt{2}} + ϵ) {(X^{t} X)}^{- 1 / 2} C + D \\ σ (s) & = & K^{- 1} s e c h (\frac{s}{\sqrt{2}} + ϵ) \end{matrix}

(9)

where s is the arc–length,

C

and

D

are

m \times 1

vectors whose components, and also

ϵ

, are convenient real integration constants, such that

β (0) = β_{1}

,

β (ρ_{12}) = β_{2}

,

σ (0) = σ_{1}

,

σ (ρ_{12}) = σ_{2}

being

ρ_{12}

the Riemannian distance between

θ_{1}

and

θ_{2}

. Finally, K is given by

K^{2} = 2 n C^{t} C

. When

β_{1} = β_{2}

, the geodesic is given by

β (s) = D σ (s) = B e^{\pm \frac{s}{\sqrt{2}}}

(10)

where B is a positive integration constant.

The Rao distance

ρ

between the points

θ_{1}

and

θ_{2}

is

ρ_{12} \equiv ρ (θ_{1}, θ_{2}) = \sqrt{2} ln (\frac{1 + δ (θ_{1}, θ_{2})}{1 - δ (θ_{1}, θ_{2})}) = 2 \sqrt{2} a r c t a n h (δ (θ_{1}, θ_{2}))

(11)

where

δ (θ_{1}, θ_{2}) = {(\frac{d_{M}^{2} (β_{1}, β_{2}) + 2 \frac{{(σ_{1} - σ_{2})}^{2}}{σ_{1} σ_{2}}}{d_{M}^{2} (β_{1}, β_{2}) + 2 \frac{{(σ_{1} + σ_{2})}^{2}}{σ_{1} σ_{2}}})}^{1 / 2}

(12)

and

d_{M}^{2} (θ_{1}, θ_{2}) = \frac{1}{n σ_{1} σ_{2}} {(β_{1} - β_{2})}^{t} X^{t} X (β_{1} - β_{2})

(13)

or, equivalently,

ρ (θ_{1}, θ_{2}) = \sqrt{2} a r c c o s h (\frac{1}{4} d_{M}^{2} (θ_{1}, θ_{2}) + \frac{σ_{1}^{2} + σ_{2}^{2}}{2 σ_{1} σ_{2}})

(14)

Let

{e x p}_{θ_{1}}^{- 1} (θ_{2})

be the inverse of the exponential map corresponding to Levi-Civita connection and

W^{1}, \dots, W^{m}, W^{m + 1}

its components corresponding to the basis field

{(\partial / \partial β^{1})}_{θ_{1}},

\dots,

{(\partial / \partial β^{m})}_{θ_{1}}, {(\partial / \partial σ)}_{θ_{1}}

. Then, we have

\begin{matrix} W^{i} & = & \frac{ρ_{12} / \sqrt{2}}{sinh (ρ_{12} / \sqrt{2})} (β_{2}^{i} - β_{1}^{i}) \frac{σ_{1}}{σ_{2}}, i = 1, \dots, m \\ W^{m + 1} & = & \frac{ρ_{12} / \sqrt{2}}{sinh (ρ_{12} / \sqrt{2})} (cosh (ρ_{12} / \sqrt{2}) - \frac{σ_{1}}{σ_{2}}) σ_{1} \end{matrix}

(15)

It is well know that the Riemannian distance induced by the information metric is invariant under equivariant estimator transformations. We shall supply a direct and alternative proof for the linear model setting.

Proposition 2.

The Rao distance ρ given by (11) is invariant under the action of the induced group by

G

on the parameter space,

\bar{G}

. In other words

ρ (\bar{g_{(a, H, c)}} θ_{1}, \bar{g_{(a, H, c)}} θ_{2}) = ρ (θ_{1}, θ_{2})

Proof: Observe that

HX (β_{1} - β_{2}) \in F, \forall H \in O_{F} (n)

and taking into account that

X {(X^{t} X)}^{- 1} X^{t}

is the projection matrix into F, we have

X {(X^{t} X)}^{- 1} X^{t} HX (β_{1} - β_{2}) = HX (β_{1} - β_{2})

Therefore

{(β_{1} - β_{2})}^{t} X^{t} H^{t} X {(X^{t} X)}^{- 1} X^{t} HX (β_{1} - β_{2}) = {(β_{1} - β_{2})}^{t} X^{t} X (β_{1} - β_{2})

and the invariance of

δ

and

ρ

trivially follows. □

Proposition 3.

\bar{G}

acts transitively on Θ.

Proof: The transitivity follows observing that a is an arbitrary positive real number and

X {(X^{t} X)}^{- 1} X^{t}

is the projection matrix into F with

rank X = m

. □

Since

ρ

, and thus

ρ^{2}

, is invariant under the action of

\bar{G}

and

\bar{G}

acts transitively on

Θ

, the distribution of

ρ^{2} (U_{λ} (y), θ)

does not depend on

θ

, and therefore, the risk of any equivariant estimator remains constant and independent of the target parameter provided that this risk is finite. More precisely, observe that if we let

z = \frac{1}{σ} (y_{F} - X β), V = ∥ z ∥ a n d W = \frac{1}{σ} ∥ y_{F^{⊥}} ∥

(16)

from (1) and (4) we clearly have that

z \sim N_{n} (0, X {(X^{t} X)}^{- 1} X^{t})

with a rank m idempotent covariance matrix, and

V^{2}

and

W^{2}

are independent random variables following a chi-square distribution with m and

(n - m)

degrees of freedom, equal to the dimensions of F and

F^{⊥}

since

V^{2}

and

W^{2}

are quadratic forms based on the projection matrices on these subspaces of

R^{n}

and

y_{F}

(or

z

) and

y_{F^{⊥}}

are independent random vectors. Therefore, since

X U_{λ 1} (y) = y_{F}

and

U_{λ 2} (y) = λ ∥ y_{F^{⊥}} ∥

, we have that

d_{M}^{2} (U_{λ} (y), θ) = \frac{1}{n U_{λ 2} (y) σ} {(U_{λ 1} (y) - β)}^{t} X^{t} X (U_{λ 1} (y) - β) = \frac{V^{2}}{n λ W}

(17)

δ (U_{λ} (y), θ) = {(\frac{\frac{V^{2}}{n λ W} + 2 \frac{{(λ W - 1)}^{2}}{λ W}}{\frac{V^{2}}{n λ W} + 2 \frac{{(λ W + 1)}^{2}}{λ W}})}^{1 / 2}

(18)

and

ρ (U_{λ} (y), θ) = 2 \sqrt{2} a r c t a n h ({(\frac{V^{2} + 2 n {(λ W - 1)}^{2}}{V^{2} + 2 n {(λ W + 1)}^{2}})}^{1 / 2})

(19)

or

ρ (U_{λ} (y), θ) = \sqrt{2} a r c c o s h (\frac{1}{4} \frac{V^{2}}{n λ W} + \frac{1}{2} (λ W + \frac{1}{λ W}))

(20)

which have a distribution which depend only on

V^{2}

and

W^{2}

, independent random variables with fixed distribution, whatever the value of

θ

.

Since the risk of any equivariant estimator remains constant on the parameter space, it’s enough to examine it at one point, for instance at the point

(0, 1) \in Θ

. Let us denote the expectation with respect to the n–variate linear normal model

N_{n} (X β, σ^{2} I_{n})

by

E_{(β, σ)}

and by E the

E_{(0, 1)}

. We can prove the following propositions.

Proposition 4.

If

n \geq m + 1

we have:

E_{(β, σ)} (ρ^{2} (U_{λ} (y), (β, σ))) = E (ρ^{2} (U_{λ} (y), (0, 1))) < \infty, \forall λ > 0

for any

(β, σ) \in Θ

.

Proof: From (14) and (13), since

1 + \frac{1}{2!} \frac{ρ^{2}}{2} + \frac{1}{4!} \frac{ρ^{4}}{4} \leq cosh (\frac{ρ}{2}) = \frac{1}{4} d_{M}^{2} (θ_{1}, θ_{2}) + \frac{σ_{1}^{2} + σ_{2}^{2}}{2 σ_{1} σ_{2}}

(21)

we have

0 \leq ρ^{2} (θ_{1}, θ_{2}) \leq 12 (\sqrt{1 + \frac{d_{M}^{2} (θ_{1}, θ_{2})}{6} + \frac{1}{3} \frac{{(σ_{1} - σ_{2})}^{2}}{σ_{1} σ_{2}}} - 1)

(22)

developing the square of the difference and taking into account that the standard Euclidean norm of a vector is less or equal to the absolute value of the sum of its components, we obtain

0 \leq ρ^{2} (θ_{1}, θ_{2}) \leq 2 \sqrt{6} d_{M} (θ_{1}, θ_{2}) + 4 \sqrt{3} (\sqrt{\frac{σ_{1}}{σ_{2}}} + \sqrt{\frac{σ_{2}}{σ_{1}}}) + 4 \sqrt{3} - 12

(23)

Notice that both bounds (22) and (23) are invariant under the action of the induced group on the parameter space.

As we mentioned before, from [14], it is enough to prove that the risk is finite at

(0, 1) \in Θ

. Taking into account (16) it follows, from (23), that

0 \leq ρ^{2} (U_{λ} (y), (0, 1)) \leq 2 \sqrt{6} \sqrt{\frac{V^{2}}{λ W n}} + 4 \sqrt{3} (\sqrt{λ W} + \frac{1}{\sqrt{λ W}}) + 4 \sqrt{3} - 12

(24)

Observe that if Q has a chi-square distribution with k degrees of freedom

E (Q^{α}) = 2^{α} \frac{Γ (\frac{k}{2} + α)}{Γ (\frac{k}{2})} providedthat \frac{k}{2} + α > 0

Therefore, since

V^{2}

and

W^{2}

are independent random variables following a central chi-square distribution with m and

(n - m)

degrees of freedom, we have

E (V) = \sqrt{2} \frac{Γ (\frac{m + 1}{2})}{Γ (\frac{m}{2})}, E (\sqrt{W}) = \sqrt[4]{2} \frac{Γ (\frac{2 (n - m) + 1}{4})}{Γ (\frac{n - m}{2})} p r o v i d e d t h a t n - m > - 1 / 2

and

E (\frac{1}{\sqrt{W}}) = \frac{1}{\sqrt[4]{2}} \frac{Γ (\frac{2 (n - m) - 1}{4})}{Γ (\frac{n - m}{2})} p r o v i d e d t h a t n - m > 1 / 2

Then, taking the average, it follows that

\begin{matrix} E (ρ^{2} (U_{λ} (y), (0, 1))) & < & \frac{2 \sqrt{6}}{\sqrt{λ n}} E (V) E (\frac{1}{\sqrt{W}}) + \\ + 4 \sqrt{3} (\sqrt{λ} E (\sqrt{W}) + \frac{1}{\sqrt{λ}} E (\frac{1}{\sqrt{W}})) + 4 \sqrt{3} - 12 \\ < & + \infty, \forall λ > 0 and n - m > 1 / 2 \end{matrix}

Since n and m are positive integers being

n > m

, we conclude that the risk is finite if

n \geq m + 1

. □

Proposition 4 is a sufficient condition for the existence of the Riemannian risk of the equivariant estimator

U_{λ}

, thus

Φ (λ) = E (ρ^{2} (U_{λ} (y), (0, 1))), λ > 0,

(25)

is well defined for

n \geq m + 1

, which we shall assume hereafter.

Proposition 5.

There exists a unique minimizer

λ_{n m} > 0

of the Riemannian risk given by Φ.

Proof:

Let us consider the Riemannian risk at

λ = e^{\frac{s}{2 \sqrt{2}}}

as a function of s, that is

F (s) = Φ (e^{\frac{s}{2 \sqrt{2}}}), s \in R

The particular selection of

λ

, from which F follows, relies on the Riemannian structure of

Θ

induced by the information metric. The Riemannian curvature is constant and equal to

- \frac{1}{2}

and taking into account (10) we have that

γ (s) = ({(X^{t} X)}^{- 1} X^{t} y, e^{\frac{s}{\sqrt{2}}} {\{y^{t} (I_{n} - X {(X^{t} X)}^{- 1} X^{t}) y\}}^{1 / 2})

is a geodesic in

Θ

; precisely a geodesic parameterized by the arc–length, see [13] for further details.

Then, following [15], the real valued function

s \mapsto ρ^{2} (γ (s), (0, 1))

is strictly convex. Since almost surely convexity of a stochastic process carries over the mean of a process, the map F is strictly convex as well.

On the other hand, from Fatou’s Lemma

F (s) \to + \infty

as

s \to - \infty

or

s \to \infty

. This, together with the strict convexity of F yield the existence of a unique minimizer

s^{*}

of the function F, which depends on n and m.

Finally, since the map

s \mapsto e^{\frac{s}{2 \sqrt{2}}}

is a strictly monotonous function; must exist a unique

λ_{n m}

, namely

λ_{n m} = e^{\frac{s^{*}}{2 \sqrt{2}}}

, such that

Φ (λ_{n m}) = {min}_{λ > 0} Φ (λ)

. □

In fact, this result guarantees the unicity of the MIRE, although a numerical analysis is required to obtain it explicitly (see next section). It could be useful to develop a simple approximate estimator, that shall be referred hereafter a-MIRE, obtained, luckily, minimizing a convenient upper bound of

Φ (λ)

. Since

1 + \frac{1}{2!} \frac{ρ^{2}}{2} \leq cosh (\frac{ρ}{2}) = \frac{1}{4} d_{M}^{2} (θ_{1}, θ_{2}) + \frac{σ_{1}^{2} + σ_{2}^{2}}{2 σ_{1} σ_{2}}

(26)

we shall have

0 \leq ρ^{2} (U_{λ} (y), (0, 1)) \leq \frac{1}{n λ} \frac{V^{2}}{W} + 2 {(\sqrt{λ W} - \frac{1}{\sqrt{λ W}})}^{2}

(27)

and therefore

0 \leq Φ (λ) \leq H (λ) = \frac{1}{λ} \frac{Γ (\frac{n - m}{2} - \frac{1}{2})}{\sqrt{2} Γ (\frac{n - m}{2})} (\frac{m}{n} + 2) + 2 \sqrt{2} \frac{Γ (\frac{n - m}{2} + \frac{1}{2})}{Γ (\frac{n - m}{2})}

(28)

the upper bound

H (λ)

it is clearly a convex functions with an absolute minimum attained when

λ

satisfy

\tilde{λ_{n m}} = (\sqrt{1 + \frac{m}{2 n}}) \frac{1}{\sqrt{n - m - 1}}

(29)

Furthermore, given an arbitrary m, we have

lim_{n \to \infty} {(\tilde{λ_{n m}})}^{2} n = 1

(30)

and, therefore, a-MIRE is very close to MLE for large values of n. Observe also that it is possible to compute a-MIRE for

n > m + 1

, a condition which is slightly stronger that the result required for the existence of MIRE in proposition (4).

A further aspect is the intrinsic bias of the equivariant estimators. In fact connections between minimum risk, bias and invariance have been established, see [14]. Since the action of the group G is not commutative, we cannot guarantee the unbiasedness of the MIRE and an additional analysis must be performed. First of all we are going to compute the vector bias, see [4], a quantitative measure of the bias which is compatible with Lehmann results.

Let

A_{θ} (λ) = {e x p}_{θ}^{- 1} (U_{λ} (y))

and

A_{θ}^{1} (λ), \dots, A_{θ}^{m} (λ), A_{θ}^{m + 1} (λ)

be the components of

A_{θ} (λ)

corresponding to the basis field

{(\partial / \partial β^{1})}_{θ}, \dots,

{(\partial / \partial β^{m})}_{θ}

,

{(\partial / \partial σ)}_{θ}

. With matrix notation,

A_{θ 1} (λ) = {(A_{θ}^{1} (λ), \dots, A_{θ}^{m} (λ))}^{t}

. Furthermore, let us define

h (x) \equiv x / sinh x

for

x \neq 0

and

h (0) \equiv 0

; taking into account (16), (19) and from (11) and (15) we have

\begin{matrix} {XA}_{θ 1} (λ) & = & f_{λ} (V, W)) z \frac{σ}{λ W} \\ A_{θ}^{m + 1} (λ) & = & f_{λ} (V, W) g_{λ} (V, W) σ \end{matrix}

(31)

where

f_{λ} (V, W) \equiv h (ρ (U_{λ} (y), θ) / \sqrt{2})

and

g_{λ} (V, W) \equiv c o s h (ρ (U_{λ} (y), θ) / \sqrt{2}) - \frac{1}{λ W}

Let

B_{θ} (λ) = E_{θ} ({e x p}_{θ}^{- 1} (U_{λ} (y)))

be the intrinsic bias vector corresponding to an equivariant estimator

U_{λ} (y) = (U_{λ 1} (y), U_{λ 2} (y))

evaluated at the point

θ = (β, σ)

and let

B_{θ}^{1} (λ),

\dots,

B_{θ}^{m} (λ),

B_{θ}^{m + 1} (λ)

be their components. In matrix notation,

B_{θ 1} (λ) = {(B_{θ}^{1} (λ), \dots, B_{θ}^{m} (λ))}^{t}

. We have

Proposition 6.

If

n \geq m + 1

, the bias vector is finite and

\begin{matrix} B_{θ 1} (λ) & = & 0 \\ B_{θ}^{m + 1} (λ) & = & E (f_{λ} (V, W) g_{λ} (V, W)) σ \end{matrix}

(32)

where

V^{2}

and

W^{2}

are independent random variables following a chi-square distribution with m and

n - m

degrees of freedom respectively.

Moreover, the square of the norm of the bias vector is constant and given by

∥ B_{θ} {(λ) ∥}_{θ}^{2} = 2 {(E (f_{λ} (V, W) g_{λ} (V, W)))}^{2}

(33)

Proof: Observe that if

n \geq m + 1

we have

\begin{matrix} ∥ B_{θ} {(λ) ∥}_{θ}^{2} & \leq & E_{θ} (∥ {e x p}_{θ}^{- 1} (U_{λ} (y)) ∥_{θ})^{2} \leq E_{θ} (∥ {e x p}_{θ}^{- 1} (U_{λ} (y)) ∥_{θ}^{2}) \\ = & E_{θ} (ρ^{2} (U_{λ} (y), θ)) < \infty \end{matrix}

where

{∥ \cdot ∥}_{θ}

denotes the Riemannian norm at the tangent space at

θ

.

On the other hand taking into account (31) and defining

z

as in (16) observe that

V = ∥ z ∥ = ∥ - z ∥

,

z

is independent of

W

and

z

and

- z

has the same distribution. Then we have

\begin{matrix} X B_{θ 1} (λ) & = & E_{θ} (f_{λ} (V, W) z \frac{σ}{λ W}) \\ = & E_{θ} (f_{λ} (V, W) (- z) \frac{σ}{λ W}) = - E_{θ} (f_{λ} (V, W) z \frac{σ}{λ W}) = 0 \end{matrix}

and since

r a n k (X) = m

it follows that

B_{θ 1} (λ) = 0

.

B_{θ}^{m + 1} (λ)

is obtained directly from (31). The distribution of

z

and

W^{2}

follow from basic properties of multivariate normal distribution. Finally, the norm of the bias vector field follows from (32) and (8).

□

We may remark, finally, that the norm of the bias vector field of any equivariant estimator,

∥ B_{θ} {(λ) ∥}_{θ}

, is invariant under the action of the induced group,

\bar{G}

, on the parameter space and since this group acts transitively on

Θ

, this quantity must be constant, which is clear from (33).

4. Numerical Evaluation

In this section we are going to compare, numerically, the MIRE estimator with the standard MLE. Observe that both estimators only differ in the estimation of the parameter

σ

(or

σ^{2}

). Precisely, the MIRE and the MLE of

σ^{2}

are, respectively,

\hat{σ^{2}} = λ_{n m}^{2} y^{t} (I_{n} - X {(X^{t} X)}^{- 1} X^{t}) y {σ^{2}}^{*} = \frac{1}{n} y^{t} (I_{n} - X {(X^{t} X)}^{- 1} X^{t}) y

(34)

namely, they only differ by the factor

ξ_{n m} = λ_{n m}^{2} n

, since

\hat{σ^{2}} = ξ_{n m} {σ^{2}}^{*}

.

In order to compare MLE with the MIRE we have computed the factor

ξ_{n m}

, the intrinsic risk and the square of the norm of the bias vector for each estimators. All the computations have been performed using Mathematica 10.2.

Moreover, we have suggested a rather simple approximation for

ξ_{n m}

, or

λ_{n m}

, which allow us to approximate MIRE estimator through (29), i.e.

\tilde{ξ_{n m}} = (1 + \frac{m}{2 n}) \frac{n}{n - m - 1}

(35)

The corresponding estimator, that shall be referred hereafter as a-MIRE, has been also compared with MLE and MIRE in terms of intrinsic risk.

The results are summarized in the following figures which summarize the tables given in the Appendix, see [16] for a convincing argument for the use of plots over tables. In Figure 1 numerical results of

ξ_{n m}

(left) and

Φ (λ_{n m}) = Φ (\sqrt{ξ_{n m} / n})

(right) are displayed graphically. Observe that, for m fixed, as n increases

Φ (\sqrt{ξ_{n m} / n})

goes to zero and

ξ_{n m}

goes to one. The exact numerical values are given in the Appendix, Table 1.

Figure 1. Numerical results for

ξ_{n m}

(left) and

Φ (λ_{n m})

(right), with

ξ_{n m} = λ_{n m}^{2} n

.

Figure 1. Numerical results for

ξ_{n m}

(left) and

Φ (λ_{n m})

(right), with

ξ_{n m} = λ_{n m}^{2} n

.

Figure 2. Numerical results for

ξ_{n m}

(left) and

Φ (λ_{n m})

(right), with

ξ_{n m} = λ_{n m}^{2} n

.

Figure 2. Numerical results for

ξ_{n m}

(left) and

Φ (λ_{n m})

(right), with

ξ_{n m} = λ_{n m}^{2} n

.

Figure 3 shows graphically the percentage of intrinsic risk increment for MLE (left) and a-MIRE (right), that is

100 \frac{(Φ (\frac{1}{\sqrt{n}}) - Φ (λ_{n m}))}{Φ (λ_{n m})} and 100 \frac{(Φ (\tilde{λ_{n m}}) - Φ (λ_{n m}))}{Φ (λ_{n m})}

(36)

respectively, for some values of n and m. The exact numerical results are given in the Appendix, Table 2.

Observe that if we approximate the MIRE by the MLE for a certain value of m, the relative difference of risk decreases as n increases. Notice that these relative differences of risks are rather moderate or small (about 10-15%) only if

n > 10 m

. Let us remark also the non appropriate behavior of the MLE for small values of

n - m

. This regards to the intrinsic risk increment: when we use MLE instead of MIRE this increment oscillates between

80, 5 %

and

251, 1 %

when

n - m = 1

with m from 1 to 10. On the other hand, the behavior of the a-MIRE is reasonably good, with its intrinsic risk very similar to the risk of the MIRE estimator: here the percentage on risk increment is less than

1 %

for all studied cases and also this percentage decreases as n increases. In fact this percentage is lower than 1‰ when

n - m \geq 4

, which indicates the extraordinary degree of approximation, being therefore the a-MIRE a reasonable and useful approximation of MIRE. In particular as an example, for a two way analysis of variance, with a and b levels for factors A and B respectively, with a single replicate for treatment we shall have

\tilde{λ} = (\sqrt{1 + \frac{a + b - 1}{2 a b}}) \frac{1}{\sqrt{a b - a - b}}

(37)

with

a > 1

and

b > 1

, while the corresponding quantity for MLE is

λ^{*} = \frac{1}{\sqrt{a b}}

.

In particular if

a = 3

and

b = 5

we shall have

\tilde{λ} = (\sqrt{1 + \frac{7}{30}}) \frac{1}{\sqrt{7}} = 0, 419750

and

λ^{*} = \frac{1}{\sqrt{15}} = 0, 258199

, which is sensibly different. At this moment, it could be useful to recall that the quadratic loss and the squared of the Riemannian distance behaves very different, see [5].

Figure 3. Percentage of intrinsic risk increment for MLE (left) and a-MIRE (right).

Figure 4. Percentage of intrinsic risk increment for MLE (left) and a-MIRE (right).

Figure 5 displays graphically the numerical results of the

m + 1

component of the bias vector divided by

σ

, i.e. the unique non zero physical component of this vector field, for MIRE (left) and MLE (right),

\frac{1}{σ} B_{θ}^{m + 1} (λ_{n m})

and

\frac{1}{\sqrt{n} σ} B_{θ}^{m + 1} (\frac{1}{\sqrt{n}})

respectively, for some values of n and m. Observe the sign differences, meaning that MIRE overestimates, in average,

σ

while MLE underestimates this quantity, in average as well. This follows from the equation of the geodesics of the present model (9). The exact numerical values are given in the Appendix, Table 3.

Figure 5. Numerical results for the unique non–null physical component of bias vector field

\frac{1}{σ} B_{θ}^{m + 1} (λ)

, for MIRE (left, with

λ = λ_{n m}

) and MLE (right, with

λ = 1 / \sqrt{n}

).

Figure 5. Numerical results for the unique non–null physical component of bias vector field

\frac{1}{σ} B_{θ}^{m + 1} (λ)

, for MIRE (left, with

λ = λ_{n m}

) and MLE (right, with

λ = 1 / \sqrt{n}

).

Figure 6. Numerical results for the unique non–null physical component of bias vector field

\frac{1}{σ} B_{θ}^{m + 1} (λ)

, for MIRE (left, with

λ = λ_{n m}

, positive values) and MLE (right, with

λ = 1 / \sqrt{n}

, negative values).

Figure 6. Numerical results for the unique non–null physical component of bias vector field

\frac{1}{σ} B_{θ}^{m + 1} (λ)

, for MIRE (left, with

λ = λ_{n m}

, positive values) and MLE (right, with

λ = 1 / \sqrt{n}

, negative values).

Figure 7 shows graphically the numerical results of the percentage of intrinsic risk due to bias for MIRE and MLE, that is

100 \frac{{∥ B ∥}_{θ}^{2} (λ_{n m})}{Φ (λ_{n m})} a n d 100 \frac{{∥ B ∥}_{θ}^{2} (\frac{1}{\sqrt{n}})}{Φ (λ_{n m})}

(38)

respectively, for some values of n and m. Observe that the bias is moderate, with respect the intrinsic risk, in both estimators. The bias of MIRE estimator is smaller than the bias of MLE for small values of m and the opposite for large values. The exact numerical results are given in the Appendix, Table 4.

Figure 7. Percentage of intrinsic risk due to bias for MIRE (left) and MLE (right), referred to the risk of MIRE.

Figure 8. Percentage of intrinsic risk due to bias for MIRE (left) and MLE (right), referred to the risk of MIRE.

Acknowledgements: We have to thank the referees and the editor comments and suggestions for the improvement of this paper.

Appendix

References

Rao, C. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91. [Google Scholar]
Burbea, J.; Rao, C. Entropy differential metric, distance and divergence measures in probability spaces: a unified approach. J. Multivar. Anal. 1982, 12, 575–596. [Google Scholar] [CrossRef]
Burbea, J. Informative geometry of probability spaces. Expo. Math. 1986, 4, 347–378. [Google Scholar]
Oller, J.; Corcuera, J. Intrinsic Analysis of the Statistical Estimation. Ann. Stat. 1995, 23, 1562–1581. [Google Scholar] [CrossRef]
García, G.; Oller, J. What does intrinsic mean in statistical estimation? Sort 2006, 2, 125–146. [Google Scholar]
Bernal-Casas, D.; Oller, J. Variational Information Principles to Unveil Physical Laws. Mathematics 2024, 12, 3941. [Google Scholar] [CrossRef]
Muirhead, R. Aspects of Multivariate Statistical Theory; John Wiley & Sons Inc.: New York, NY, USA, 1982. [Google Scholar]
Bernardo, J.; Juárez, M. Intrinsic estimation. In Bayesian Statistics 7; Bernardo, J., Bayarri, M., Berger, J., Dawid, A., Hackerman, W., Smith, A., West, M., Eds.; Oxford University Press: Berlin, Germany, 2003; pp. 465–476. [Google Scholar]
Lehmann, E.; Casella, G. Theory of Point Estimation; Springer Science & Business Media: New York, NY, USA, 2006. [Google Scholar]
ichi Amari, S. Information Geometry and Its Applications; Springer: Tokyo, 2016. [Google Scholar]
Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information Geometry; Springer: Cham, 2017. [Google Scholar]
Nielsen, F. An Elementary Introduction to Information Geometry. Entropy 2020, 22, 1100. [Google Scholar] [CrossRef] [PubMed]
Burbea, J.; Oller, J. The information metric for univariate linear elliptic models. Stat. Decis. 1988, 6, 209–221. [Google Scholar] [CrossRef]
Lehmann, E. A general concept of unbiasedness. Ann. Math. Stat. 1951, 22, 587–592. [Google Scholar] [CrossRef]
Karcher, H. Riemannian Center of Mass and Mollifier Smoothing. Commun. Pure Appl. Math. 1977, 30, 509–541. [Google Scholar] [CrossRef]
Gelman, A.; Pasarica, C.; Dodhia, R. Let’s practice what we preach: turning tables into graphs. Am. Stat. 2002, 56, 121–130. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Intrinsically Optimal Equivariant Estimator for the Univariate Linear Normal Models

Abstract

Keywords:

Subject:

1. Introduction

2. Equivariant estimators for linear models

3. Minimum Riemannian risk estimators

4. Numerical Evaluation

Appendix

References

MDPI Initiatives

Important Links

Subscribe