Cartan-Schouten Metrics for Information Geometry and Machine Learning

Andre Diatta; Bakary Manga; Fatimata Sy

doi:10.20944/preprints202408.2120.v1

Submitted:

28 August 2024

Posted:

29 August 2024


Abstract

We study Cartan-Schouten metrics, explore invariant dual connections, and propose them as models for Information Geometry. Based on the underlying Riemannian barycenter and the biinvariant mean of Lie groups, we subsequently propose a new parametric mean for data science and machine learning, which comes with several advantages compared to traditional tools such as the arithmetic mean, median, mode, expectation, least squares method, maximum likelihood, and linear regression. We call a metric on a Lie group a Cartan-Schouten metric if its Levi-Civita connection is biinvariant, so that every 1-parameter subgroup through the unit is a geodesic. Except for not being left or right invariant in general, Cartan-Schouten metrics enjoy the same geometry as biinvariant metrics, since they share the same Levi-Civita connection. To bypass the apparent drawback of non-invariance, we show that Cartan-Schouten metrics are completely determined by their value at the unit. We give an explicit formula for recovering them from their value at the unit, thus making them much less computationally demanding than general metrics on manifolds. Furthermore, Lie groups with Cartan-Schouten metrics are complete Riemannian or pseudo-Riemannian manifolds. We give a complete characterization of Lie groups with Riemannian or Lorentzian Cartan-Schouten metrics. Cartan-Schouten metrics are in abundance on 2-nilpotent Lie groups. Namely, on every 2-nilpotent Lie group, there is a 1-1 correspondence between the set of left invariant metrics and that of Cartan-Schouten metrics.

Keywords: Cartan-Schouten metric; dual connections; $\alpha$-connections; Fisher information matrix; biinvariant metric; Lorentz metric; exponential barycenter; center of mass; Amari-Chentsov 3-tensor; machine learning; information geometry

Subject:

Computer Science and Mathematics - Mathematics

1. Introduction

Information geometry is a field that sets the geometric framework for a deeper understanding of information theory. It applies the concepts and techniques of differential geometry to statistics and probability theory. Namely, families of probability distributions are studied as Riemannian manifolds (the Riemannian metric, say

μ,

being for instance the Fisher information metric) provided with some additional structure, namely a connection such that the covariant derivative of the Riemannian metric is a totally symmetric 3-tensor (the Amari-Chentsov tensor). Then a parameter family of pairs of connections

(\nabla^{α}, \nabla^{- α})

which are mutually dual with respect to the metric can be deduced such that the covariant derivatives

\nabla^{α} μ,

\nabla^{- α} μ

are totally symmetric 3-tensors. Conversely any totally symmetric 3-tensor gives rise to such connections. See Section 2.1.

Generalizing the above, a statistical manifold is a Riemannian or pseudo-Riemannian manifold

(M, μ)

, together with a (locally) flat torsion free linear connection

\nabla^{1}

such that the covariant derivative

\nabla^{1} μ = : S

is a totally symmetric

(0, 3)

-tensor. There is a unique connection

\nabla^{- 1}

dual to

\nabla^{1}

with respect to

μ .

The mutual duality is equivalent to the following

\begin{matrix} Z \cdot μ (X, Y) = μ (\nabla_{Z}^{1} X, Y) + μ (X, \nabla_{Z}^{- 1} Y), \end{matrix}

(1)

for any vector fields

X, Y

on

M .

Letting ∇ stand for the Levi-Civita connection of

μ,

one deduces a parameter family of pairs of mutually dual torsion free (locally) flat connections

(\nabla^{α}, \nabla^{- α}),

α \in R,

by

\begin{matrix} μ (\nabla_{X}^{α} Y, Z) & : = & μ (\nabla_{X} Y, Z) - \frac{α}{2} S (X, Y, Z) \\ μ (\nabla_{X}^{- α} Y, Z) & : = & μ (\nabla_{X} Y, Z) + \frac{α}{2} S (X, Y, Z) \end{matrix}

(2)

such that the covariant derivatives

\nabla^{α} μ = - \nabla^{- α} μ

are totally symmetric [20,24]. Such a generalization is not a mere mathematical theory, since it has been proved that every statistical manifold corresponds to a statistical model [21,23]. The notion of dual connections, which was borrowed from affine geometry (see e.g. [20,25]) and introduced into information geometry by Amari [2], is now widely used as a key tool in the applications of information geometry. While metrics such as the Fisher information metric and divergences like the Kullback-Leibler divergence provide ways of measuring distances or differences between probability distributions, geodesics appear as the shortest paths between probability distributions. All these tools strengthen the central role of information geometry in data science (see e.g. [1,2]).

In the present work, we study Cartan-Schouten metrics and propose them as models for information geometry. A Cartan-Schouten metric on a Lie group G, is a metric whose Levi-Civita connection is the canonical biinvariant Cartan-Schouten connection [7], defined by

\nabla_{x} y = \frac{1}{2} [x, y],

for any

x, y

in the Lie algebra

\mathcal{G}

of G. So every 1-parameter subgroup through the unit is a geodesic. Lie groups with Cartan-Schouten metrics are geodesically (hence metrically) complete manifolds in which the 1-parameter subgroups are geodesics. This allows for a natural way of doing statistics and information geometry using their intrinsic connection (just as is the case for biinvariant metrics, see [29,30,33]). In low dimensions, Lie groups with a Cartan-Schouten metric have been classified, up to dimension 6, in [19,36,41]. The study of Cartan-Schouten metrics is also motivated by the use of Riemannian statistics on Lie groups in fields such as machine learning and anatomy ([29,30]).

Here are the main results and the organization of the paper. In Section 2, we discuss statistical manifolds and information geometry in detail, directly deriving the Fisher information metric, dual connections, etc. We also set up the framework and the definitions, and discuss the motivations. An interesting feature is that, although Cartan-Schouten metrics are in general neither left nor right invariant, their geometry is biinvariant, since it is conveyed by their Levi-Civita connection, which is biinvariant. More interestingly, we provide a formula that uses only the value of a Cartan-Schouten metric at the unit (neutral element) to give its value everywhere, just as for left (or right) invariant metrics (see Theorem 9 and Theorem 11). This makes it possible to bypass the non-invariance drawback and makes Cartan-Schouten metrics computationally more attractive. In Theorem 4, we describe solvable non-Abelian Lie groups with a Riemannian Cartan-Schouten metric. More precisely, we prove that a solvable non-Abelian Lie group admits a positive (or negative) definite Cartan-Schouten metric if and only if it is 2-nilpotent. Section 3.3 is devoted to the discussion and complete characterization of Lie groups with a Cartan-Schouten metric

μ

of Lorentz type (Lorentzian, for short), that is,

μ

is of signature

(1, n - 1) .

Unlike biinvariant Lorentzian metrics, which are rather scarce (roughly speaking, they only exist on oscillator groups and on the special linear group SL(2) [26]), Lie groups with Lorentzian Cartan-Schouten metrics are in abundance (see Theorem 7, Theorem 8 and Section 3.5). We expect the Lorentzian Cartan-Schouten metrics studied in Section 3.3 to offer good models for singular statistical learning theory as in [43], and for applications to relativity or quantum physics. On any 2-nilpotent Lie group, Cartan-Schouten metrics can have any desired signature; in fact, there are as many of them as there are metrics (of any signature) on the corresponding Lie algebra, see Theorem 10 and Theorem 11. And yet, all these Cartan-Schouten metrics share the same Levi-Civita connection and hence the same geometry. However, they may induce different statistics, since for a fixed totally symmetric 3-tensor taken to be the Amari-Chentsov tensor, the corresponding parameter family of dual

α

-connections

(\nabla^{α}, \nabla^{- α})

, depends on the chosen Cartan-Schouten metric. When the connection coincides with the Levi-Civita connection of the metric, information geometry reduces to Riemannian statistics (that is, statistics on Riemannian manifolds). In Theorem 12 we describe all Cartan-Schouten metrics on H-type Carnot groups and supply their explicit expressions. Note that on 2-nilpotent Lie groups, Cartan-Schouten metrics have very simple expressions. Indeed, in global affine (exponential) coordinates, all their coefficients are polynomials of degree 2. We explore and describe left invariant and biinvariant dual connections and statistics in Section 4, see Proposition 6, Proposition 7 and Theorem 13. In Section 5, we propose a new model of parametric mean for information geometry, data science and machine learning. On the one hand, it is the common Riemannian barycenter of all the Cartan-Schouten metrics on Carnot groups of H-type. On the other hand, it is also the biinvariant exponential barycenter of those Carnot groups of H-type. So it lies at the interplay between Riemannian statistics and Lie group invariant statistics. Furthermore, it combines the arithmetic mean and the variance, and enjoys a manifold of parameters to choose from.

We expect this work to foster new routes of research and applications in several areas of science and technology, within the scope of applications of Information Geometry.

Throughout the present work, unless explicitly stated, the word metric refers to both Riemannian and pseudo-Riemannian metrics. We will let

δ_{j, k}

stand for the Kronecker symbol, with

δ_{j, k} = 1

if

k = j

and 0 otherwise. Unless otherwise explicitly stated, ∇ stands for the canonical Cartan-Schouten connection, with

\nabla_{x} y : = \frac{1}{2} [x, y]

for

x, y

in the Lie algebra at hand.

2. On Information Geometry on Lie Groups

2.1. Fisher Information Metric, Amari-Chentsov 3-Tensor, $α$-Connections

Let

Ω

be a measurable subset of

R^{m}

and

U

a domain in

R^{n} .

In

Ω

, consider a family of probability distributions

p (-, θ) : Ω \to R,

parametrized by

θ = (θ_{1}, \dots, θ_{n}) \in U

, such that the following hold : (1) the family

p (-, θ)

is smooth with respect to

θ,

(2)

p (x, θ) > 0,

for any

x \in Ω

and any

θ \in U

and (3)

\int_{Ω} p (x, θ) d x = 1,

for any

θ \in U .

Here we have let

\int_{Ω} p (x, θ) d x

also stand for the sum

\sum_{x \in Ω} p (x, θ)

when

Ω

is a discrete set. The Fisher information matrix associated to the family

p (-, θ)

is the symmetric matrix

[μ_{i, j} (θ)]

given by

\begin{matrix} μ_{i j} (θ) & : = & \int_{Ω} (\frac{\partial}{\partial θ_{i}} \log p (x, θ)) (\frac{\partial}{\partial θ_{j}} \log p (x, θ)) p (x, θ) d x \\ & = & E_{θ} (\frac{\partial}{\partial θ_{i}} \log p (x, θ) \frac{\partial}{\partial θ_{j}} \log p (x, θ)), \end{matrix}

(3)

where

E_{θ} (f) : = \int_{Ω} f p (x, θ) d x

is the expectation of

f : Ω \to R,

with respect to

p (-, θ) .

One notes that the matrix

[μ_{i, j} (θ)]

is positive semi-definite, since for any

Y = (y_{1}, \dots, y_{n})

in

R^{n}

, one has:

\begin{matrix} Y [μ_{i, j} (θ)] Y^{T} & = & \sum_{i, j = 1}^{n} μ_{i j} (θ) y_{i} y_{j} \\ & = & \int_{Ω} (\sum_{i = 1}^{n} y_{i} \frac{\partial}{\partial θ_{i}} \log p (x, θ)) (\sum_{j = 1}^{n} y_{j} \frac{\partial}{\partial θ_{j}} \log p (x, θ)) p (x, θ) d x \\ & = & E_{θ} ({(\sum_{i = 1}^{n} y_{i} \frac{\partial}{\partial θ_{i}} \log p (x, θ))}^{2}) \geq 0 . \end{matrix}

(4)

Hence

μ_{i j} (θ)

gives rise to a (possibly pseudo-) Riemannian metric

μ

defined as

μ (x, y) = X [μ_{i, j} (θ)] Y^{T},

for any tangent vector fields

x, y

with (local) components

X, Y \in R^{n}

. The metric (3) is called the Fisher metric on

M

, when it is positive definite [2,20,25].
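As a concrete illustration of (3) and (4), the following Python sketch computes the Fisher information matrix of a three-outcome categorical family and evaluates the quadratic form appearing in (4). The family, the parameter values and all function names are assumptions chosen for illustration only; they do not come from the text above.

```python
def probs(theta):
    """Categorical family on Omega = {0, 1, 2}, with theta = (t1, t2)."""
    t1, t2 = theta
    return [t1, t2, 1.0 - t1 - t2]


def scores(theta):
    """scores(theta)[x][i] = d/d theta_i of log p(x, theta), computed by hand."""
    t1, t2 = theta
    t3 = 1.0 - t1 - t2
    return [
        [1.0 / t1, 0.0],         # outcome x = 0
        [0.0, 1.0 / t2],         # outcome x = 1
        [-1.0 / t3, -1.0 / t3],  # outcome x = 2
    ]


def fisher_matrix(theta):
    """Formula (3), with the integral replaced by a sum over Omega."""
    p, s = probs(theta), scores(theta)
    n = len(theta)
    return [
        [sum(s[x][i] * s[x][j] * p[x] for x in range(len(p))) for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(matrix, y):
    """Y [mu_ij] Y^T, the left-hand side of (4)."""
    n = len(y)
    return sum(matrix[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    mu = fisher_matrix((0.2, 0.5))
    print(mu)
    print(quadratic_form(mu, (1.0, -1.0)))
```

For this family the entries reduce to μ_{11} = 1/θ_1 + 1/θ_3, μ_{22} = 1/θ_2 + 1/θ_3 and μ_{12} = 1/θ_3, with θ_3 = 1 - θ_1 - θ_2, so the quadratic form is a sum of squares weighted by the probabilities, as (4) predicts.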

We further suppose that the differentiation with respect to

θ

and the integration with respect to x commute. This holds under mild regularity conditions, for instance when the derivatives of p with respect to θ are dominated by an integrable function. Differentiating the equality

\int_{Ω} p (x, θ) d x = 1,

gives

\begin{matrix} 0 & = & \frac{\partial}{\partial θ_{i}} \int_{Ω} p (x, θ) d x = \int_{Ω} \frac{\partial}{\partial θ_{i}} p (x, θ) d x = \int_{Ω} (\frac{\partial}{\partial θ_{i}} \log p (x, θ)) p (x, θ) d x \\ & = & E_{θ} (\frac{\partial}{\partial θ_{i}} \log p (x, θ)) . \end{matrix}

(5)

The second derivatives lead to

\begin{matrix} 0 & = & \frac{\partial^{2}}{\partial θ_{j} \partial θ_{i}} \int_{Ω} p (x, θ) d x = \int_{Ω} (\frac{\partial^{2}}{\partial θ_{j} \partial θ_{i}} \log p (x, θ)) p (x, θ) d x \\ & & + \int_{Ω} (\frac{\partial}{\partial θ_{j}} \log p (x, θ) \frac{\partial}{\partial θ_{i}} \log p (x, θ)) p (x, θ) d x \\ & = & E_{θ} (\frac{\partial^{2}}{\partial θ_{j} \partial θ_{i}} \log p (x, θ)) + μ_{i j} (θ) . \end{matrix}

(6)

Hence, one gets the following

\begin{matrix} μ_{i j} (θ) = - E_{θ} (\frac{\partial^{2}}{\partial θ_{j} \partial θ_{i}} \log p (x, θ)) . \end{matrix}

(7)
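The identities (5) and (7) can be checked by hand on a small example. The sketch below does so for the Bernoulli family p(1, θ) = θ, p(0, θ) = 1 - θ on Ω = {0, 1}; this choice of family and the helper names are assumptions made for illustration, and the derivatives of log p are entered in closed form.

```python
def density(x, t):
    """Bernoulli family on Omega = {0, 1} with parameter 0 < t < 1."""
    return t if x == 1 else 1.0 - t


def dlog(x, t):
    """First derivative of log p(x, t) with respect to t."""
    return 1.0 / t if x == 1 else -1.0 / (1.0 - t)


def d2log(x, t):
    """Second derivative of log p(x, t) with respect to t."""
    return -1.0 / t ** 2 if x == 1 else -1.0 / (1.0 - t) ** 2


def expectation(f, t):
    """E_t(f) as a finite sum over Omega, the discrete reading of the integral."""
    return sum(f(x, t) * density(x, t) for x in (0, 1))


def mean_score(t):
    """Left-hand side of (5); it should vanish."""
    return expectation(dlog, t)


def fisher_from_scores(t):
    """Formula (3): expectation of the squared score."""
    return expectation(lambda x, s: dlog(x, s) ** 2, t)


def fisher_from_hessian(t):
    """Formula (7): minus the expectation of the second derivative."""
    return -expectation(d2log, t)


if __name__ == "__main__":
    for t in (0.1, 0.3, 0.8):
        print(t, mean_score(t), fisher_from_scores(t), fisher_from_hessian(t))
```

Both expressions evaluate to 1/θ + 1/(1 - θ) = 1/(θ(1 - θ)) for this family.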

Differentiating (3) with respect to

θ_{k},

one gets

\begin{matrix} \frac{\partial}{\partial θ_{k}} μ_{i j} (θ) & = & \int_{Ω} (\frac{\partial^{2}}{\partial θ_{k} \partial θ_{i}} \log p (x, θ)) (\frac{\partial}{\partial θ_{j}} \log p (x, θ)) p (x, θ) d x \\ & & + \int_{Ω} (\frac{\partial}{\partial θ_{i}} \log p (x, θ)) (\frac{\partial^{2}}{\partial θ_{k} \partial θ_{j}} \log p (x, θ)) p (x, θ) d x \\ & & + \int_{Ω} (\frac{\partial}{\partial θ_{i}} \log p (x, θ)) (\frac{\partial}{\partial θ_{j}} \log p (x, θ)) (\frac{\partial}{\partial θ_{k}} \log p (x, θ)) p (x, θ) d x . \end{matrix}

(8)

Now we re-write (8) as

\begin{matrix} \frac{\partial}{\partial θ_{k}} μ_{i j} (θ) & = & E_{θ} (\frac{\partial^{2}}{\partial θ_{k} \partial θ_{i}} \log p (x, θ) \frac{\partial}{\partial θ_{j}} \log p (x, θ)) \\ & & + E_{θ} (\frac{\partial}{\partial θ_{i}} \log p (x, θ) \frac{\partial^{2}}{\partial θ_{k} \partial θ_{j}} \log p (x, θ)) + S_{i j k} (θ), \end{matrix}

(9)

where S stands for the Amari-Chentsov symmetric 3-tensor, with coefficients

\begin{matrix} S_{i j k} (θ) : = E_{θ} (\frac{\partial}{\partial θ_{i}} \log p (x, θ) \frac{\partial}{\partial θ_{j}} \log p (x, θ) \frac{\partial}{\partial θ_{k}} \log p (x, θ)) . \end{matrix}

(10)

Bringing the Levi-Civita connection ∇ of

μ

into play, we use the formula

\begin{matrix} \frac{\partial}{\partial θ_{i}} μ_{j k} & = & μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) + μ (\frac{\partial}{\partial θ_{j}}, \nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{k}}), \end{matrix}

(11)

to deduce the following

\begin{matrix} \frac{\partial}{\partial θ_{i}} μ_{j k} + \frac{\partial}{\partial θ_{j}} μ_{i k} - \frac{\partial}{\partial θ_{k}} μ_{i j} & = & 2 μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) . \end{matrix}

(12)

Plugging (9) into the left hand side of (12) we get

\begin{matrix} E_{θ} (\frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} \log p (x, θ) \frac{\partial}{\partial θ_{k}} \log p (x, θ)) = μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) - \frac{1}{2} S_{i j k} . \end{matrix}

(13)

The right hand side of (13) defines a connection

\nabla^{1}

by

\begin{matrix} μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{1} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) = μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) - \frac{1}{2} S_{i j k}, \end{matrix}

(14)

i, j, k = 1, \dots, n .
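The relation between the Levi-Civita connection, the tensor S and the connection defined in (14) can be tested numerically in the one-parameter case, where (12) reduces to μ(∇_∂ ∂, ∂) = ½ dμ/dθ. The sketch below does this for the Bernoulli family; the family, the finite-difference step and the function names are illustrative assumptions rather than part of the derivation above.

```python
def density(x, t):
    """Bernoulli family on Omega = {0, 1} with parameter 0 < t < 1."""
    return t if x == 1 else 1.0 - t


def dlog(x, t):
    """First derivative of log p(x, t) with respect to t."""
    return 1.0 / t if x == 1 else -1.0 / (1.0 - t)


def d2log(x, t):
    """Second derivative of log p(x, t) with respect to t."""
    return -1.0 / t ** 2 if x == 1 else -1.0 / (1.0 - t) ** 2


def expectation(f, t):
    """E_t(f) as a finite sum over Omega."""
    return sum(f(x, t) * density(x, t) for x in (0, 1))


def fisher(t):
    """Formula (3) in dimension one."""
    return expectation(lambda x, s: dlog(x, s) ** 2, t)


def amari_chentsov(t):
    """Formula (10) in dimension one: third moment of the score."""
    return expectation(lambda x, s: dlog(x, s) ** 3, t)


def levi_civita_lowered(t, h=1e-5):
    """mu(nabla_d d, d) = (1/2) d mu / d t, from (12) with i = j = k.

    The derivative is approximated by a central difference (an assumption
    of this sketch, chosen to avoid hard-coding the answer).
    """
    return 0.5 * (fisher(t + h) - fisher(t - h)) / (2.0 * h)


def lhs_13(t):
    """Expectation of (second derivative of log p) times (score)."""
    return expectation(lambda x, s: d2log(x, s) * dlog(x, s), t)


def rhs_13(t):
    """Levi-Civita term minus one half of the Amari-Chentsov coefficient."""
    return levi_civita_lowered(t) - 0.5 * amari_chentsov(t)


if __name__ == "__main__":
    for t in (0.2, 0.3, 0.6):
        print(t, lhs_13(t), rhs_13(t))
```

Up to the finite-difference error, the two sides agree, and both equal -S(θ) for this particular family.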

Since S is symmetric,

\nabla^{1}

is torsion-free. The torsion-free connection

\nabla^{- 1}

defined by

\begin{matrix} μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{- 1} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) = μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) + \frac{1}{2} S_{i j k}, \end{matrix}

(15)

satisfies

\begin{matrix} μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{1} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) + μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{- 1} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) = 2 μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) \end{matrix}

(16)

and

\begin{matrix} μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{1} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) + μ (\frac{\partial}{\partial θ_{j}}, \nabla_{\frac{\partial}{\partial θ_{i}}}^{- 1} \frac{\partial}{\partial θ_{k}}) & = & μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) + μ (\frac{\partial}{\partial θ_{j}}, \nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{k}}) \\ & = & \frac{\partial}{\partial θ_{i}} μ_{j k} . \end{matrix}

(17)

Equality (16) is equivalent to

\begin{matrix} \nabla = \frac{1}{2} (\nabla^{1} + \nabla^{- 1}) \end{matrix}

(18)

whereas Equality (17) is equivalent to

\begin{matrix} X \cdot μ (Y, Z) = μ (\nabla_{X}^{1} Y, Z) + μ (Y, \nabla_{X}^{- 1} Z), \end{matrix}

(19)

for any vector fields

X, Y, Z

on

M .

More generally, the torsion-free connections

\nabla^{α},

\nabla^{- α},

α \in R,

defined for any

i, j, k = 1, \dots, n,

by

\begin{matrix} μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{α} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) & = & μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) - \frac{α}{2} S_{i j k}, \\ μ (\nabla_{\frac{\partial}{\partial θ_{i}}}^{- α} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) & = & μ (\nabla_{\frac{\partial}{\partial θ_{i}}} \frac{\partial}{\partial θ_{j}}, \frac{\partial}{\partial θ_{k}}) + \frac{α}{2} S_{i j k} \end{matrix}

(20)

are mutually dual with respect to

μ,

that is,

\begin{matrix} X \cdot μ (Y, Z) = μ (\nabla_{X}^{α} Y, Z) + μ (Y, \nabla_{X}^{- α} Z), \end{matrix}

(21)

for any vector fields

X, Y, Z

on

M,

and they satisfy

\nabla = \frac{1}{2} (\nabla^{α} + \nabla^{- α}),

for any

α \in R .

Furthermore, the 3-tensors

\nabla^{α} μ

and

\nabla^{- α} μ

are totally symmetric. More precisely, we have the following identities

\begin{matrix} \nabla^{α} μ (X, Y, Z) & = & X \cdot μ (Y, Z) - μ (\nabla_{X}^{α} Y, Z) - μ (Y, \nabla_{X}^{α} Z) \\ & = & X \cdot μ (Y, Z) - (μ (\nabla_{X} Y, Z) - \frac{α}{2} S (X, Y, Z)) \\ & & - (μ (Y, \nabla_{X} Z) - \frac{α}{2} S (X, Y, Z)) = α S (X, Y, Z) \end{matrix}

(22)

and

\begin{matrix} \nabla^{- α} μ (X, Y, Z) & = & X \cdot μ (Y, Z) - μ (\nabla_{X}^{- α} Y, Z) - μ (Y, \nabla_{X}^{- α} Z) \\ & = & X \cdot μ (Y, Z) - (μ (\nabla_{X} Y, Z) + \frac{α}{2} S (X, Y, Z)) \\ & & - (μ (Y, \nabla_{X} Z) + \frac{α}{2} S (X, Y, Z)) = - α S (X, Y, Z) . \end{matrix}

(23)
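In dimension one, the duality relation (21) and the identities (22) and (23) become scalar equations that are easy to test. The following sketch does so for the Bernoulli family, using the closed forms μ(θ) = 1/(θ(1 - θ)) and S(θ) = (1 - 2θ)/(θ²(1 - θ)²), which are specific to that family. The choice of family and all names are assumptions made for illustration.

```python
def fisher(t):
    """Fisher information of the Bernoulli family, mu(t) = 1 / (t (1 - t))."""
    return 1.0 / (t * (1.0 - t))


def d_fisher(t):
    """Derivative of mu with respect to t, computed by hand."""
    return -(1.0 - 2.0 * t) / (t * (1.0 - t)) ** 2


def amari_chentsov(t):
    """Third moment of the score for the Bernoulli family."""
    return (1.0 - 2.0 * t) / (t * (1.0 - t)) ** 2


def alpha_coefficient(t, alpha):
    """mu(nabla^alpha_d d, d) as in (20); the Levi-Civita part is mu'/2."""
    return 0.5 * d_fisher(t) - 0.5 * alpha * amari_chentsov(t)


def duality_defect(t, alpha):
    """Left side minus right side of (21) with X = Y = Z = d/dt."""
    return d_fisher(t) - alpha_coefficient(t, alpha) - alpha_coefficient(t, -alpha)


def nabla_alpha_mu(t, alpha):
    """(nabla^alpha mu)(d, d, d), the first line of (22)."""
    return d_fisher(t) - 2.0 * alpha_coefficient(t, alpha)


if __name__ == "__main__":
    t = 0.3
    for alpha in (-1.0, 0.0, 0.5, 1.0):
        print(alpha, duality_defect(t, alpha), nabla_alpha_mu(t, alpha))
```

With the sign convention of (20), the coefficient for α = -1 vanishes identically in the coordinate θ for this family, while α = 0 recovers the Levi-Civita coefficient.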

Along these lines, one defines a statistical manifold as a triplet

(M, μ, \nabla^{1})

where M is a smooth manifold,

μ

and

\nabla^{1}

are respectively a Riemannian or pseudo-Riemannian metric and a torsion free linear connection on M such that the covariant derivative

\nabla^{1} μ

of

μ

with respect to

\nabla^{1},

is totally symmetric. The pair

(μ, \nabla^{1})

is then also termed a Codazzi structure or a statistics on

M,

while

μ

and

\nabla^{1}

are said to be compatible. The totally symmetric 3-tensor

S : = \nabla^{1} μ

is termed the Amari-Chentsov tensor. There is a unique torsion free affine connection

\nabla^{- 1}

on

M,

such that

\begin{matrix} X \cdot μ (Y, Z) = μ (\nabla_{X}^{1} Y, Z) + μ (Y, \nabla_{X}^{- 1} Z), \end{matrix}

(24)

for any vector fields

X, Y, Z \in \mathfrak{X} (M)

. Two connections

\nabla^{1}

and

\nabla^{- 1}

satisfying (24), are said to be mutually dual (or just dual, for short) with respect to

μ .

Conversely, in a (pseudo-) Riemannian manifold (

M, μ

) with Levi-Civita (LC, for short) connection

\nabla,

consider a totally symmetric covariant 3-tensor S. The tensor field A defined by

\begin{matrix} μ (A (X, Y), Z) : = S (X, Y, Z), \end{matrix}

(25)

for any

X, Y, Z \in \mathfrak{X} (M),

gives rise to the two torsion free linear connections

\nabla^{1} : = \nabla - \frac{1}{2} A

and

\nabla^{- 1} : = \nabla + \frac{1}{2} A

, which satisfy (24) and

\nabla^{1} μ = - \nabla^{- 1} μ = S

. This thus establishes an equivalence between statistical structures and totally symmetric 3-tensors on M. One defines a parameter family of manifolds, the

α

-manifolds

(M, μ, \nabla^{α}, \nabla^{- α}),

where the

α

-connections are given by

\begin{matrix} μ (\nabla_{X}^{α} Y, Z) & = & μ (\nabla_{X} Y, Z) - \frac{α}{2} S (X, Y, Z), \\ μ (\nabla_{X}^{- α} Y, Z) & = & μ (\nabla_{X} Y, Z) + \frac{α}{2} S (X, Y, Z), \end{matrix}

(26)

or equivalently

\begin{matrix} \nabla^{α} = \nabla - \frac{α}{2} A, \nabla^{- α} = \nabla + \frac{α}{2} A, \end{matrix}

(27)

for any

X, Y, Z \in \mathfrak{X} (M) .
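To make the passage from S to the tensor field A in (25) concrete, the sketch below raises an index of S with the inverse metric for a three-outcome categorical family with two parameters, then checks (25) and the symmetry of A in its two arguments. The family, the parameter values and the function names are assumptions chosen for illustration.

```python
def probs(theta):
    """Categorical family on Omega = {0, 1, 2}, with theta = (t1, t2)."""
    t1, t2 = theta
    return [t1, t2, 1.0 - t1 - t2]


def scores(theta):
    """scores(theta)[x][i] = d/d theta_i of log p(x, theta)."""
    t1, t2 = theta
    t3 = 1.0 - t1 - t2
    return [[1.0 / t1, 0.0], [0.0, 1.0 / t2], [-1.0 / t3, -1.0 / t3]]


def moment(theta, indices):
    """Expectation of the product of the score components listed in indices."""
    p, s = probs(theta), scores(theta)
    total = 0.0
    for x in range(3):
        term = p[x]
        for i in indices:
            term *= s[x][i]
        total += term
    return total


def fisher(theta):
    """Fisher matrix (3)."""
    return [[moment(theta, (i, j)) for j in range(2)] for i in range(2)]


def amari_chentsov(theta):
    """Coefficients S_ijk of (10)."""
    return [[[moment(theta, (i, j, k)) for k in range(2)]
             for j in range(2)] for i in range(2)]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def difference_tensor(theta):
    """a[i][j][l] = l-th component of A(d_i, d_j), obtained by solving (25)."""
    inv, s = inverse_2x2(fisher(theta)), amari_chentsov(theta)
    return [[[sum(inv[l][k] * s[i][j][k] for k in range(2)) for l in range(2)]
             for j in range(2)] for i in range(2)]


def lowered(theta, i, j, k):
    """mu(A(d_i, d_j), d_k), the left-hand side of (25)."""
    mu, a = fisher(theta), difference_tensor(theta)
    return sum(a[i][j][l] * mu[l][k] for l in range(2))


if __name__ == "__main__":
    theta = (0.2, 0.5)
    print(amari_chentsov(theta)[0][0][1], lowered(theta, 0, 0, 1))
```

Since A is symmetric in its two arguments, the connections ∇ ∓ (α/2) A of (27) have the same torsion as ∇, namely zero.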

The connections

\nabla^{α}

and

\nabla^{- α}

are dual with respect to

μ .

So another equivalent definition of a statistical manifold is a Riemannian (or more generally pseudo-Riemannian) manifold together with a totally symmetric 3-tensor

S .

In this generalization, totally geodesic submanifolds play the role of affine subspaces of Euclidean space in standard statistical models, while geodesics replace straight lines. It has been proved ([21,23]) that every statistical manifold is indeed a statistical model.

2.2. Information Geometry Using Cartan-Schouten Metrics

Let us recall that Cartan-Schouten connections on a Lie group

G,

are the left invariant connections such that every 1-parameter subgroup of G through the identity

ϵ

is a geodesic. The classical +, − and 0 Cartan-Schouten connections ([7]) are respectively given in the Lie algebra

\mathcal{G}

of G by

\begin{matrix} \nabla_{x} y = λ [x, y], λ = 1, 0, \frac{1}{2}, \forall x, y \in \mathcal{G} . \end{matrix}

(28)

Definition 1.

We will refer to the 0-connection given by

\nabla_{x} y = \frac{1}{2} [x, y],

\forall x, y \in \mathcal{G},

as the Cartan-Schouten canonical connection. It is the unique symmetric (torsion free) Cartan-Schouten connection which is biinvariant.
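The role of the value λ = ½ in (28) can be seen by computing the torsion T(x, y) = ∇_x y - ∇_y x - [x, y] = (2λ - 1)[x, y] on left invariant vector fields. The sketch below checks this on R³ with the cross product as Lie bracket, a standard model of the Lie algebra so(3); this choice of Lie algebra, the test vectors and the function names are assumptions made for illustration.

```python
def bracket(x, y):
    """Cross product on R^3, used here as a model of the Lie bracket of so(3)."""
    return (
        x[1] * y[2] - x[2] * y[1],
        x[2] * y[0] - x[0] * y[2],
        x[0] * y[1] - x[1] * y[0],
    )


def connection(lam, x, y):
    """Cartan-Schouten connection nabla_x y = lam [x, y], as in (28)."""
    return tuple(lam * c for c in bracket(x, y))


def torsion(lam, x, y):
    """T(x, y) = nabla_x y - nabla_y x - [x, y] on left invariant fields."""
    a = connection(lam, x, y)
    b = connection(lam, y, x)
    c = bracket(x, y)
    return tuple(p - q - r for p, q, r in zip(a, b, c))


if __name__ == "__main__":
    x, y = (1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)
    for lam in (1.0, 0.0, 0.5):
        # lam = 1, 0, 1/2 are the +, - and 0 connections respectively.
        print(lam, torsion(lam, x, y), connection(lam, x, x))
```

For every λ one has ∇_x x = 0, which reflects the fact that 1-parameter subgroups are geodesics; only λ = ½ gives zero torsion.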

Our models of statistical manifolds of interest are pairs (

M, μ

), where

M : = G

is a Lie group and

μ

is a Riemannian or pseudo-Riemannian metric on G, which is covariantly constant (or equivalently, parallel) with respect to the Cartan-Schouten canonical connection ∇. The latter property simply reads

\nabla μ = 0

or, equivalently,

\begin{matrix} x^{+} \cdot μ (y^{+}, z^{+}) = \frac{1}{2} (μ ([x^{+}, y^{+}], z^{+}) + μ (y^{+}, [x^{+}, z^{+}])) \end{matrix}

(29)

for any left invariant vector fields

x^{+}, y^{+}, z^{+}

on

G .

Importantly, when we take the dual connections to coincide with

\nabla,

then the model coincides with both the (pseudo-) Riemannian Statistics and Lie group (bi)invariant statistics as in [29,30,34].

Definition 2.

If a metric μ on a Lie group is parallel with respect to the Cartan-Schouten canonical connection, that is

\nabla μ = 0

, then we call it a Cartan-Schouten metric. Equivalently, a Cartan-Schouten metric is a metric whose Levi-Civita connection is the (biinvariant) Cartan-Schouten canonical connection

\nabla .

Note that, although the two approaches to defining a statistical manifold given above are equivalent, in our case the choice of one may influence the direction of the study, in the following way. Since Cartan-Schouten metrics are neither left nor right invariant in general, fixing a left invariant Amari-Chentsov 3-tensor induces in general a non-invariant pair of dual connections, which thus come with a non-invariant underlying statistical model. In contrast, although it induces a non-invariant 3-tensor, the choice of a left invariant or biinvariant connection compatible with

μ,

leads to an invariant statistical model.

2.3. Some Advantages and More Motivations

As mentioned in the introduction, every 1-parameter subgroup through the unit is a geodesic. Lie groups with Cartan-Schouten metrics are geodesically (hence metrically) complete manifolds, so they are a nice model for Riemannian statistics. Their geometries are invariant, since they are conveyed by the biinvariant symmetric connection. The geodesics of ∇ are translates of 1-parameter subgroups, and left or right invariant vector fields are Killing vector fields or affine collineations of the connection ∇. It has also been proposed, see e.g. [29,30], that biinvariant Riemannian or pseudo-Riemannian metrics be used for Riemannian statistics on Lie groups. Cartan-Schouten metrics are a generalization of biinvariant metrics where the invariance assumption on the metric has been dropped. Since biinvariant metrics and Cartan-Schouten metrics share the same Levi-Civita connection (which is biinvariant), they share the same geometry. Their common (pseudo-) Riemannian mean coincides with the Lie group biinvariant exponential barycenter. Thus Cartan-Schouten metrics allow for a much larger family of Lie groups with a biinvariant geometry, as we shall see throughout this work. For example, Heisenberg groups

H_{2 n + 1}

,

n \geq 1,

carry infinitely many non-equivalent Cartan-Schouten metrics, although they do not carry any biinvariant metric (see the example in Section 3.5.2). Interestingly, those metrics can have any signature; in fact, there are as many Cartan-Schouten metrics on any 2-nilpotent Lie group as there are metrics (of any signature) on its Lie algebra. Another interesting geometric feature is that there are infinitely many non-isometric Cartan-Schouten metrics that all share the same geometry (same Riemannian curvature, same holonomy group, same geodesics, same Ricci curvature, same sectional curvature) and yet have different signatures. As a matter of fact, such a set of metrics covers all possible signatures

(p, q),

with no restriction on the integers

p, q \geq 0

with

p + q = n .

Cartan-Schouten metrics of Lorentz type, studied in Section 3.3, may also be of interest for singular statistical learning theory ([43]) and for applications to relativity and quantum physics.

2.4. Hypersurfaces, Totally Geodesic Submanifolds

In standard statistical models, affine spaces play important roles, for example in dimension reduction techniques (principal component analysis and factor analysis), linear models, optimization problems, etc. In the generalization to Riemannian and pseudo-Riemannian statistics, totally geodesic submanifolds play the role of affine subspaces of Euclidean space, while geodesics replace straight lines. In that context, geodesics play a crucial role, since they represent the most natural path between probability distributions. A geodesic submanifold

\tilde{M}

of a manifold

(M, \nabla)

endowed with a connection, is a submanifold such that ∇ preserves tangent vectors, so ∇ restricts to a connection in

\tilde{M} .

Thus, any geodesic of

(M, \nabla)

passing through a point of

\tilde{M}

stays entirely in

\tilde{M},

any geodesic in

\tilde{M}

is also a geodesic of

M .

In other words, the geodesic equation of M restricts to the geodesic equation of

\tilde{M} .

Geodesic submanifolds can represent simpler constrained models, for example the subset of uncorrelated variables in the multivariate Gaussian distributions. Totally geodesic submanifolds also provide a natural way of reducing the dimension of the parameter space.

In our models, totally geodesic submanifolds are closed subgroups. Non-degenerate subgroups inherit the Cartan-Schouten structure by restriction of the metric. This allows for a natural dimension reduction.

3. Cartan-Schouten Metrics on Lie Groups

3.1. General Results

The following result from [8] covers most nonsolvable Lie groups, including all semi-simple Lie groups and most non-decomposable nonsolvable Lie groups.

Theorem 1.

[8] Let G be a perfect Lie group, that is, its Lie algebra

\mathcal{G}

satisfies

[\mathcal{G}, \mathcal{G}] = \mathcal{G} .

If G has a Cartan-Schouten metric μ, then μ is necessarily biinvariant.

In particular, every Cartan-Schouten metric on a semi-simple Lie group G or on its cotangent bundle

T^{*} G,

is biinvariant, where

T^{*} G

is endowed with its Lie group structure induced by the right trivialization [8].

We announce the following result whose complete proof will be published elsewhere.

Theorem 2.

A connected Lie group G has a Cartan-Schouten metric μ if and only if its Lie algebra, say

\mathcal{G}

, has a metric

\bar{μ}

with same signature as

μ,

satisfying the following

\begin{matrix} 0 & = & \bar{μ} ([[x_{1}, x_{2}], y], z) + \bar{μ} (y, [[x_{1}, x_{2}], z]) \end{matrix}

(30)

for any

x_{1}, x_{2}, y, z \in \mathcal{G} .

Furthermore, at the unit

ϵ \in G

, the two metrics μ and

\bar{μ}

coincide.

For the proof that the Lie algebra of a Lie group with a Cartan-Schouten metric has a metric satisfying (30), we will need Proposition 2, which is a corollary of the following well-known Proposition 1, whose proof is provided below using a new method. If a metric on a Lie algebra satisfies (30), we call it a Cartan-Schouten metric on the Lie algebra. Equality (30) is equivalent to the fact that the adjoint operator

{ad}_{u}

of any

u \in [\mathcal{G}, \mathcal{G}],

is skew symmetric with respect to

μ .

Proposition 1.

Let μ be a Riemannian or pseudo-Riemannian metric on a manifold

M .

If

\nabla^{1}

and

\nabla^{- 1}

are two connections on M which are dual with respect to

μ,

then their respective curvature tensors

R^{\nabla^{1}}

and

R^{\nabla^{- 1}}

satisfy the following equation,

\begin{matrix} 0 & = & μ (R^{\nabla^{1}} (X_{1}, X_{2}) Y, Z) + μ (Y, R^{\nabla^{- 1}} (X_{1}, X_{2}) Z), \end{matrix}

(31)

for any

X_{1}, X_{2}, Y, Z \in \mathfrak{X} (M) .

Proof.

For any

Y, Z \in \mathfrak{X} (M),

consider the function

F_{Y, Z} = μ (Y, Z) \in C^{\infty} (M, R)

and denote its differential by

ν : = d F_{Y, Z} .

The relation

\begin{matrix} 0 = \nabla μ (X, Y, Z) = X \cdot μ (Y, Z) - μ (\nabla_{X}^{1} Y, Z) - μ (Y, \nabla_{X}^{- 1} Z), \end{matrix}

(32)

is thus equivalent to

ν (X) = μ (\nabla_{X}^{1} Y, Z) + μ (Y, \nabla_{X}^{- 1} Z)

for any

X \in \mathfrak{X} (M) .

Since

ν

is an exact 1-form, we thus get

\begin{matrix} 0 = d ν (X_{1}, X_{2}) & = & X_{1} \cdot ν (X_{2}) - X_{2} \cdot ν (X_{1}) - ν ([X_{1}, X_{2}]) \\ & = & X_{1} \cdot μ (\nabla_{X_{2}}^{1} Y, Z) + X_{1} \cdot μ (Y, \nabla_{X_{2}}^{- 1} Z) - X_{2} \cdot μ (\nabla_{X_{1}}^{1} Y, Z) \\ & & - X_{2} \cdot μ (Y, \nabla_{X_{1}}^{- 1} Z) - μ (\nabla_{[X_{1}, X_{2}]}^{1} Y, Z) - μ (Y, \nabla_{[X_{1}, X_{2}]}^{- 1} Z), \end{matrix}

(33)

which we now expand, by applying (32) to (33), as

\begin{matrix} 0 & = & μ (\nabla_{X_{1}}^{1} \nabla_{X_{2}}^{1} Y, Z) + μ (\nabla_{X_{2}}^{1} Y, \nabla_{X_{1}}^{- 1} Z) + μ (\nabla_{X_{1}}^{1} Y, \nabla_{X_{2}}^{- 1} Z) + μ (Y, \nabla_{X_{1}}^{- 1} \nabla_{X_{2}}^{- 1} Z) \\ & & - μ (\nabla_{X_{2}}^{1} \nabla_{X_{1}}^{1} Y, Z) - μ (\nabla_{X_{1}}^{1} Y, \nabla_{X_{2}}^{- 1} Z) - μ (\nabla_{X_{2}}^{1} Y, \nabla_{X_{1}}^{- 1} Z) - μ (Y, \nabla_{X_{2}}^{- 1} \nabla_{X_{1}}^{- 1} Z) \\ & & - μ (\nabla_{[X_{1}, X_{2}]}^{1} Y, Z) - μ (Y, \nabla_{[X_{1}, X_{2}]}^{- 1} Z) . \end{matrix}

(34)

Equality (34) readily simplifies to

\begin{matrix} 0 & = & μ ((\nabla_{X_{1}}^{1} \nabla_{X_{2}}^{1} - \nabla_{X_{2}}^{1} \nabla_{X_{1}}^{1} - \nabla_{[X_{1}, X_{2}]}^{1}) Y, Z) \\ & & + μ (Y, (\nabla_{X_{1}}^{- 1} \nabla_{X_{2}}^{- 1} - \nabla_{X_{2}}^{- 1} \nabla_{X_{1}}^{- 1} - \nabla_{[X_{1}, X_{2}]}^{- 1}) Z) \\ & = & μ (R^{\nabla^{1}} (X_{1}, X_{2}) Y, Z) + μ (Y, R^{\nabla^{- 1}} (X_{1}, X_{2}) Z) . \end{matrix}

(35)

□

As a direct corollary of Proposition 1, we get the following result, which is also a consequence of the Ambrose-Singer holonomy theorem.

Proposition 2.

Let μ be a Riemannian or pseudo-Riemannian metric and

\bar{\nabla}

a (not necessarily torsion free) connection, on a manifold

M .

Suppose

\bar{\nabla} μ = 0 .

Then the curvature tensor

R^{\bar{\nabla}}

of

\bar{\nabla}

, is skew symmetric with respect to

μ :

\begin{matrix} 0 & = & μ (R^{\bar{\nabla}} (X_{1}, X_{2}) Y, Z) + μ (Y, R^{\bar{\nabla}} (X_{1}, X_{2}) Z), \end{matrix}

(36)

for any

X_{1}, X_{2}, Y, Z \in X (M) .

Proof.

In Proposition 1, take

\nabla^{1} = \nabla^{- 1} = \bar{\nabla},

the identity (31) then becomes (36). □

That the Lie algebra of a Lie group with a Cartan-Schouten metric has a metric satisfying (30) follows from Proposition 2. Indeed, if

M = G

is a Lie group,

G

its Lie algebra, we take

X_{1}, X_{2}, Y, Z

to be all left invariant vector fields on G and

\bar{\nabla}

to be the Cartan-Schouten standard connection. Then Equation (36), taken at the unit of G, gives Equation (30). The proof that, from a metric on

G

satisfying Equation (30), we can construct a Cartan-Schouten metric on any connected Lie group with Lie algebra

G,

will be published elsewhere.

3.2. Riemannian Cartan-Schouten Metrics

In this section, we show that the Lie groups with a Riemannian metric which is also a Cartan-Schouten metric, are essentially all the 2-nilpotent Lie groups, the compact simple Lie groups, the Abelian Lie groups and all their Cartesian (direct) products.

The following result is due to J. Milnor.

Theorem 3.

[28] The only Lie groups with a biinvariant Riemannian metric are the Cartesian products of a compact Lie group and an additive vector group.

Milnor’s result somehow implicitly spelled disappointment for those who expected many Lie groups with biinvariant Riemannian metrics for applications. However, we show here that if we drop the biinvariance property for the Riemannian metric, and only require that its Levi-Civita connection be biinvariant (which characterizes Cartan-Schouten metrics), then a much wider family of Lie groups enjoys such a property. We further show that Cartan-Schouten metrics of any given signature are in abundance on 2-nilpotent Lie groups, as we shall see in Section 3.5.

Theorem 4.

For a solvable non-Abelian Lie group G, the following are equivalent. (1) G admits a positive (or negative) definite Cartan-Schouten metric. (2) G is 2-nilpotent.

Proof.

Let

G

be the Lie algebra of G. Since

G

is solvable, the linear transformation

a d_{u}

is nilpotent for any

u \in [G, G] .

Thus, if G admits a Cartan-Schouten metric which is positive (or negative) definite,

a d_{u}

being skew-adjoint (see (30)), is necessarily equal to the zero map, for any

u \in [G, G] .

Thus

G

is 2-nilpotent. Conversely, any 2-nilpotent Lie group possesses as many positive definite Cartan-Schouten metrics as left (resp. right) invariant Riemannian metrics. See Theorem 11. □

From Theorem 1 and Theorem 3, we deduce the following

Theorem 5.

A perfect Lie group admits a Cartan-Schouten metric which is positive (or negative) definite, if and only if it is semisimple and compact.

Proof.

According to Theorem 1, if a perfect Lie group G admits a Cartan-Schouten metric, then the latter must be biinvariant. Now from Theorem 3, we deduce that G must then be a compact and semisimple Lie group. □

3.3. Lorentzian Cartan-Schouten Metrics

3.3.1. A General Result on Lorentzian Cartan-Schouten Metrics

Recall that Lie groups with a biinvariant metric are those special Lie groups with a Cartan-Schouten metric

μ

which is invariant under both left and right translations of all the group elements. Equivalently, the Lie algebras of such Lie groups have a metric (the value of

μ

at the unit

ϵ

) which is invariant under the adjoint operator (adjoint-invariant, or ad-invariant for short) of every element of the Lie algebra.

Theorem 6.

[26] The only simply connected and nonsimple Lie groups which admit an indecomposable biinvariant Lorentz metric are the oscillator Lie groups

G_{λ}

.

Unlike biinvariant Lorentz metrics which only exist on the oscillator Lie groups

G_{λ},

the special linear group SL(2), the Abelian Lie groups and their direct products, the more general Cartan-Schouten metrics which are Lorentz metrics, exist in abundance. Indeed, every 2-nilpotent Lie group of dimension

n,

has as many Lorentzian Cartan-Schouten metrics as there are scalar products of signature (

1, n - 1

) in any vector space of dimension n (see Theorem 11).

We announce the following general result whose proof will be published elsewhere.

Theorem 7.

Let G be a nondecomposable solvable Lie group. Suppose G has a Cartan-Schouten metric which is a Lorentzian metric. Then G is of one of the following types: (1) either 2-nilpotent or 3-nilpotent, (2) 3-step solvable and the derived ideal of its Lie algebra is a direct sum of a Heisenberg Lie algebra and an Abelian Lie algebra.

3.3.2. The Oscillator Lie algebras and Lie Groups

The oscillator Lie algebras

G_{λ}

(resp. Lie groups

G_{λ}

) are an important family in mathematics and physics, with a rich geometry [26]. Let

λ : = (λ_{1}, \dots, λ_{n})

be in

R^{n}

with

0 < λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

and let

G_{λ}

stand for

R^{2 n + 2} = R \times R \times C^{n}

endowed with the Lie group structure given by the following product. Let

σ = (t, s, z_{1}, \dots z_{n}), τ = (t^{'}, s^{'}, z_{1}^{'}, \dots z_{n}^{'})

\in G_{λ}

,

\begin{matrix} σ τ & = & (t + t^{'}, s + s^{'} + \frac{1}{2} \sum_{j = 1}^{n} I m ({\bar{z}}_{j} exp (i t λ_{j}) z_{j}^{'}), z_{1} + exp (i t λ_{1}) z_{1}^{'}, \\ \dots, z_{j} + exp (i t λ_{j}) z_{j}^{'}, \dots, z_{n} + exp (i t λ_{n}) z_{n}^{'}) . \end{matrix}

(37)

Set

z_{j} = x_{j} + i y_{j}

, where

x_{j}, y_{j} \in R

and

i^{2} = - 1,

so that

σ

and

τ

are respectively identified with the

(2 n + 2)

-tuples

(t, s, x_{1}, \dots, x_{n}, y_{1}, \dots, y_{n}),

(t^{'}, s^{'}, x_{1}^{'}, \dots, x_{n}^{'}, y_{1}^{'}, \dots, y_{n}^{'}) .

Hence, the above product now reads

\begin{matrix} σ τ & = & (t + t^{'}, s + s^{'} + \frac{1}{2} \sum_{j = 1}^{n} (x_{j} y_{j}^{'} - y_{j} x_{j}^{'}) cos (λ_{j} t) + \frac{1}{2} \sum_{j = 1}^{n} (x_{j} x_{j}^{'} + y_{j} y_{j}^{'}) sin (λ_{j} t); \\ x_{1} + x_{1}^{'} cos (λ_{1} t) - y_{1}^{'} sin (λ_{1} t), \dots, x_{n} + x_{n}^{'} cos (λ_{n} t) - y_{n}^{'} sin (λ_{n} t); \\ y_{1} + y_{1}^{'} cos (λ_{1} t) + x_{1}^{'} sin (λ_{1} t), \dots, y_{n} + y_{n}^{'} cos (λ_{n} t) + x_{n}^{'} sin (λ_{n} t)) . \end{matrix}

(38)
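As a sanity check on the two forms of the product, the following minimal Python sketch implements (37) in complex coordinates and (38) in real coordinates, so that one can test numerically that they agree and that the product is associative. The function names and the tuple layouts are our own illustrative choices and are not taken from [26].

```python
import cmath
import math

def osc_mul_complex(sigma, tau, lam):
    """Product (37); an element is (t, s, [z_1, ..., z_n]) with complex z_j."""
    t, s, z = sigma
    t2, s2, z2 = tau
    rot = [cmath.exp(1j * t * l) for l in lam]  # exp(i t lambda_j)
    # sum_j Im( conj(z_j) * exp(i t lambda_j) * z'_j )
    twist = sum((a.conjugate() * r * b).imag for a, r, b in zip(z, rot, z2))
    z_new = [a + r * b for a, r, b in zip(z, rot, z2)]
    return (t + t2, s + s2 + 0.5 * twist, z_new)

def osc_mul_real(sigma, tau, lam):
    """Product (38); an element is (t, s, [x_j], [y_j]) with real entries."""
    t, s, x, y = sigma
    t2, s2, x2, y2 = tau
    n = len(lam)
    c = [math.cos(l * t) for l in lam]
    d = [math.sin(l * t) for l in lam]
    s_new = s + s2 + 0.5 * sum(
        (x[j] * y2[j] - y[j] * x2[j]) * c[j]
        + (x[j] * x2[j] + y[j] * y2[j]) * d[j]
        for j in range(n))
    x_new = [x[j] + x2[j] * c[j] - y2[j] * d[j] for j in range(n)]
    y_new = [y[j] + y2[j] * c[j] + x2[j] * d[j] for j in range(n)]
    return (t + t2, s_new, x_new, y_new)

def to_real(sigma):
    """Convert (t, s, [z_j]) to (t, s, [x_j], [y_j]) via z_j = x_j + i y_j."""
    t, s, z = sigma
    return (t, s, [a.real for a in z], [a.imag for a in z])

def flat(sigma_real):
    """Flatten (t, s, [x_j], [y_j]) to the (2n + 2)-tuple of real coordinates."""
    t, s, x, y = sigma_real
    return [t, s] + list(x) + list(y)
```

Comparing `flat(to_real(osc_mul_complex(a, b, lam)))` with `flat(osc_mul_real(to_real(a), to_real(b), lam))` for a few random elements is a quick way to catch a misplaced sign or parenthesis in either formula.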

The Lie algebra

G_{λ}

of the oscillator group

G_{λ}

has a basis

(e_{- 1}, e_{0}, e_{1}, \dots, e_{2 n})

in which its Lie bracket reads

\begin{matrix} [e_{j}, e_{n + j}] = e_{0}, [e_{- 1}, e_{j}] = λ_{j} e_{n + j}, [e_{- 1}, e_{n + j}] = - λ_{j} e_{j}, j = 1, \dots, n . \end{matrix}

(39)

Following [26], up to a scalar factor, there is a unique adjoint invariant metric

{〈, 〉}_{λ}

on

G_{λ}

given, for

x = \sum_{j = - 1}^{2 n} x_{j} e_{j},

y = \sum_{j = - 1}^{2 n} y_{j} e_{j},

by

\begin{matrix} {〈 x, y 〉}_{λ} & = & x_{- 1} y_{0} + y_{- 1} x_{0} + \sum_{j = 1}^{n} \frac{1}{λ_{j}} (x_{j} y_{j} + x_{n + j} y_{n + j}) . \end{matrix}

(40)

Note that the ad-invariant metric

{〈, 〉}_{λ}

is a Lorentz metric. Following Theorem 2, in order to find all Cartan-Schouten metrics, we just need to directly look at the Lie algebra level. We apply Equation (30) to get all the Cartan-Schouten metrics on

G_{λ} .

Note that the derived ideal

[G_{λ}, G_{λ}]

of

G_{λ},

is spanned by

(e_{0}, e_{1}, \dots, e_{2 n}) .

When Equation (30) is satisfied in a trivial way, so that no information on

μ

can be read off, we simply skip it.

$0 = μ ([e_{j}, e_{- 1}], e_{- 1}) + μ (e_{- 1}, [e_{j}, e_{- 1}]) = - 2 λ_{j} μ (e_{n + j}, e_{- 1}),$
$0 = μ ([e_{j}, e_{- 1}], e_{0}) + μ (e_{- 1}, [e_{j}, e_{0}]) = - λ_{j} μ (e_{n + j}, e_{0}),$
$0 = μ ([e_{j}, e_{- 1}], e_{k}) + μ (e_{- 1}, [e_{j}, e_{k}]) = - λ_{j} μ (e_{n + j}, e_{k}),$
$0 = μ ([e_{j}, e_{- 1}], e_{n + k}) + μ (e_{- 1}, [e_{j}, e_{n + k}]) = - λ_{j} μ (e_{n + j}, e_{n + k}) + δ_{j, k} μ (e_{- 1}, e_{0}),$
$0 = μ ([e_{j}, e_{k}], e_{- 1}) + μ (e_{k}, [e_{j}, e_{- 1}]) = - λ_{j} μ (e_{k}, e_{n + j}),$
$0 = μ ([e_{j}, e_{k}], e_{n + p}) + μ (e_{k}, [e_{j}, e_{n + p}]) = δ_{j, p} μ (e_{k}, e_{0}) = 0,$
$0 = μ ([e_{j}, e_{n + k}], e_{0}) + μ (e_{n + k}, [e_{j}, e_{0}]) = δ_{j, k} μ (e_{0}, e_{0}) = 0,$
$0 = μ ([e_{j}, e_{n + k}], e_{n + p}) + μ (e_{n + k}, [e_{j}, e_{n + p}]) = δ_{j, k} μ (e_{0}, e_{n + p}) + δ_{j, p} μ (e_{n + k}, e_{0}),$
$0 = μ ([e_{n + j}, e_{- 1}], e_{- 1}) + μ (e_{- 1}, [e_{n + j}, e_{- 1}]) = 2 λ_{j} μ (e_{j}, e_{- 1}),$
$0 = μ ([e_{n + j}, e_{- 1}], e_{k}) + μ (e_{- 1}, [e_{n + j}, e_{k}]) = λ_{j} μ (e_{j}, e_{k}) - δ_{j, k} μ (e_{- 1}, e_{0}),$

We summarize the above in the following theorem.

Theorem 8.

A metric on

G_{λ}

is a Cartan-Schouten metric if and only if its nonzero coefficients are as follows

\begin{matrix} μ (e_{j}, e_{j}) = μ (e_{n + j}, e_{n + j}) = \frac{1}{λ_{j}} μ (e_{- 1}, e_{0}) \neq 0, μ (e_{- 1}, e_{- 1}), j = 1, \dots, n . \end{matrix}

(41)

Every metric as in (41) is Lorentzian. It is ad-invariant if and only if

μ (e_{- 1}, e_{- 1}) = 0 .
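The conditions listed above can be checked mechanically. The sketch below treats the 4-dimensional case n = 1 and assumes, as the computations above suggest, that Equation (30) reads μ([u, x], y) + μ(x, [u, y]) = 0 for every u in the derived ideal and all x, y. The index convention (0, 1, 2, 3 standing for e_{-1}, e_{0}, e_{1}, e_{2}) and all function names are ours.

```python
from fractions import Fraction

def brackets(lam):
    """Brackets (39) for n = 1; indices 0, 1, 2, 3 stand for e_{-1}, e_0, e_1, e_2."""
    table = {}

    def set_bracket(a, b, vec):
        table[(a, b)] = vec
        table[(b, a)] = [-v for v in vec]

    set_bracket(2, 3, [0, 1, 0, 0])       # [e_1, e_2] = e_0
    set_bracket(0, 2, [0, 0, 0, lam])     # [e_{-1}, e_1] = lam * e_2
    set_bracket(0, 3, [0, 0, -lam, 0])    # [e_{-1}, e_2] = -lam * e_1
    return table

def pair(M, vec, b):
    """mu(vec, e_b) for a symmetric Gram matrix M."""
    return sum(vec[k] * M[k][b] for k in range(4))

def satisfies_30(M, lam):
    """Check mu([u,x],y) + mu(x,[u,y]) = 0 for u in the derived ideal."""
    table = brackets(lam)
    zero = [0, 0, 0, 0]
    for u in (1, 2, 3):                   # e_0, e_1, e_2 span the derived ideal
        for x in range(4):
            for y in range(4):
                ux = table.get((u, x), zero)
                uy = table.get((u, y), zero)
                if pair(M, ux, y) + pair(M, uy, x) != 0:
                    return False
    return True

def metric_41(lam, c, a):
    """Gram matrix of the form (41): c = mu(e_{-1}, e_0), a = mu(e_{-1}, e_{-1})."""
    M = [[Fraction(0)] * 4 for _ in range(4)]
    M[0][0] = Fraction(a)
    M[0][1] = M[1][0] = Fraction(c)
    M[2][2] = M[3][3] = Fraction(c) / lam
    return M
```

For instance, `satisfies_30(metric_41(Fraction(3), 2, 5), Fraction(3))` holds, whereas changing a single diagonal entry such as μ(e_{1}, e_{1}) breaks the condition.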

3.4. Case Where the Exponential Map Is a Diffeomorphism

For many Lie groups, the exponential map

exp : G \to G,

is a diffeomorphism. This is the case for every connected and simply connected nilpotent Lie group and more generally, every connected and simply connected completely solvable Lie group. More precisely, following the well known works of Dixmier and Saito in 1957, the exponential map is a diffeomorphism, if and only if G is a connected and simply connected solvable Lie group that does not contain a closed subgroup isomorphic to the circle, the universal cover of the special linear group

S L (2, R)

of 2 by 2 real matrices with determinant

1,

the universal cover of the group

E (2) : = O (2) ⋉ R^{2}

of rigid motions of the Euclidean 2-space, or the 4-dimensional connected and simply connected oscillator Lie group with Lie algebra

\bar{e}

having a basis

(x, y, H, z),

with Lie bracket

[H, x] = y,

[H, y] = - x,

[x, y] = z .

Equivalently, the exponential map is a diffeomorphism, if and only if G is a connected and simply connected solvable Lie group whose Lie algebra

G

does not contain a Lie subalgebra isomorphic to the Lie algebra

e

of

E (2) : = O (2) ⋉ R^{2}

or

\bar{e} .

For Lie groups containing a closed subgroup isomorphic to one of the aforementioned subgroups, the exponential map is not injective. The exponential map can be surjective even though it may not even be a local diffeomorphism, and may have a cut locus. This is the case for

S U (2) .

We announce the following result, whose proof will be published elsewhere and which will be needed for the proof of Theorem 11. It allows one to obtain directly the value

μ_{σ}

of a Cartan-Schouten metric

μ

at any point

σ \in G

from the sole value

μ_{ϵ}

at the unit

ϵ,

exactly like for invariant metrics.

Theorem 9.

Let G be a Lie group,

G

its Lie algebra, ϵ its unit. If

exp : G \to G

is a diffeomorphism and G has a Cartan-Schouten metric μ, then setting

\bar{μ} : = μ_{ϵ}

and

log : = {exp}^{- 1},

we have, for all left invariant vector fields

x^{+},

y^{+}

on G and

σ \in G,

\begin{matrix} (μ (x^{+}, y^{+})) (σ) & = & \sum_{p, q = 0}^{\infty} \frac{1}{p! q! 2^{p + q}} \bar{μ} (a d_{log σ}^{p} x, a d_{log σ}^{q} y) . \end{matrix}

(42)

Conversely, if

exp : G \to G

is a diffeomorphism, then every metric

\bar{μ}

on

G

satisfying (30) gives rise to a Cartan-Schouten metric μ on G, via (42).
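To illustrate how Formula (42) can be evaluated in practice, here is a minimal Python sketch for a Lie algebra given by its structure constants. By bilinearity, the double sum in (42) factors as \bar{μ} applied to the two partial sums \sum_{p} a d_{log σ}^{p} x / (p! 2^{p}) and \sum_{q} a d_{log σ}^{q} y / (q! 2^{q}), which is what the code computes. The truncation parameter `terms`, the data layout `C[a][b]` (coordinates of [e_a, e_b]) and the helper `heisenberg3` are our own assumptions; the truncated sum is exact only when a d_{log σ} is nilpotent of order at most `terms`.

```python
from fractions import Fraction
from math import factorial

def ad_apply(C, u, x):
    """Coordinates of [u, x], where C[a][b] lists the coordinates of [e_a, e_b]."""
    n = len(u)
    out = [Fraction(0)] * n
    for a in range(n):
        for b in range(n):
            for k in range(n):
                out[k] += u[a] * x[b] * C[a][b][k]
    return out

def half_exp_ad(C, u, x, terms):
    """Partial sum over p < terms of ad_u^p x / (p! 2^p)."""
    total = [Fraction(0)] * len(x)
    current = [Fraction(v) for v in x]
    for p in range(terms):
        coef = Fraction(1, factorial(p) * 2 ** p)
        total = [t + coef * c for t, c in zip(total, current)]
        current = ad_apply(C, u, current)
    return total

def cs_metric(mu_bar, C, log_sigma, x, y, terms=8):
    """Truncated version of (42), evaluated on the left invariant fields x^+, y^+."""
    ex = half_exp_ad(C, log_sigma, x, terms)
    ey = half_exp_ad(C, log_sigma, y, terms)
    n = len(x)
    return sum(ex[i] * mu_bar[i][j] * ey[j] for i in range(n) for j in range(n))

def heisenberg3():
    """Structure constants of the 3-dimensional Heisenberg algebra, [e_1, e_2] = e_3."""
    C = [[[0, 0, 0] for _ in range(3)] for _ in range(3)]
    C[0][1] = [0, 0, 1]
    C[1][0] = [0, 0, -1]
    return C
```

On a 2-nilpotent Lie algebra only the terms with p, q ≤ 1 survive, so any `terms` ≥ 2 gives the exact value.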

3.5. Cartan-Schouten Metrics on 2-nilpotent Lie Groups

Recall that a Lie group G is said to be 2-step nilpotent if its Lie algebra, say

G,

is 2-step nilpotent (2-nilpotent, for short). Equivalently, the derived ideal

[G, G]

is contained in the center

Z

of

G .

We consider here Lie algebras

G

which, as vector spaces, split as a direct sum

G = V \oplus Z

of two subspaces V and

Z,

such that

[G, G] = [G, V] = [V, V] = Z

and

[G, Z] = 0 .

Every nondecomposable 2-nilpotent Lie algebra lies in that category. The 2-nilpotent Lie groups are the nonabelian Lie groups that are the closest possible to being Abelian. In that regard, we expect them to be, in applications to Information Geometry, Statistics, Machine Learning, amongst the most important and handiest Lie groups with Cartan-Schouten metrics. On the other hand, they enjoy a very rich geometry and play special important roles in many areas of Mathematics [11,12,13]. Carnot groups of step 2, H-type Lie groups, Heisenberg groups, are special cases of 2-nilpotent Lie groups [14,15,16].

Theorem 10.

On a connected 2-nilpotent Lie group of dimension n, the set of Cartan-Schouten metrics is a connected and simply connected manifold of dimension

\frac{1}{2} n (n + 1) .

In Theorem 4, we have shown that 2-nilpotent Lie groups are the only non-Abelian solvable Lie groups with at least one positive (or negative) definite Cartan-Schouten metric. Theorem 11 shows that a 2-nilpotent Lie group has infinitely many Cartan-Schouten metrics of any desired signature

(p, n - p),

for any integer

0 \leq p \leq n .

Theorem 11.

Let G be a 2-nilpotent Lie group,

G

its Lie algebra,

dim G = n

. There is a 1-1 correspondence between the set of left invariant metrics of signature

(p, n - p)

and the set of Cartan-Schouten metrics of signature

(p, n - p)

on

G,

for any integer

p,

with

0 \leq p \leq n .

Namely, every metric

\bar{μ}

on

G

gives rise to a flat Cartan-Schouten metric μ of the same signature on any connected Lie group with Lie algebra

G

. In particular, on the corresponding connected and simply connected Lie group

\tilde{G}

, μ is given by

\begin{matrix} (μ (x^{+}, y^{+})) (σ) & = & \bar{μ} (x, y) + \frac{1}{2} \bar{μ} ([log σ, x], y) + \frac{1}{2} \bar{μ} (x, [log σ, y]) \\ + \frac{1}{4} \bar{μ} ([log σ, x], [log σ, y]), \end{matrix}

(43)

for every

σ \in \tilde{G}

and

x, y \in G,

where

log : \tilde{G} \to G,

log : = {exp}^{- 1} .

Proof.

This is a direct consequence of more general results (see Theorem 2 and Theorem 9) whose proofs will be published elsewhere. Every left invariant metric

{\bar{μ}}^{+}

of signature

(p, n - p)

on G is uniquely given by its value

\bar{μ}

at the unit

ϵ .

Since G is 2-nilpotent, every metric

\bar{μ}

in

G

satisfies (30) in Theorem 2. We apply Formula (42) of Theorem 9 to get (43) which defines a unique Cartan-Schouten metric of signature

(p, n - p)

on

G .

Conversely, if

μ

is a Cartan-Schouten metric on G, its value

\bar{μ}

at the unit gives rise to a unique left invariant metric

{\bar{μ}}^{+}

on G which, by construction, has the same signature as

μ .

□

One also has the following

Lemma 1.

[19] Let

D \in gl (n, R) .

Consider the semidirect sum

R D ⋉ R^{n} .

A Lie group with Lie algebra

R D ⋉ R^{n}

has a Cartan-Schouten metric if and only if it is 2-step nilpotent, or equivalently

D^{2} = 0 .

So in particular, the group

O (2, R) ⋉ R^{2}

of rigid displacements of the Euclidean plane, does not have any Cartan-Schouten metric.

3.5.1. Proof of Theorem 10

Let G be a 2-nilpotent Lie group and

G

its Lie algebra. We use Theorem 11 to build a one-to-one correspondence between the set of left invariant metrics and that of Cartan-Schouten metrics on

G .

A left invariant metric and a Cartan-Schouten metric are both uniquely given by their respective value at the unit

ϵ

of

G .

So the correspondence maps any left invariant metric to the unique Cartan-Schouten metric which coincides with it at

ϵ,

and vice versa. Thus, the space of Cartan-Schouten metrics on a 2-nilpotent Lie group of dimension n can be identified with the set of nonsingular symmetric matrices in

gl (n, R),

which is a smooth manifold of dimension

\frac{n (n + 1)}{2} .

3.5.2. The Heisenberg Lie Group $H_{2 n + 1}$

Consider the

(2 n + 1)

-dimensional Heisenberg group

H_{2 n + 1} : = \{σ = (\begin{matrix} 1 & x & z \\ 0 & 1 & y^{T} \\ 0 & 0 & 1 \end{matrix}), x, y \in R^{n}, z \in R\}

. Its Lie algebra

H_{2 n + 1}

is spanned by the

(n + 2) \times (n + 2)

elementary matrices

e_{j} : = E_{1, j + 1},

e_{n + j} : = E_{j + 1, n + 2},

e_{2 n + 1} : = E_{1, n + 2},

j = 1, \dots, n .

So the Lie bracket reads

[e_{j}, e_{n + j}] = e_{2 n + 1} .

The fact that the exponential map is a diffeomorphism can be written explicitly as

σ = exp (\sum_{i = 1}^{n} (x_{i} e_{i} + y_{i} e_{n + i}) + (z - \frac{1}{2} x y^{T}) e_{2 n + 1}),

so

log σ = \sum_{i = 1}^{n} (x_{i} e_{i} + y_{i} e_{n + i}) + (z - \frac{1}{2} x y^{T}) e_{2 n + 1} .
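The formula for log σ can be verified directly in the lowest-dimensional case. The following sketch treats H_{3} (that is, n = 1) with exact rational arithmetic: since the matrix A = a e_{1} + b e_{2} + c e_{3} satisfies A^{3} = 0, the exponential series stops at A^{2}/2. The function names are ours.

```python
from fractions import Fraction

def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def heis_exp(a, b, c):
    """exp(a e_1 + b e_2 + c e_3) in H_3; exact because A^3 = 0."""
    A = [[0, a, c], [0, 0, b], [0, 0, 0]]
    A2 = matmul3(A, A)
    half = Fraction(1, 2)
    return [[(1 if i == j else 0) + A[i][j] + half * A2[i][j] for j in range(3)]
            for i in range(3)]

def heis_log(x, y, z):
    """Coordinates of log(sigma) in the basis (e_1, e_2, e_3), sigma = (x, y, z)."""
    return (x, y, z - Fraction(1, 2) * x * y)
```

For example, with σ = (2, 3, 5) one gets log σ = (2, 3, 2), and exponentiating that element returns the matrix with entries x = 2, y = 3, z = 5.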

We identify

H_{2 n + 1}

with

R^{2 n + 1}

, with the multiplication

(x, y, z) (x^{'}, y^{'}, z^{'}) = (x + x^{'}, y + y^{'}, z + z^{'} + x y^{' T}) .

The left-invariant vector fields corresponding to

e_{j}

,

e_{n + j},

e_{2 n + 1},

are respectively

\begin{matrix} e_{j}^{+} = \frac{\partial}{\partial x_{j}}, e_{n + j}^{+} = \frac{\partial}{\partial y_{j}} + x_{j} \frac{\partial}{\partial z}, e_{2 n + 1}^{+} = \frac{\partial}{\partial z}, \end{matrix}

(44)

so we have

\frac{\partial}{\partial y_{j}} = e_{n + j}^{+} - x_{j} e_{2 n + 1}^{+},

whereas the left-invariant 1-forms associated to the dual basis

e_{j}^{*},

e_{n + j}^{*},

e_{2 n + 1}^{*},

are:

{(e_{j}^{*})}^{+} = d x_{j}, {(e_{n + j}^{*})}^{+} = d y_{j}, {(e_{2 n + 1}^{*})}^{+} = d z - \sum_{k = 1}^{n} x_{k} d y_{k},

j = 1, \dots, n .
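The left invariant vector fields (44) can be recovered by differentiating left translations. In the sketch below, written for H_{3} (n = 1), the product is affine in its second argument, so the derivative at the unit of τ ↦ σ τ in the direction v is exactly σ v − σ, and no numerical differentiation is needed. The function names are ours.

```python
def heis_mul(sigma, tau):
    """Product on H_3 identified with R^3: (x,y,z)(x',y',z') = (x+x', y+y', z+z'+x*y')."""
    x, y, z = sigma
    x2, y2, z2 = tau
    return (x + x2, y + y2, z + z2 + x * y2)

def left_invariant_field(v, sigma):
    """Value at sigma of the left invariant field equal to v at the unit (0, 0, 0).

    The result lists the components along d/dx, d/dy, d/dz.
    """
    moved = heis_mul(sigma, v)
    return tuple(m - s for m, s in zip(moved, sigma))
```

At σ = (2, 3, 5), the field generated by e_{2} has components (0, 1, 2), that is, ∂/∂y + x ∂/∂z with x = 2, in agreement with (44).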

The Heisenberg Lie group does not have any biinvariant metric, since

[H_{2 n + 1}, H_{2 n + 1}] = R e_{2 n + 1}

is also the center of

H_{2 n + 1} .

However, it does possess infinitely many Cartan-Schouten metrics

μ

of any desired signature. Applying Formula (43) of Theorem 11, we explicitly give the expression of any such metric

μ

on

H_{2 n + 1} .

We denote the constants

\bar{μ} (e_{p}, e_{q})

by

k_{p, q},

p, q = 1, \dots, 2 n + 1,

where again

\bar{μ} : = μ_{ϵ} .

Next we give the explicit expression of

[log σ, x],

for every

σ \in H_{2 n + 1}

and x in the basis (

e_{1}, \dots, e_{2 n + 1}

) of

H_{2 n + 1} :

\begin{matrix} [log σ, e_{i}] & = & [\sum_{k = 1}^{n} (x_{k} e_{k} + y_{k} e_{n + k}) + (z - \frac{1}{2} x y^{T}) e_{2 n + 1}, e_{i}] = - y_{i} e_{2 n + 1}, \\ [log σ, e_{n + i}] & = & [\sum_{k = 1}^{n} (x_{k} e_{k} + y_{k} e_{n + k}) + (z - \frac{1}{2} x y^{T}) e_{2 n + 1}, e_{n + i}] = x_{i} e_{2 n + 1} . \end{matrix}

(45)

So we now get:

(μ (e_{2 n + 1}^{+}, e_{2 n + 1}^{+})) (σ) = \bar{μ} (e_{2 n + 1}, e_{2 n + 1}) = k_{2 n + 1, 2 n + 1},

(μ (e_{2 n + 1}^{+}, e_{i}^{+})) (σ) = \bar{μ} (e_{2 n + 1}, e_{i}) + \frac{1}{2} \bar{μ} ([log σ, e_{i}], e_{2 n + 1}) = k_{i, 2 n + 1} - \frac{1}{2} k_{2 n + 1, 2 n + 1} y_{i},

(μ (e_{2 n + 1}^{+}, e_{n + i}^{+})) (σ) = \bar{μ} (e_{2 n + 1}, e_{n + i}) + \frac{1}{2} \bar{μ} ([log σ, e_{n + i}], e_{2 n + 1})

= k_{n + i, 2 n + 1} + \frac{1}{2} k_{2 n + 1, 2 n + 1} x_{i},

In the same way, we obtain

\begin{matrix} (μ (e_{i}^{+}, e_{j}^{+})) (σ) & = & \bar{μ} (e_{i}, e_{j}) - \frac{1}{2} y_{i} \bar{μ} (e_{2 n + 1}, e_{j}) - \frac{1}{2} y_{j} \bar{μ} (e_{i}, e_{2 n + 1}) + \frac{1}{4} y_{i} y_{j} \bar{μ} (e_{2 n + 1}, e_{2 n + 1}) \\ = & \frac{1}{4} k_{2 n + 1, 2 n + 1} y_{i} y_{j} - \frac{1}{2} k_{j, 2 n + 1} y_{i} - \frac{1}{2} k_{i, 2 n + 1} y_{j} + k_{i, j}, \\ (μ (e_{i}^{+}, e_{n + j}^{+})) (σ) & = & \bar{μ} (e_{i}, e_{n + j}) - \frac{1}{2} y_{i} \bar{μ} (e_{2 n + 1}, e_{n + j}) + \frac{1}{2} x_{j} \bar{μ} (e_{i}, e_{2 n + 1}) \\ - \frac{1}{4} y_{i} x_{j} \bar{μ} (e_{2 n + 1}, e_{2 n + 1}) \\ = & - \frac{1}{4} k_{2 n + 1, 2 n + 1} y_{i} x_{j} - \frac{1}{2} k_{n + j, 2 n + 1} y_{i} + \frac{1}{2} k_{i, 2 n + 1} x_{j} + k_{i, n + j} \end{matrix}

(46)

Similarly, we compute the following:

\begin{matrix} (μ (e_{n + i}^{+}, e_{n + j}^{+})) (σ) & = & \bar{μ} (e_{n + i}, e_{n + j}) + \frac{1}{2} x_{i} \bar{μ} (e_{2 n + 1}, e_{n + j}) + \frac{1}{2} x_{j} \bar{μ} (e_{n + i}, e_{2 n + 1}) \\ + \frac{1}{4} x_{i} x_{j} \bar{μ} (e_{2 n + 1}, e_{2 n + 1}) \\ = & \frac{1}{4} k_{2 n + 1, 2 n + 1} x_{i} x_{j} + \frac{1}{2} k_{n + j, 2 n + 1} x_{i} + \frac{1}{2} k_{n + i, 2 n + 1} x_{j} + k_{n + i, n + j} . \end{matrix}

(47)

We use the above to deduce the following

\begin{matrix} μ (\frac{\partial}{\partial x_{i}}, \frac{\partial}{\partial x_{j}}) & = & μ (e_{i}^{+}, e_{j}^{+}), μ (\frac{\partial}{\partial x_{i}}, \frac{\partial}{\partial x_{n + j}}) = μ (e_{i}^{+}, e_{n + j}^{+}) - x_{j} μ (e_{i}^{+}, e_{2 n + 1}^{+}), \\ μ (\frac{\partial}{\partial x_{i}}, \frac{\partial}{\partial x_{2 n + 1}}) & = & μ (e_{i}^{+}, e_{2 n + 1}^{+}) = k_{i, 2 n + 1} - \frac{1}{2} y_{i} k_{2 n + 1, 2 n + 1}, \\ μ (\frac{\partial}{\partial x_{n + i}}, \frac{\partial}{\partial x_{n + j}}) & = & μ (e_{n + i}^{+}, e_{n + j}^{+}) - x_{j} μ (e_{n + i}^{+}, e_{2 n + 1}^{+}) + x_{i} x_{j} μ (e_{2 n + 1}^{+}, e_{2 n + 1}^{+}) \\ - x_{i} μ (e_{n + j}^{+}, e_{2 n + 1}^{+}), \\ μ (\frac{\partial}{\partial x_{n + j}}, \frac{\partial}{\partial x_{2 n + 1}}) & = & μ (e_{n + j}^{+}, e_{2 n + 1}^{+}) - x_{j} μ (e_{2 n + 1}^{+}, e_{2 n + 1}^{+}), \\ μ (\frac{\partial}{\partial x_{2 n + 1}}, \frac{\partial}{\partial x_{2 n + 1}}) & = & μ (e_{2 n + 1}^{+}, e_{2 n + 1}^{+}) . \end{matrix}

(48)

Plugging the expressions obtained above for the functions

μ (e_{p}^{+}, e_{q}^{+})

into (48), we finally get the following coefficients

\begin{matrix} μ (\frac{\partial}{\partial x_{i}}, \frac{\partial}{\partial x_{j}}) & = & k_{i, j} - \frac{1}{2} y_{i} k_{2 n + 1, j} - \frac{1}{2} y_{j} k_{i, 2 n + 1} + \frac{1}{4} y_{i} y_{j} k_{2 n + 1, 2 n + 1}, \end{matrix}

\begin{matrix} μ (\frac{\partial}{\partial x_{i}}, \frac{\partial}{\partial x_{n + j}}) & = & k_{i, n + j} - \frac{1}{2} y_{i} k_{n + j, 2 n + 1} - \frac{1}{2} x_{j} k_{i, 2 n + 1} + \frac{1}{4} y_{i} x_{j} k_{2 n + 1, 2 n + 1}, \end{matrix}

\begin{matrix} μ (\frac{\partial}{\partial x_{i}}, \frac{\partial}{\partial x_{2 n + 1}}) & = & k_{i, 2 n + 1} - \frac{1}{2} y_{i} k_{2 n + 1, 2 n + 1}, \end{matrix}

\begin{matrix} μ (\frac{\partial}{\partial x_{n + i}}, \frac{\partial}{\partial x_{n + j}}) & = & k_{n + i, n + j} - \frac{1}{2} (x_{i} k_{n + j, 2 n + 1} + x_{j} k_{n + i, 2 n + 1}) + \frac{1}{4} x_{i} x_{j} k_{2 n + 1, 2 n + 1}, \end{matrix}

\begin{matrix} μ (\frac{\partial}{\partial x_{n + j}}, \frac{\partial}{\partial x_{2 n + 1}}) & = & k_{n + j, 2 n + 1} - \frac{1}{2} x_{j} k_{2 n + 1, 2 n + 1}, μ (\frac{\partial}{\partial x_{2 n + 1}}, \frac{\partial}{\partial x_{2 n + 1}}) = k_{2 n + 1, 2 n + 1} . \end{matrix}

In summary, we have the following.

Proposition 3.

Any Cartan-Schouten metric on

H_{2 n + 1}

is of the following form

\begin{matrix} μ & = & \sum_{i, j = 1}^{n} [(k_{i, j} - \frac{1}{2} y_{i} k_{2 n + 1, j} - \frac{1}{2} y_{j} k_{i, 2 n + 1} + \frac{1}{4} y_{i} y_{j} k_{2 n + 1, 2 n + 1}) d x_{i} d x_{j} \\ + (k_{i, n + j} - \frac{1}{2} y_{i} k_{n + j, 2 n + 1} - \frac{1}{2} x_{j} k_{i, 2 n + 1} + \frac{1}{4} y_{i} x_{j} k_{2 n + 1, 2 n + 1}) d x_{i} d x_{n + j} \\ + (k_{n + i, n + j} - \frac{1}{2} x_{i} k_{n + j, 2 n + 1} - \frac{1}{2} x_{j} k_{n + i, 2 n + 1} + \frac{1}{4} x_{i} x_{j} k_{2 n + 1, 2 n + 1}) d x_{n + i} d x_{n + j}] \\ + \sum_{j = 1}^{n} [(k_{j, 2 n + 1} - \frac{1}{2} y_{j} k_{2 n + 1, 2 n + 1}) d x_{j} + (k_{n + j, 2 n + 1} - \frac{1}{2} x_{j} k_{2 n + 1, 2 n + 1}) d x_{n + j}] d z \\ + k_{2 n + 1, 2 n + 1} d z^{2}, \end{matrix}

(49)

where the

k_{p, q}

’s are (constant) real parameters,

p, q = 1, \dots, 2 n + 1 .
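The coefficients above can be cross-checked by machine. The following sketch, again for H_{3} (n = 1, coordinates (x, y, z)), evaluates Formula (43) on the left invariant frame, passes to the coordinate frame through ∂/∂x = e_{1}^{+}, ∂/∂y = e_{2}^{+} − x e_{3}^{+}, ∂/∂z = e_{3}^{+}, and compares the result with the closed-form coefficients. The matrix `k` holds the constants k_{p, q}; the function names are ours.

```python
from fractions import Fraction

def bracket(u, v):
    """[u, v] in the Heisenberg algebra with basis (e_1, e_2, e_3), [e_1, e_2] = e_3."""
    return (0, 0, u[0] * v[1] - u[1] * v[0])

def bil(k, u, v):
    """Value of the bilinear form with Gram matrix k on the vectors u, v."""
    return sum(u[i] * k[i][j] * v[j] for i in range(3) for j in range(3))

def mu_left_frame(k, sigma, u, v):
    """Formula (43) at sigma = (x, y, z), for u, v written in the left invariant frame."""
    x, y, z = sigma
    log_sigma = (x, y, z - Fraction(1, 2) * x * y)
    lu, lv = bracket(log_sigma, u), bracket(log_sigma, v)
    h = Fraction(1, 2)
    return bil(k, u, v) + h * bil(k, lu, v) + h * bil(k, u, lv) + h * h * bil(k, lu, lv)

def mu_coordinate_frame(k, sigma):
    """Gram matrix of mu at sigma in the coordinate frame (d/dx, d/dy, d/dz)."""
    x = sigma[0]
    frame = [(1, 0, 0), (0, 1, -x), (0, 0, 1)]  # d/dy = e_2^+ - x e_3^+
    return [[mu_left_frame(k, sigma, a, b) for b in frame] for a in frame]

def closed_form(k, sigma):
    """Closed-form coefficients of the metric for n = 1."""
    x, y, _ = sigma
    h, q = Fraction(1, 2), Fraction(1, 4)
    gxx = k[0][0] - h * y * k[2][0] - h * y * k[0][2] + q * y * y * k[2][2]
    gxy = k[0][1] - h * y * k[1][2] - h * x * k[0][2] + q * y * x * k[2][2]
    gxz = k[0][2] - h * y * k[2][2]
    gyy = k[1][1] - h * (x * k[1][2] + x * k[1][2]) + q * x * x * k[2][2]
    gyz = k[1][2] - h * x * k[2][2]
    gzz = k[2][2]
    return [[gxx, gxy, gxz], [gxy, gyy, gyz], [gxz, gyz, gzz]]
```

At the unit, both functions return the matrix (k_{p, q}) itself, as expected from \bar{μ} = μ_{ϵ}.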

3.5.3. Cartan-Schouten Metrics on Carnot Groups

Here, we consider Carnot groups of step 2 and their Lie algebras, namely the so-called H-type Lie groups and Lie algebras. Let G be a simply connected Lie group whose Lie algebra

G

is graded

G = V \oplus Z

such that

[G, G] = [V, V] = Z

and

[G, Z] = 0 .

One endows

G

with an inner product

〈, 〉

such that the orthogonal

V^{⊥}

of V is

Z .

This induces, for each

z \in Z,

a linear map

J_{Z} : V \to V,

given by

〈 J_{Z} (w), w^{'} 〉 : = 〈 Z, [w, w^{'}] 〉

for every

w, w^{'} \in V .

The map

J_{Z}

is skew-symmetric or equivalently

〈 J_{Z} (w), w^{'} 〉 = - 〈 w, J_{Z} (w^{'}) 〉

. The Lie algebra

G

is said to be of H-type if for every v in V with unit length,

a d_{v}

is an isometry from

{(ker (a d_{v}))}^{⊥}

to

Z .

The endomorphisms

J_{Z}

satisfy

J_{Z} J_{Z^{'}} + J_{Z^{'}} J_{Z} = - 2 〈 Z, Z^{'} 〉 I_{V},

in particular

J_{Z}^{2} = - 〈 Z, Z 〉 I_{V},

where

I_{V} : V \to V

is the identity map of

V .

If

(Z_{1}, \dots, Z_{m})

is an orthonormal basis of

Z

, that is,

〈 Z_{i}, Z_{j} 〉 = δ_{i, j},

then

[v, w] = \sum_{i = 1}^{m} 〈 J_{Z_{i}} (v), w 〉 Z_{i}

for any

v, w \in V .

We consider an orthogonal basis

(X_{1}, \dots, X_{n}, Z_{1}, \dots, Z_{m})

of

G

such that

(X_{1}, \dots, X_{n})

is a basis of V and

(Z_{1}, \dots, Z_{m})

a basis of

Z

such that

〈 Z_{i}, Z_{i} 〉 = 1

for every

i = 1, \dots, m

. The Lie bracket reads

[X_{i}, X_{j}] = \sum_{k = 1}^{m} 〈 J_{Z_{k}} (X_{i}), X_{j} 〉 Z_{k} .

As a manifold, G is identified with

R^{n + m}

. We denote elements

σ = exp (x_{1} X_{1} + \dots + x_{n} X_{n} + z_{1} Z_{1} + \dots + z_{m} Z_{m})

of G, by

(x, z)

where

x = (x_{1}, x_{2}, \dots, x_{n}) \in R^{n}

and

z = (z_{1}, z_{2}, \dots, z_{m}) \in R^{m}

, correspond to normal coordinates associated with

exp : G \to G

, so the product reads

\begin{matrix} (x, z) (x^{'}, z^{'}) & = & (x + x^{'}, z + z^{'} + \frac{1}{2} 〈 γ x, x^{'} 〉) \\ = & (x_{1} + x_{1}^{'}, \dots, x_{n} + x_{n}^{'}, z_{1} + z_{1}^{'} + \frac{1}{2} 〈 γ^{1} x, x^{'} 〉, \dots, \\ z_{m} + z_{m}^{'} + \frac{1}{2} 〈 γ^{m} x, x^{'} 〉) \end{matrix}

(50)

where

〈 γ^{k} x, x^{'} 〉 = \sum_{i = 1}^{n} \sum_{l = 1}^{n} γ_{i l}^{k} x_{l} x_{i}^{'}

and

γ^{k} = (γ_{i l}^{k})

is the matrix of the linear map

J_{Z_{k}} .

Note that we have the equality

\begin{matrix} γ_{j l}^{k} = - C_{j l}^{n + k}, with [X_{j}, X_{l}] = \sum_{k = 1}^{m} C_{j l}^{n + k} Z_{k} . \end{matrix}

(51)
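The product (50) and the relation (51) are easy to experiment with numerically. The sketch below builds the matrices γ^{k} from structure constants through (51) and implements (50) with exact rationals. The data layout `C[j][l][k]` for C_{j l}^{n + k}, the function names, and the choice of the 3-dimensional Heisenberg algebra (n = 2, m = 1) as test case are our own assumptions.

```python
from fractions import Fraction

def gamma_from_structure(C, n, m):
    """Relation (51): gamma[k][j][l] = -C[j][l][k], where C[j][l][k] = C_{jl}^{n+k}."""
    return [[[-C[j][l][k] for l in range(n)] for j in range(n)] for k in range(m)]

def product(g, h, gamma):
    """Product (50) in exponential coordinates; g = (x, z), h = (x', z')."""
    (x, z), (x2, z2) = g, h
    n, m = len(x), len(z)
    half = Fraction(1, 2)
    z_new = []
    for k in range(m):
        # <gamma^k x, x'> = sum_{i,l} gamma^k_{il} x_l x'_i
        pairing = sum(gamma[k][i][l] * x[l] * x2[i]
                      for i in range(n) for l in range(n))
        z_new.append(z[k] + z2[k] + half * pairing)
    x_new = [a + b for a, b in zip(x, x2)]
    return (x_new, z_new)

def heisenberg_structure():
    """C_{jl}^{n+k} for [X_1, X_2] = Z_1 (n = 2, m = 1)."""
    C = [[[0] for _ in range(2)] for _ in range(2)]
    C[0][1] = [1]
    C[1][0] = [-1]
    return C
```

With these conventions, the central component of the product of (x, z) and (x', z') in the Heisenberg case is z + z' + (x_{1} x_{2}' − x_{2} x_{1}')/2, which is the usual Baker-Campbell-Hausdorff expression.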

The left invariant vector fields

X_{j}^{+}

,

j = 1, \dots, n

corresponding to

X_{j}

are

\begin{matrix} X_{j}^{+} = \frac{\partial}{\partial x_{j}} + \frac{1}{2} \sum_{k = 1}^{m} (〈 Z_{k}, [X, X_{j}] 〉) \frac{\partial}{\partial z_{k}} = \frac{\partial}{\partial x_{j}} - \frac{1}{2} \sum_{k = 1}^{m} (〈 J_{Z_{k}} (X_{j}), X 〉) \frac{\partial}{\partial z_{k}} \end{matrix}

(52)

where

X = \sum_{l = 1}^{n} x_{l} X_{l} .

In other words,

\begin{matrix} X_{j}^{+} = \frac{\partial}{\partial x_{j}} + \frac{1}{2} \sum_{k = 1}^{m} \sum_{l = 1}^{n} γ_{j l}^{k} x_{l} \frac{\partial}{\partial z_{k}} . \end{matrix}

(53)

On the other hand, for

j = 1, \dots, m,

we have

\begin{matrix} Z_{j}^{+} = \frac{\partial}{\partial z_{j}} . \end{matrix}

(54)

Let

(X_{1}^{*}, \dots, X_{n}^{*}, Z_{1}^{*}, \dots, Z_{m}^{*})

be the dual basis of

(X_{1}, \dots, X_{n}, Z_{1}, \dots, Z_{m})

, with corresponding left invariant 1-forms

\begin{matrix} η_{j} = {(X_{j}^{*})}^{+} = d x_{j}, j = 1, \dots, n \end{matrix}

(55)

and

\begin{matrix} ν_{j} = {(Z_{j}^{*})}^{+} = d z_{j} + \frac{1}{2} \sum_{k = 1}^{n} (〈 J_{Z_{j}} (X_{k}), X 〉) d x_{k} = d z_{j} - \frac{1}{2} \sum_{i = 1}^{n} \sum_{l = 1}^{n} γ_{i l}^{j} x_{l} d x_{i} . \end{matrix}

(56)

To look for Cartan-Schouten metrics, we apply Formula (43) of Theorem 11 and follow the same steps as in Section 3.5.2. We summarize our results as follows.

Theorem 12.

Let G be an

(n + m)

-dimensional H-type Lie group whose Lie algebra

G

decomposes as

G = V \oplus Z

such that

[G, G] = [V, V] = Z

and

[G, Z] = 0

, where

dim V = n

and

dim Z = m

. Any Cartan-Schouten metric μ on G is of the form:

\begin{array}{l} μ = & \sum_{i = 1}^{n} [\frac{1}{4} \sum_{α, β = 1}^{m} \sum_{p, l = 1}^{n} C_{p i}^{n + α} C_{l i}^{n + β} d_{n + α, n + β} x^{p} x^{l} + \sum_{α = 1}^{m} \sum_{l = 1}^{n} γ_{i l}^{α} d_{i, n + α} x^{l} + d_{i i}] {(d x^{i})}^{2} \\ + \sum_{α = 1}^{m} d_{n + α, n + α} {(d z^{α} - \frac{1}{2} \sum_{l, p = 1}^{n} γ_{p l}^{α} x^{l} d x^{p})}^{2} \\ + \sum_{1 \leq i < j \leq n} [\frac{1}{4} \sum_{α, β = 1}^{m} \sum_{p, l = 1}^{n} C_{p i}^{n + α} C_{l j}^{n + β} d_{n + α, n + β} x^{p} x^{l} \\ + \frac{1}{2} \sum_{α = 1}^{m} \sum_{l = 1}^{n} (C_{l i}^{n + α} d_{j, n + α} + C_{l j}^{n + α} d_{i, n + α}) x^{l} + d_{i j}] d x^{i} d x^{j} \\ + \sum_{i = 1}^{n} \sum_{α = 1}^{m} [\frac{1}{2} \sum_{β = 1}^{m} \sum_{k = 1}^{n} C_{k i}^{n + β} d_{n + α, n + β} x^{k} + d_{i, n + α}] d x^{i} d z^{α} \\ - \frac{1}{2} \sum_{i, l, p = 1}^{n} \sum_{j = 1}^{m} [\frac{1}{2} \sum_{l = n + 1}^{n + m} \sum_{k = 1}^{n} C_{k i}^{n + l} d_{j l} x_{k} + d_{i, n + j}] γ_{p l}^{j} x^{l} d x^{i} d x^{p} \\ + \sum_{1 \leq α < β \leq m} d_{n + α, n + β} (d z^{α} - \frac{1}{2} \sum_{p = 1}^{n} \sum_{l = 1}^{n} γ_{p l}^{α} x^{l} d x^{p}) (d z^{β} - \frac{1}{2} \sum_{p = 1}^{n} \sum_{l = 1}^{n} γ_{p l}^{β} x^{l} d x^{p}), \end{array}

(57)

where the

d_{α β}

’s are real constants such that the determinant of the matrix

{(d_{α β})}_{α, β = 1, \dots, m + n}

is nonzero, and

γ_{i j}^{k} = - C_{i j}^{n + k} = C_{j i}^{n + k}

are real numbers considered as parameters.

Note that each parameter

γ = (γ_{i j}^{k})

defines a 2-nilpotent Lie algebra and Lie group of H-type, as in (50). All the parameters

(γ_{i j}^{k})

belong to the manifold of 2-nilpotent Lie algebras, which in turn, sits inside that of nilpotent Lie algebras. Fixing

(γ_{i j}^{k})

and varying

{(d_{α β})}_{α, β = 1, \dots, m + n}

amounts to fixing the Lie algebra and looking for all the Cartan-Schouten metrics on it. Note also that the coefficients of the metrics are degree 2 polynomials.

3.5.4. Biinvariant Metrics on 2-nilpotent Lie Groups

We summarize some of the results gathered from [6,32] as follows.

Proposition 4.

[6,32] Let

G

be a 2-nilpotent Lie algebra with center

Z

satisfying

[G, G] = Z

. Suppose

G

has an ad-invariant metric, say μ. Then

G

must be even dimensional. Moreover, there exists a subspace V such that,

Z

and V are Lagrangian subspaces of

G

in duality with respect to μ. More precisely, we have the following,

\begin{matrix} μ (Z, Z) & = & μ (V, V) = 0, 2 dim V = 2 dim Z = dim G, \end{matrix}

(58)

\begin{matrix} G & = & V \oplus Z (direct sum of vector spaces) . \end{matrix}

(59)

So, every ad-invariant metric on a 2-nilpotent Lie algebra has signature

(n, n) .

We get the following more precise result.

Proposition 5.

Let

G

be a 2-nilpotent Lie algebra of dimension

2 n,

with center

Z

satisfying

[G, G] = Z

. Suppose

G

has an ad-invariant metric, say μ. Then we can choose a subspace V complementary to

Z

and a basis

(e_{1}, \dots, e_{2 n})

of

G

, where

(e_{1}, \dots, e_{n})

and

(e_{n + 1}, \dots, e_{2 n})

are bases of V and

Z

respectively, such that

μ (e_{i}, e_{j}) = μ (e_{n + i}, e_{n + j}) = 0

and

μ (e_{i}, e_{n + j}) = δ_{i, j},

i, j = 1, \dots, n .

The Lie bracket of

G

reads

[e_{i}, e_{j}] = \sum_{k = 1}^{n} C_{i j}^{n + k} e_{n + k}

, where the structure constants

C_{i j}^{n + k}

satisfy the following identities:

\begin{matrix} C_{i j}^{n + k} & = & C_{j k}^{n + i} = - C_{i k}^{n + j} . \end{matrix}

(60)

Proof.

The first claim is a general fact which we prove by induction on the dimension

2 n

of the vector space, say

E,

underlying

G

. This is true when

n = 1

. Indeed, suppose a 2-dimensional vector space E has a nondegenerate bilinear symmetric form

μ

and contains a line

R v

which is isotropic, that is,

μ (v, v) = 0 .

Let

\tilde{v}

be an element of

E

satisfying

μ (v, \tilde{v}) \neq 0

. By setting

\begin{matrix} v^{*} : = \frac{1}{μ (v, \tilde{v})} (\tilde{v} - \frac{μ (\tilde{v}, \tilde{v})}{2 μ (v, \tilde{v})} v), \end{matrix}

(61)

we get a basis

(v, v^{*})

of E satisfying

\begin{matrix} μ (v, v) = μ (v^{*}, v^{*}) = 0, and μ (v, v^{*}) = 1 . \end{matrix}

(62)
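The normalization (61) is straightforward to test on an explicit Gram matrix. The following sketch computes v^{*} with exact rationals, so that the relations (62) can be checked without rounding errors. The function names are ours.

```python
from fractions import Fraction

def bil(M, u, v):
    """mu(u, v) for a symmetric Gram matrix M."""
    n = len(u)
    return sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def hyperbolic_partner(M, v, w):
    """Vector v* of (61), built from an isotropic v and any w with mu(v, w) != 0."""
    if bil(M, v, v) != 0:
        raise ValueError("v must be isotropic")
    c = Fraction(bil(M, v, w))
    if c == 0:
        raise ValueError("mu(v, w) must be nonzero")
    shift = Fraction(bil(M, w, w)) / (2 * c)
    # v* = (w - mu(w, w) / (2 mu(v, w)) * v) / mu(v, w)
    return [(Fraction(wi) - shift * vi) / c for vi, wi in zip(v, w)]
```

For the Gram matrix with rows (0, 2) and (2, 5), v = (1, 0) and \tilde{v} = (0, 1), one obtains v^{*} = (−5/8, 1/2), and the pair (v, v^{*}) satisfies (62).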

Now suppose this property true up to order

n - 1

, for some

n \geq 2

and let us show that it also holds true for

n .

Let E be a

2 n

-dimensional vector space endowed with a nondegenerate bilinear symmetric form

μ

such that an n-dimensional subspace

\tilde{E}

is totally isotropic with respect to

μ .

Let v be a nonzero vector of E not belonging to

\tilde{E} .

The linear form

f_{v}

defined on

\tilde{E}

by

f_{v} (x) = μ (v, x)

for any

x \in \tilde{E},

is nonzero and

dim ker f_{v} = n - 1 .

So we write

\tilde{E}

as

\tilde{E} = R \tilde{v} \oplus ker (f_{v})

for some

\tilde{v} \in \tilde{E}

satisfying

0 \neq f_{v} (\tilde{v}) = μ (v, \tilde{v}) .

In the 2-dimensional nondegenerate vector space

E_{0} : = R v \oplus R \tilde{v}

we apply (61) to get a basis

(e_{n} : = \frac{1}{μ (\tilde{v}, v)} (v - \frac{μ (v, v)}{2 μ (\tilde{v}, v)} \tilde{v}), e_{2 n} : = \tilde{v})

satisfying (62). The orthogonal

E_{0}^{⊥}

of

E_{0}

in E with respect to

μ

is a

2 (n - 1)

-dimensional nondegenerate vector space that contains

ker (f_{v})

as a totally isotropic

(n - 1)

-dimensional subspace, and we have the decomposition

E = E_{0} \oplus E_{0}^{⊥}

. By the induction hypothesis in dimension

2 (n - 1)

, there is a basis

(e_{1}, \dots, e_{n - 1}, e_{n + 1}, \dots, e_{2 n - 1})

of

E_{0}^{⊥}

such that

(e_{n + 1}, \dots, e_{2 n - 1})

is a basis of

ker (f_{v})

and

μ (e_{i}, e_{j}) = μ (e_{n + i}, e_{n + j}) = 0

and

μ (e_{i}, e_{n + j}) = δ_{i, j}

, for every

i, j = 1, \dots, n - 1 .

Altogether, we have a basis

(e_{1}, \dots, e_{2 n})

of E satisfying

μ (e_{i}, e_{j}) = μ (e_{n + i}, e_{n + j}) = 0

and

μ (e_{i}, e_{n + j}) = δ_{i, j}

, for every

i, j = 1, \dots, n

, such that

(e_{n + 1}, \dots, e_{2 n})

is a basis of the Lagrangian subspace

\tilde{E} .

The second claim is proved as follows. From the following equalities, due to the ad-invariance,

\begin{matrix} μ ([e_{i}, e_{j}], e_{k}) & = & μ (e_{i}, [e_{j}, e_{k}]) = - μ (e_{j}, [e_{i}, e_{k}]), \end{matrix}

we get

\begin{matrix} \sum_{m = 1}^{n} C_{i j}^{n + m} μ (e_{n + m}, e_{k}) & = & \sum_{p = 1}^{n} C_{j k}^{n + p} μ (e_{i}, e_{n + p}) = - \sum_{q = 1}^{n} C_{i k}^{n + q} μ (e_{j}, e_{n + q}), \end{matrix}

(63)

which simplifies to

\begin{matrix} C_{i j}^{n + k} & = & C_{j k}^{n + i} = - C_{i k}^{n + j} . \end{matrix}

(64)

□
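The base case of the induction can be checked numerically. The following Python sketch applies formula (61) in a 2-dimensional space and verifies (62). The function name `hyperbolic_partner`, the use of NumPy and the sample form diag(1, -1) are illustrative assumptions and do not come from the text.

```python
import numpy as np


def hyperbolic_partner(mu, v, v_tilde):
    """Return v* from formula (61), given an isotropic v and any v_tilde
    with mu(v, v_tilde) != 0. `mu` is the Gram matrix of the form."""
    pairing = v @ mu @ v_tilde
    if np.isclose(pairing, 0.0):
        raise ValueError("mu(v, v_tilde) must be nonzero")
    correction = (v_tilde @ mu @ v_tilde) / (2.0 * pairing)
    return (v_tilde - correction * v) / pairing


# Sample data (assumed for illustration): a Lorentzian form on R^2.
mu = np.diag([1.0, -1.0])
v = np.array([1.0, 1.0])        # isotropic: mu(v, v) = 0
v_tilde = np.array([1.0, 0.0])  # mu(v, v_tilde) = 1
v_star = hyperbolic_partner(mu, v, v_tilde)

# Gram matrix of the basis (v, v*); by (62) it should be [[0, 1], [1, 0]].
basis = np.stack([v, v_star])
gram = basis @ mu @ basis.T
```

Iterating this step on the orthogonal complement of the plane spanned by the pair is what the induction in the proof does.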

Note that the above equalities (60) imply that

\begin{matrix} C_{i j}^{n + i} = C_{i j}^{n + j} = 0, i, j = 1, \dots, n . \end{matrix}

(65)
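Together with the antisymmetry of the bracket in (i, j), identities (60) say that the array T_{ijk} := C_{ij}^{n+k} is totally antisymmetric, which is why (65) follows. The sketch below, which assumes NumPy and uses hypothetical names (`antisymmetrize`, `T`), builds such an array from random data and checks (60) and (65) numerically.

```python
import itertools
import numpy as np


def antisymmetrize(raw):
    """Project a rank-3 array onto its totally antisymmetric part."""
    total = np.zeros_like(raw)
    for perm in itertools.permutations(range(3)):
        # Sign of the permutation via its inversion count.
        inversions = sum(
            perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3)
        )
        total += (-1) ** inversions * np.transpose(raw, perm)
    return total / 6.0


n = 4
rng = np.random.default_rng(0)
# T[i, j, k] plays the role of C_{ij}^{n+k} (0-based indices).
T = antisymmetrize(rng.standard_normal((n, n, n)))

# C_{ij}^{n+k} = C_{jk}^{n+i}
cyclic_ok = np.allclose(T, np.transpose(T, (2, 0, 1)))
# C_{ij}^{n+k} = -C_{ik}^{n+j}
swap_ok = np.allclose(T, -np.transpose(T, (0, 2, 1)))
# (65): entries whose upper index repeats a lower index vanish.
diag_ok = all(
    np.isclose(T[i, j, i], 0.0) and np.isclose(T[i, j, j], 0.0)
    for i in range(n)
    for j in range(n)
)
```

Since the bracket takes values in the center, the Jacobi identity holds automatically, so any such array defines a 2-nilpotent Lie bracket for which the split form of Proposition 5 is ad-invariant.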

4. Dual Connections and Statistical Structures

4.1. Statistical Structures

Let G be a Lie group with a Cartan-Schouten metric

μ .

For any totally symmetric covariant 3-tensor S on G, define the tensor A as

\begin{matrix} μ (A (X, Y), Z) = S (X, Y, Z) . \end{matrix}

(66)

From (66), one extracts the corresponding dual connections

\nabla^{1}

and

\nabla^{- 1}

to get

\nabla^{1} : = \nabla - \frac{1}{2} A

and

\nabla^{- 1} : = \nabla + \frac{1}{2} A .

That is,

\begin{matrix} \nabla_{X}^{1} Y : = \frac{1}{2} ([X, Y] - A (X, Y)), \nabla_{X}^{- 1} Y : = \frac{1}{2} ([X, Y] + A (X, Y)), \end{matrix}

(67)

for any left invariant vector fields

X, Y

on

G .

It is readily seen that both

\nabla^{1}

and

\nabla^{- 1}

are torsion-free, dual with respect to

μ

and satisfy

\nabla^{1} \cdot μ = - \nabla^{- 1} \cdot μ = S .

However, since

μ

is neither left nor right invariant in general, A need not be left invariant even if S is chosen to be left invariant. By a left invariant Cartan-Schouten statistical structure we mean a triplet

(G, μ, \bar{\nabla})

where G is a Lie group,

μ

and

\bar{\nabla}

are respectively a Cartan-Schouten metric and a left invariant connection on G such that

\bar{\nabla} μ

is a totally symmetric 3-tensor. From now on,

\bar{\nabla}

is assumed to be torsion free. We write

\bar{\nabla}

as

\bar{\nabla} = \nabla - \frac{1}{2} t,

with

t : G \times G \to G

symmetric, and we set

μ_{ϵ} = : \bar{μ} .

Then the total symmetry identity

\bar{\nabla} μ (x^{+}, y^{+}, z^{+}) = \bar{\nabla} μ (y^{+}, x^{+}, z^{+})

taken at the unit

ϵ,

is equivalent to

\begin{matrix} \bar{μ} (t (x, y), z) + \bar{μ} (y, t (x, z)) = \bar{μ} (t (y, x), z) + \bar{μ} (x, t (y, z)) . \end{matrix}

(68)

A Lie algebra together with a metric and a (locally) flat torsion free connection satisfying (68), is called a Hessian Lie algebra (see e.g. [3,38]). We have proved the following.

Proposition 6.

There is a 1-1 correspondence between left invariant Hessian structures (

\bar{μ}, \bar{\nabla}

) where the metric

\bar{μ}

satisfies (30) and flat statistical structures on Lie groups associated to Cartan-Schouten metrics together with left invariant connections.
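Identity (68) lends itself to a quick numerical test on a given pair (μ̄, t). The sketch below is a minimal checker under stated assumptions: the metric is given by its Gram matrix, t by a coefficient array, and the identity is sampled on random triples rather than proved. The helper name `satisfies_68` and the sample data are hypothetical. Neither flatness nor condition (30) is checked here.

```python
import numpy as np


def satisfies_68(mu_bar, t, trials=20, seed=0):
    """Check identity (68) on random triples (x, y, z).

    mu_bar: (d, d) Gram matrix of the metric at the unit.
    t: (d, d, d) array with t(x, y)^c = sum_ab t[a, b, c] x^a y^b.
    """
    rng = np.random.default_rng(seed)

    def t_map(x, y):
        return np.einsum("a,b,abc->c", x, y, t)

    def side(x, y, z):
        # mu_bar(t(x, y), z) + mu_bar(y, t(x, z))
        return t_map(x, y) @ mu_bar @ z + y @ mu_bar @ t_map(x, z)

    for _ in range(trials):
        x, y, z = rng.standard_normal((3, mu_bar.shape[0]))
        if not np.isclose(side(x, y, z), side(y, x, z)):
            return False
    return True


# Sample data (assumed): t(x, y) = x^1 y^1 e_3 on a 3-dimensional space.
t = np.zeros((3, 3, 3))
t[0, 0, 2] = 1.0
mu_good = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
mu_bad = np.eye(3)

good = satisfies_68(mu_good, t)
bad = satisfies_68(mu_bad, t)
```

In this sample the first Gram matrix pairs e_3 with e_1, which makes both sides of (68) equal to 2 x^1 y^1 z^1, whereas the identity matrix does not satisfy (68) for the same t.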

4.2. Biinvariant Dual Connections

As above, ∇ stands for the Cartan-Schouten canonical connection. Any left invariant connection on

G

is of the form

{\bar{\nabla}}_{x} y = \frac{1}{2} [x, y] - \frac{1}{2} k (x, y),

where

k : G \times G \to G

is some bilinear map. Write

k = k^{\mathrm{skew}} + k^{\mathrm{sym}},

where

k^{\mathrm{skew}} (x, y) = \frac{1}{2} (k (x, y) - k (y, x))

and

k^{\mathrm{sym}} (x, y) = \frac{1}{2} (k (x, y) + k (y, x)) .

The torsion

T^{\bar{\nabla}}

is given by

T^{\bar{\nabla}} = - k^{\mathrm{skew}} .

So the connection

\bar{\nabla}

is torsion-free if and only if

k

is symmetric. If

k

is skew-symmetric, then obviously, the connections ∇ and

\bar{\nabla}

share the same geodesics. The connection

\bar{\nabla}

is biinvariant if and only if it satisfies the following equation (see e.g. [4,33])

\begin{matrix} [z, {\bar{\nabla}}_{x} y] = {\bar{\nabla}}_{[z, x]} y + {\bar{\nabla}}_{x} [z, y] . \end{matrix}

(69)

We deduce that (69) is equivalent to the following

\begin{matrix} [z, k (x, y)] = k ([z, x], y) + k (x, [z, y]), \end{matrix}

(70)

or, equivalently, the covariant derivative of

k

vanishes,

\nabla k = 0 .
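The equivalence between (69), (70) and the vanishing of the covariant derivative of k follows from a direct substitution. The computation below is a sketch for left invariant vector fields, using only the Jacobi identity and the expression of the canonical connection as half the bracket.

```latex
\begin{aligned}
[z, \bar{\nabla}_{x} y] - \bar{\nabla}_{[z, x]} y - \bar{\nabla}_{x} [z, y]
 &= \tfrac{1}{2} \big( [z, [x, y]] - [[z, x], y] - [x, [z, y]] \big) \\
 &\quad - \tfrac{1}{2} \big( [z, k (x, y)] - k ([z, x], y) - k (x, [z, y]) \big) \\
 &= - \tfrac{1}{2} \big( [z, k (x, y)] - k ([z, x], y) - k (x, [z, y]) \big), \\
(\nabla_{z} k) (x, y)
 &= \nabla_{z} (k (x, y)) - k (\nabla_{z} x, y) - k (x, \nabla_{z} y) \\
 &= \tfrac{1}{2} \big( [z, k (x, y)] - k ([z, x], y) - k (x, [z, y]) \big).
\end{aligned}
```

The first bracket on the right-hand side vanishes by the Jacobi identity, so the left-hand side is zero exactly when the last bracket is zero.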

If

k

is skew-symmetric, the biinvariance condition is also equivalent to saying that

k

is a 2-cocycle for the adjoint representation of

(G, [,]) .

That is, the Chevalley-Eilenberg differential of

k

vanishes:

\begin{matrix} \partial k (x, y, z) : = [x, k (y, z)] - k ([x, y], z) - k (y, [x, z]) = 0, \end{matrix}

(71)

for any

x, y, z \in G .

In particular, for any linear map

ψ : G \to G,

the coboundary

k_{ψ} = \partial ψ = 2 \nabla ψ,

defined by

k_{ψ} (x, y) = [x, ψ (y)] - ψ ([x, y])

, is such that

\nabla^{ψ} : = \nabla - \frac{1}{2} k_{ψ}

is a biinvariant connection for which every 1-parameter subgroup through the unit of the Lie group, is a geodesic.

We summarize the above in the following

Proposition 7.

There is a 1-1 correspondence between biinvariant Cartan-Schouten connections on a Lie group and the 2-cocycles

k : \land^{2} G \to G,

for the adjoint action of

G .

Any bilinear map

\bar{k} : G / [G, G] \times G / [G, G] \to Z (G)

naturally lifts to a biinvariant connection

k

on

G .

Here the Lie algebra

G / [G, G]

is the quotient of

G

by its derived ideal

[G, G],

and

Z (G)

is the center of

G .

The Lie algebra

G

is 2-nilpotent if and only if

[G, G] \subset Z (G) .

So, in this case, the dimension of the space of biinvariant connections is at least the dimension of the space of bilinear maps

\bar{k} : G / [G, G] \times G / [G, G] \to [G, G] .

We look at the case where

k

is symmetric, so that the connection on

G

given by

{\bar{\nabla}}_{x} y = \frac{1}{2} [x, y] - \frac{1}{2} k (x, y),

is torsion free. We define the 3-tensor

S,

with

\begin{matrix} S (x, y, z) : = μ (k (x, y), z) . \end{matrix}

(72)

The total symmetry of S is equivalent to the relation

\begin{matrix} μ (k (x, y), z) = μ (y, k (x, z)), \end{matrix}

(73)

for any

x, y, z \in G .

The following one-parameter family of biinvariant

α

-connections

\begin{matrix} \nabla_{x}^{α} y : = \frac{1}{2} ([x, y] - α k (x, y)), \nabla_{x}^{- α} y : = \frac{1}{2} ([x, y] + α k (x, y)) \end{matrix}

(74)

is such that

\nabla^{α}

and

\nabla^{- α}

are dual with respect to

μ,

for any

α \in R .

We also have

\begin{matrix} {\bar{\nabla}}^{α} μ (x^{+}, y^{+}, z^{+}) = \frac{α}{2} (μ (k (x^{+}, y^{+}), z^{+}) + μ (y^{+}, k (x^{+}, z^{+}))) . \end{matrix}

(75)

So if S is totally symmetric, then

{\bar{\nabla}}^{α} μ = α S = - {\bar{\nabla}}^{- α} μ .
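Formula (75) and the duality of the two connections can be recovered in a few lines. The following derivation is a sketch for left invariant vector fields x, y, z. It uses only that μ is parallel for the canonical connection and, in the last step of each computation, relation (73).

```latex
\begin{aligned}
(\bar{\nabla}^{\alpha}_{x} \mu) (y, z)
 &= x \cdot \mu (y, z) - \mu (\bar{\nabla}^{\alpha}_{x} y, z) - \mu (y, \bar{\nabla}^{\alpha}_{x} z) \\
 &= (\nabla_{x} \mu) (y, z) + \tfrac{\alpha}{2} \big( \mu (k (x, y), z) + \mu (y, k (x, z)) \big) \\
 &= \tfrac{\alpha}{2} \big( \mu (k (x, y), z) + \mu (y, k (x, z)) \big)
  = \alpha \, \mu (k (x, y), z) = \alpha \, S (x, y, z), \\
\mu (\nabla^{\alpha}_{x} y, z) + \mu (y, \nabla^{- \alpha}_{x} z)
 &= \mu (\nabla_{x} y, z) + \mu (y, \nabla_{x} z)
  - \tfrac{\alpha}{2} \big( \mu (k (x, y), z) - \mu (y, k (x, z)) \big) \\
 &= x \cdot \mu (y, z).
\end{aligned}
```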

Here is a method for constructing biinvariant torsion free flat connections.

Theorem 13.

Let G be a Lie group,

G

its Lie algebra,

Z (G) \neq 0

the center of

G .

Let

X_{1}, \dots, X_{p} \in Z (G),

where

p \geq 1

is an integer. Let

B_{j} : G \times G \to R,

be bilinear symmetric with

\nabla B_{j} = 0,

j = 1, \dots, p .

The map

k : G \times G \to G,

k (x, y) = \sum_{j = 1}^{p} B_{j} (x, y) X_{j},

defines a biinvariant torsion-free connection

\bar{\nabla} = \nabla - \frac{1}{2} k .

If G is 2-nilpotent and

B_{j} (x, X_{k}) = 0,

j, k = 1, \dots, p,

for any

x \in G,

then

\bar{\nabla}

is flat. In particular, for any closed 1-forms

f_{j},

j = 1, \dots, p,

set

B : = \sum_{j = 1}^{p} f_{j} \otimes f_{j}

and

k (x, y) : = B (x, y) X,

for some

X \in Z (G) .

Then

\bar{\nabla} = \nabla - \frac{1}{2} k

is biinvariant, torsion free and flat.

Proof.

Suppose

B_{j}

are symmetric bilinear forms on

G

such that

\nabla B_{j} = 0,

and

X_{j} \in Z (G),

j = 1, \dots, p .

Consider the symmetric bilinear map

k : G \times G \to G,

k (x, y) = \sum_{j = 1}^{p} B_{j} (x, y) X_{j} .

By abuse of notation, we use here the same notations for both a quantity in the Lie algebra and the corresponding left invariant quantity. For example a left invariant vector field

x^{+}

is simply denoted by

x .

We use the fact that the covariant derivative of each

B_{j}

vanishes, that is,

\begin{matrix} \nabla B_{j} (x, y, z) = - \frac{1}{2} (B_{j} ([x, y], z) + B_{j} (y, [x, z])) = 0 . \end{matrix}

(76)

Since

[X_{j}, x] = 0,

for any

x \in G,

the equation which expresses the biinvariance reads

\begin{matrix} [x, k (y, z)] - k ([x, y], z) - k (y, [x, z]) & = & \sum_{j = 1}^{p} (B_{j} (y, z) [x, X_{j}] + 2 \nabla B_{j} (x, y, z) X_{j}) = 0 . \end{matrix}

(77)

Thus the connection

\bar{\nabla} = \nabla - \frac{1}{2} k

is biinvariant and, since

k

is symmetric,

\bar{\nabla}

is also torsion-free. If

B_{j} (x, X_{k}) = 0,

for any

x \in G

and

j, k = 1, \dots, p,

the curvature of

\bar{\nabla}

coincides with that of

\nabla,

\begin{matrix} R (x, y) z = - \frac{1}{4} [[x, y], z] . \end{matrix}

(78)

So if in addition,

G

is 2-nilpotent then

R (x, y) z = 0,

for any

x, y, z \in G .

□
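The construction of Theorem 13 can be tried out on the 3-dimensional Heisenberg Lie algebra. The sketch below is an illustration under assumptions that are not in the text: the basis (e1, e2, e3) with [e1, e2] = e3, the central element X = e3, the closed 1-form f dual to e1 (so that f(X) = 0), and all function names. It evaluates the biinvariance condition (70), the torsion and the curvature of the resulting connection on random vectors.

```python
import numpy as np

# Heisenberg Lie algebra with basis (e1, e2, e3) and [e1, e2] = e3.
c = np.zeros((3, 3, 3))
c[0, 1, 2], c[1, 0, 2] = 1.0, -1.0

X = np.array([0.0, 0.0, 1.0])  # central element e3
f = np.array([1.0, 0.0, 0.0])  # closed 1-form: it vanishes on span(e3)


def bracket(x, y):
    return np.einsum("a,b,abc->c", x, y, c)


def k(x, y):
    # k(x, y) = B(x, y) X with B = f (tensor) f
    return (f @ x) * (f @ y) * X


def nabla_bar(x, y):
    return 0.5 * bracket(x, y) - 0.5 * k(x, y)


def biinvariance_defect(x, y, z):
    # Left-hand side of (77).
    return bracket(x, k(y, z)) - k(bracket(x, y), z) - k(y, bracket(x, z))


def torsion(x, y):
    return nabla_bar(x, y) - nabla_bar(y, x) - bracket(x, y)


def curvature(x, y, z):
    return (
        nabla_bar(x, nabla_bar(y, z))
        - nabla_bar(y, nabla_bar(x, z))
        - nabla_bar(bracket(x, y), z)
    )


rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 3))
checks = {
    "biinvariant": np.allclose(biinvariance_defect(x, y, z), 0.0),
    "torsion_free": np.allclose(torsion(x, y), 0.0),
    "flat": np.allclose(curvature(x, y, z), 0.0),
}
```

All three entries of `checks` come out true for this choice. Replacing f by a 1-form that does not vanish on e3 would violate the closedness assumption.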

5. A New Model for Statistics, Machine Learning and Data Science

5.1. On 2-nilpotent Lie Group Structures on $R^{N}$

2-step nilpotent Lie groups are the only Lie groups for which the Cartan-Schouten connection is flat, which entails that any Cartan-Schouten metric on them is a Hessian metric. Thus their study benefits from the nice properties of Hessian metrics. Moreover, the fact that the Riemannian mean of any Cartan-Schouten metric coincides with the biinvariant mean (exponential barycenter) of the Lie group, as discussed in Section 5.2, makes them much more attractive.

Here is our main point: seen as a manifold,

R^{N}

is the (common) universal cover of all N-dimensional nilpotent Lie groups, so we can look at any phenomenon happening in

R^{N}

as a phenomenon sitting on a manifold home to infinitely many nilpotent Lie group structures. So we can look for those Lie group structures which are compatible with the phenomenon at hand, i.e. those Lie group structures for which our phenomenon is left or right invariant, biinvariant, or the group is a group of symmetries, etc. Along these lines, in the present paper we concentrate on the case of 2-step nilpotent Lie groups which offer many advantages, with many foreseen applications.

5.2. Exponential Barycenter

The group exponential barycenter of a dataset (

σ_{i}

) is a solution

m

(if it exists, see [29]) of the following barycenter equation

\begin{matrix} \sum_{i = 1}^{p} log (m^{- 1} σ_{i}) = 0, \end{matrix}

(79)

where log is (locally) the inverse of the exponential map. In our case, the exponential map of Cartan-Schouten metrics coincides with that of the Lie group. Thus, the solution of (79) coincides with the (biinvariant) mean of the Lie group. Equation (79) may not admit a unique solution, or may not even admit a solution at all. We will rather use the terminology of cloud, instead of dataset.
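As an illustration of equation (79), the sketch below computes the exponential barycenter of a small cloud in the 3-dimensional Heisenberg group, realized as unipotent upper triangular 3x3 matrices. The matrix model, the fixed-point update m ← m exp(mean of log(m⁻¹σᵢ)) (a scheme commonly used for group barycenters, not taken from this text), and all names are assumptions made for the example.

```python
import numpy as np


def heis(x, y, z):
    """Heisenberg group element in the (assumed) upper triangular model."""
    return np.array([[1.0, x, z], [0.0, 1.0, y], [0.0, 0.0, 1.0]])


def log_unipotent(M):
    N = M - np.eye(3)
    return N - N @ N / 2.0  # exact, because N^3 = 0


def exp_nilpotent(A):
    return np.eye(3) + A + A @ A / 2.0  # exact, because A^3 = 0


def mean_log(m, points):
    """(1/p) * sum_i log(m^{-1} sigma_i), the left side of (79) divided by p."""
    m_inv = np.linalg.inv(m)
    return sum(log_unipotent(m_inv @ s) for s in points) / len(points)


def exponential_barycenter(points, iterations=5):
    m = np.eye(3)
    for _ in range(iterations):
        m = m @ exp_nilpotent(mean_log(m, points))
    return m


rng = np.random.default_rng(0)
coords = rng.standard_normal((7, 3))  # a cloud of p = 7 points
cloud = [heis(*row) for row in coords]
m = exponential_barycenter(cloud)
residual = np.abs(mean_log(m, cloud)).max()
```

In this model the barycenter has the arithmetic means of x and y as its first two coordinates, while its z coordinate is the mean of z plus half of E(x)E(y) minus the mean of the products xy. The residual of (79) vanishes up to rounding.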

5.3. On a New Model of Parametric Means

Applying the above study and discussions, we propose here a brand new model of parametric mean, say

m,

for statistics, machine learning and data science. Since the parameter evolves in the whole manifold of 2-nilpotent Lie algebras, one enjoys a wide room of parameters to manoeuvre, unlike the traditional methods such as the arithmetic mean, median, mode, expectation, least squares method, maximum likelihood, linear regression, etc. This is particularly suitable for fitting data or estimating the parameters amid several constraints. The space of application of this mean is the ordinary Euclidean space

R^{N}

or any vector space of dimension N, for any given integer

N \geq 2 .

We first partition N into an arbitrary (

n, m

), where

n + m = N .

For a cloud of p points

{σ_{i}}_{i = 1, \dots, p}

where we denote the coordinates of each point

σ_{i}

by

(x_{i}^{1}, \dots, x_{i}^{n + m}) \in R^{n + m},

we will let

E (x^{r})

stand for the arithmetic mean of the

r -

th components

E (x^{r}) : = \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{r},

r = 1, \dots, n + m .

Theorem 14.

Every cloud (dataset) of points in

R^{n + m}

admits a biinvariant mean for some parameter family of Lie group structures on

R^{n + m}

which is also the common Riemannian mean of infinitely many Riemannian and pseudo-Riemannian metrics. More precisely, the mean

m = (m^{1}, \dots, m^{n + m})

of a cloud of p points

{σ_{i}}_{i = 1, \dots, p}

where

σ_{i} = (x_{i}^{1}, \dots, x_{i}^{n + m}) \in R^{n + m},

is given, for

k = 1, \dots, n,

q = 1, \dots, m,

by

m^{k} = E (x^{k}) \quad \text{and} \quad m^{n + q} = E (x^{n + q}) + \frac{1}{4} \sum_{j, l = 1}^{n} γ_{j l}^{q} (E (x^{j}) E (x^{l}) - \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}) .

(80)

The parameters

{(γ_{j l}^{q})}_{j, l = 1, \dots, n, q = 1, \dots, m}

are real numbers satisfying

γ_{j l}^{q} = - γ_{l j}^{q} .

Proof.

Let

u \in R^{n + m}

and

{σ_{i}}_{i = 1, \dots, p}

a cloud of points of

R^{n + m}

with components

σ_{i} = (x_{i}^{1}, \dots, x_{i}^{n + m}) .

We endow

R^{n + m}

with the set of 2-nilpotent Lie group structures (50). We denote by

u^{- 1} = (u^{1}, \dots, u^{n + m}),

the inverse of u for the parametric Lie group structures (50). By the notation

γ_{j l}^{q} u^{j} x_{i}^{l}

, we mean

\sum_{j, l = 1}^{n} γ_{j l}^{q} u^{j} x_{i}^{l}

, so that

u^{- 1} σ_{i} = (u^{1} + x_{i}^{1}, \dots, u^{n} + x_{i}^{n}, u^{n + 1} + x_{i}^{n + 1} + \frac{1}{2} γ_{j l}^{1} u^{j} x_{i}^{l}, \dots, u^{n + m} + x_{i}^{n + m} + \frac{1}{2} γ_{j l}^{m} u^{j} x_{i}^{l})

(81)

and then

\begin{matrix} log (u^{- 1} σ_{i}) & = & (u^{1} + x_{i}^{1}, \dots, u^{n} + x_{i}^{n}, u^{n + 1} + x_{i}^{n + 1} + \frac{1}{2} γ_{j l}^{1} u^{j} x_{i}^{l} - \frac{1}{4} γ_{j l}^{1} (u^{j} + x_{i}^{j}) (u^{l} + x_{i}^{l}), \dots, \\ u^{n + m} + x_{i}^{n + m} + \frac{1}{2} γ_{j l}^{m} u^{j} x_{i}^{l} - \frac{1}{4} γ_{j l}^{m} (u^{j} + x_{i}^{j}) (u^{l} + x_{i}^{l})) \\ = & (u^{1} + x_{i}^{1}, \dots, u^{n} + x_{i}^{n}, u^{n + 1} + x_{i}^{n + 1} + \frac{1}{4} γ_{j l}^{1} (u^{j} x_{i}^{l} - u^{l} x_{i}^{j} - u^{j} u^{l} - x_{i}^{j} x_{i}^{l}), \dots, \\ u^{n + m} + x_{i}^{n + m} + \frac{1}{4} γ_{j l}^{m} (u^{j} x_{i}^{l} - u^{l} x_{i}^{j} - u^{j} u^{l} - x_{i}^{j} x_{i}^{l})) . \end{matrix}

(82)

In order to find the biinvariant mean relative to (50), we need to solve the equation

\begin{matrix} \sum_{i = 1}^{p} log (u^{- 1} σ_{i}) = 0 . \end{matrix}

(83)

From the equations

p u^{k} + \sum_{i = 1}^{p} x_{i}^{k} = 0,

we get

u^{k} = - E (x^{k})

(84)

whereas, the equations

p u^{n + q} + \sum_{i = 1}^{p} (x_{i}^{n + q} + \frac{1}{4} γ_{j l}^{q} (u^{j} x_{i}^{l} - u^{l} x_{i}^{j} - u^{j} u^{l} - x_{i}^{j} x_{i}^{l})) = 0

lead to

\begin{matrix} u^{n + q} & = & - E (x^{n + q}) + \frac{1}{4} γ_{j l}^{q} (- u^{j} E (x^{l}) + u^{l} E (x^{j}) + u^{j} u^{l} + \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}) \\ = & - E (x^{n + q}) + \frac{1}{4} γ_{j l}^{q} (E (x^{j}) E (x^{l}) - E (x^{l}) E (x^{j}) + E (x^{j}) E (x^{l}) + \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}) \\ = & - E (x^{n + q}) + \frac{1}{4} γ_{j l}^{q} (E (x^{j}) E (x^{l}) + \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}) . \end{matrix}

(85)

The needed mean

m

is the inverse relative to the group product (50), of

u^{- 1}

with components

(u^{k}, u^{n + q})

given in (84) and (85), with

k = 1, \dots, n

and

q = 1, \dots, m .

We thus deduce the components

(m^{1}, \dots, m^{n}, m^{n + 1}, \dots, m^{n + m})

of m as follows

m^{k} = E (x^{k}) \quad \text{and} \quad m^{n + q} = E (x^{n + q}) + \frac{1}{4} \sum_{j, l = 1}^{n} γ_{j l}^{q} (E (x^{j}) E (x^{l}) - \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}) .

(86)

□
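Formula (80) translates directly into a few array operations. The following is a minimal sketch, assuming NumPy, a cloud stored as a (p, n+m) array and the parameters stored as an (m, n, n) array `gamma[q, j, l]`. The function name `parametric_mean` and the array layout are hypothetical. The sketch evaluates the formula as written for any real array and does not enforce the condition on the parameters stated in the theorem. Since the bracketed term is symmetric in j and l, only the symmetric part of the supplied array affects the output.

```python
import numpy as np


def parametric_mean(cloud, gamma):
    """Evaluate formula (80).

    cloud: (p, n + m) array, one point per row.
    gamma: (m, n, n) array with gamma[q, j, l] the parameter gamma_{jl}^q.
    """
    cloud = np.asarray(cloud, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    n = gamma.shape[1]
    p = cloud.shape[0]

    mean = cloud.mean(axis=0)            # E(x^r), r = 1, ..., n + m
    first = cloud[:, :n]                 # the first n coordinates
    second_moment = first.T @ first / p  # (1/p) sum_i x_i^j x_i^l
    bracket = np.outer(mean[:n], mean[:n]) - second_moment

    result = mean.copy()
    # m^{n+q} = E(x^{n+q}) + (1/4) sum_{j,l} gamma_{jl}^q * bracket_{jl}
    result[n:] += 0.25 * np.einsum("qjl,jl->q", gamma, bracket)
    return result


# Example with n = 2, m = 1 (sample data assumed for illustration).
rng = np.random.default_rng(0)
cloud = rng.standard_normal((10, 3))
gamma = np.zeros((1, 2, 2))
gamma[0, 0, 1] = 2.0
mean_with_gamma = parametric_mean(cloud, gamma)
mean_zero_gamma = parametric_mean(cloud, np.zeros((1, 2, 2)))
```

With all parameters set to zero the function returns the arithmetic mean of the cloud, in line with the remark in Section 5.4.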

5.4. More Discussions on the New Model of Parametric Means

In order to apply this mean to a cloud (dataset) of points in the Euclidean space

R^{N}

for an integer

N \geq 2,

we have partitioned N into (

n, m

) with

n + m = N .

The choice of the partition (

n, m

) of N depends on the studied problem and is thus left to the user.

Note that, for each

k = 1, \dots, n,

the k-th component of

m

, is the arithmetic mean of the k-th components of all the points in the cloud

\begin{matrix} m^{k} = E (x^{k}) : = \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{k}, \end{matrix}

(87)

whereas, for each

q = 1, \dots, m,

the

(n + q)

-th component of

m

, is the sum of the arithmetic mean of the

(n + q)

-th components of all the points in the cloud and a linear combination of all the

E (x^{j}) E (x^{l}) - \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}

weighted by the parameters

γ_{j l}^{q} .

The terms

E (x^{j}) E (x^{l}) - \frac{1}{p} \sum_{i = 1}^{p} x_{i}^{j} x_{i}^{l}

are related to the variance of the dataset, as will be more explicitly explained in subsequent works. More importantly, one can adjust, fix or estimate the parameters

γ_{j l}^{q}

to better fit the problem at hand. Better yet, each fixed value of the parameter

γ = (γ_{j l}^{q})

defines a different 2-nilpotent (simply connected) Lie group structure on

R^{n + m}

and lives in the smooth manifold of nilpotent Lie algebra structures on

R^{n + m} .

Since they (the

(γ_{j l}^{q})

’s) can vary smoothly, one can make them undergo differential or partial differential equations, series and limits, if the studied phenomenon requires so. One remarks that when the parameter vanishes,

γ_{j l}^{q} = 0

for any

j, l = 1, \dots, n,

q = 1, \dots, m,

then

m

coincides with the arithmetic mean of the dataset (

σ_{i}

).
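The terms E(x^j)E(x^l) - (1/p) Σ_i x_i^j x_i^l discussed above are, up to sign, the entries of the biased (divide by p) empirical covariance matrix of the first n coordinates. The short check below assumes NumPy and a randomly generated cloud. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.standard_normal((50, 3))  # p = 50 points, n = 3 coordinates
p = cloud.shape[0]
mean = cloud.mean(axis=0)

# bracket[j, l] = E(x^j) E(x^l) - (1/p) sum_i x_i^j x_i^l
bracket = np.outer(mean, mean) - cloud.T @ cloud / p

# Biased empirical covariance matrix of the coordinates (columns).
covariance = np.cov(cloud, rowvar=False, bias=True)

matches = np.allclose(bracket, -covariance)
```

So the correction in the last m components of the mean is a linear combination of empirical covariances, weighted by the parameters.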

Note also that the mean

m

in (80) is the common Riemannian mean of all the infinitely many Cartan-Schouten metrics (57). So, one can choose the best metric to pair with

m

, depending on the studied problem. Given that the set of such metrics is also a smooth manifold of dimension

\frac{1}{2} (m + n) (n + m + 1)

, as ensured by Theorem 10, one has a wide range of metrics to choose from. This flexibility in choosing and fitting could be an advantage over traditional tools such as the ordinary expectation, least squares, etc.

References

Amari, S.: Information geometry and its applications. Appl. Math. Sci., 194. Springer, Tokyo, 2016.
Amari, S.: Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics 28 (Springer, New York, 1985).
Aubert, A. and Medina, A.: Groupes de Lie pseudo-riemanniens plats. Tohoku Math. J. (2) 55, no. 4, 487-506 (2003).
Benayadi, S. and Boucetta, M.: Special bi-invariant linear connections on Lie groups and finite dimensional Poisson structures. Differential Geom. Appl. 36, 66-89 (2014).
Benayadi, S. and Elduque, A.: Classification of quadratic Lie algebras of low dimension. J. Math. Phys. 55, 081703 (2014).
Benito, P.; de-la-Concepción, D.; Roldán-López, J. and Sesma, I.: Quadratic 2-step Lie algebras: computational algorithms and classification. J. Symbolic Comput. 94, 70-89 (2019).
Cartan, E. and Schouten, J.A.: On the geometry of the group-manifold of simple and semi-simple groups. Proc. Akad. Wetensch. Amsterdam 29 (1926) 803-815.
Diatta, A.; Manga, B. and Sy, F.: On dual quaternions, dual split quaternions and Cartan-Schouten metrics on perfect Lie groups. Trends Math. Birkhäuser/Springer, Cham, 2024, pp. 317-339.
Diatta, A.: Géométrie de Poisson et de contact des espaces homogènes. Ph.D. Thesis. University Montpellier 2, France (2000).
Diatta, A.: Left Invariant Contact Structures on Lie Groups. Diff. Geom. Appl. 26 (2008), no. 5, 544-552.
Eberlein, P.: Geometry of 2-step nilpotent Lie groups. Modern dynamical systems and applications, 67-101. Cambridge University Press, Cambridge, 2004.
Eberlein, P.: Geometry of 2-step nilpotent groups with a left invariant metric. Ann. Sci. Ecole Norm. Sup. (4) 27, no. 5, 611-660 (1994).
Eberlein, P.: Geometry of 2-step nilpotent groups with a left invariant metric. II. Trans. Amer. Math. Soc. 343, no. 2, 805-828 (1994).
García T. N.: Gromov-Hausdorff limit of Wasserstein spaces on point clouds. Calc. Var. Partial Differential Equations 59, no. 2, Paper No. 73, 43 pp (2020).
Godoy M.M.; Kruglikov, B.; Markina, I. and Vasil’ev, A.: Rigidity of 2-step Carnot groups. J. Geom. Anal. 28, no. 2, 1477-1501 (2018).
Kumari, S. and Pestov, V. G.: Universal consistency of the k-NN rule in metric spaces and Nagata dimension. II. ESAIM Probab. Stat. 28, 132-160 (2024).
Figueroa-O’Farrill, J. and Stanciu, S.: On the structure of symmetric self-dual Lie algebras. J. Math. Phys. 37 (8), 4121-4134 (1996).
Gallier, J. and Quaintance, J.: Differential geometry and Lie groups - a computational perspective. Geometry and Computing, 12. Springer, Cham, 2020, 777 pp.
Ghanam, R.; Hindeleh, F. and Thompson, G.: Bi-invariant and noninvariant metrics on Lie groups. J. Math. Phys. 48 (2007), no. 10, 102903.
Lauritzen, S. L.: Statistical manifolds. In Differential Geometry in Statistical Inferences. IMS Lecture Notes Monogr. Ser., 10, Inst. Math. Statist., Hayward California, 1987, pp. 96-163.
Lê, H. V.: Statistical manifolds are statistical models. J. Geom. 84, no. 1-2, 83-93 (2005).
Lorenzi, M. and Pennec, X.: Geodesics, parallel transport and one-parameter subgroups for diffeomorphic image registration. Int. J. Comput. Vis. 105, no. 2, 111-127 (2013).
Matumoto, T.: Any statistical manifold has a contrast function. - On the C³-functions taking the minimum at the diagonal of the product manifold. Hiroshima Math. J. 23, 327-332 (1993).
Matsuzoe, H.: Geometry of statistical manifolds and its generalization. In Proceedings of the 8th International Workshop on Complex Structures and Vector Fields. World Scientific, 2007, pp. 244-251.
Matsuzoe, H.: Statistical manifolds and affine differential geometry. Advanced Studies in Pure Mathematics 57, 2010, Probabilistic Approach to Geometry, pp. 303-321.
Medina, A.: Groupes de Lie munis de métriques biinvariantes. Tôhoku Math. J. 37, 405-421 (1985).
Medina, A. and Revoy, Ph.: Algèbres de Lie et produit scalaire invariant. Ann. Scient. Ec. Norm. Sup., 4e serie, 18, no 3, 553-561 (1985).
Milnor, J.: Curvatures of left invariant metrics on Lie groups. Adv. Math. 21, no. 3, 293-329 (1976).
Miolane, N. and Pennec, X.: Computing bi-invariant pseudo-metrics on Lie groups for consistent statistics. Entropy 17, no. 4, 1850-1881 (2015).
Miolane, N. and Pennec, X.: Statistics on Lie groups: a need to go beyond the pseudo-Riemannian framework. AIP Conference Proceedings 1641 (1) 59-66 (2015).
Nomizu, K.: Left-invariant Lorentz metrics on Lie groups. Osaka J. Math. 16, no. 1, 143-150 (1979).
Noui, L. and Revoy, Ph.: Algèbres de Lie orthogonales et formes trilinéaires alternées. Comm. Algebra 25, no. 2, 617-622 (1997).
Pennec, X.: Bi-invariant means on Lie groups with Cartan-Schouten connections. Geometric Science of Information, 59-67, Lecture Notes in Comput. Sci., 8085, Springer, Heidelberg, 2013.
Pennec, X.: Intrinsic statistics on Riemannian Manifolds: Basic tools for geometric measurements. J. Math. Imaging Vision 25 (1), 127-154 (2006).
Phillips, N.C.: How many exponentials? Amer. J. Math. 116, no. 6, 1513-1543 (1994).
Rawashdeh, M. and Thompson, G.: The inverse problem for six-dimensional codimension two nilradical Lie algebras. J. Math. Phys. 47, no. 11, 112901, 29 pp (2006).
Samereh, L.; Peyghan, E. and Mihai, I.: On almost Norden statistical manifolds. Entropy 24, no. 6, Paper No. 758, 10 pp (2022).
Shima, H.: Homogeneous Hessian manifolds. Ann. Inst. Fourier (Grenoble) 30 (1980), 91-128.
Strugar, I. and Thompson, G.: Inverse problem for the canonical Lie group connection. Houston J. Math. 35, no. 2, 373-409 (2009).
Sy, F.: Restricted Inverse problem of Lagrangian dynamics for the Cartan-Schouten canonical connection and applications. Ph.D. Thesis. Université C.A. Diop. In preparation.
Thompson, G.: Metrics compatible with a symmetric connection in dimension three. J. Geom. Phys. 19, 1-17 (1996).
Tondeur, Ph.: Sur certaines connexions naturelles d’un groupe de Lie. Applications. Séminaire Ehresmann. Topologie et géométrie différentielle, tome 6 (1964), exp. no 5, pp. 1-9.
Watanabe, S.: Algebraic geometry and statistical learning theory. Cambridge Monogr. Appl. Comput. Math., 25. Cambridge University Press, Cambridge, 2009.
Zefran, M.; Kumar, V. and Croke, C.: Metrics and Connections for Rigid-Body Kinematics. The International Journal of Robotics Research, 18 (2) 242 (1999).

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
