Information Geometry Description of Inferential Scattering

Marco Favretti

doi:10.20944/preprints202604.0922.v1

Submitted:

10 April 2026

Posted:

14 April 2026


Abstract

We investigate the geometrical structure underlying the notion of Inferential Scattering, which was formulated by E. T. Jaynes in the 1980s using the language of equilibrium statistical mechanics. We show that inferential scattering can be naturally defined on a dually flat Riemannian manifold equipped with dual coordinate systems, a differential-geometric structure that occupies a central place in information geometry. We find that the evolution of the system on the dually flat manifold can be expressed as the horizontal lift of an integrable connection. We stress that the notion of inferential scattering has a wide range of applications, being a form of inference and therefore applicable to any statistical system with insufficient information.

Keywords:

Riemannian manifold; dually flat manifold; exponential family; linear connection; scattering theory

Subject:

Computer Science and Mathematics - Geometry and Topology

1. Introduction

The notion of Inferential Scattering was introduced by E. T. Jaynes in two papers [1,2] whose content has remained little known in the scientific literature compared with the more celebrated works of this author. Jaynes’ original exposition refers to equilibrium statistical mechanics. It is well known that the probabilistic formalism of equilibrium statistical mechanics is described by an exponential family manifold, which is an instance of a dually flat manifold [3]; the main aim of this work is to rephrase the notion of inferential scattering in this enlarged setting of dually flat manifolds, to better appreciate its geometric structure and possibly expand its range of applications. Dually flat manifolds are a generalization of Riemannian manifolds and they occupy a central place in Information Geometry. They have found multiple applications in the description of neural networks [4], machine learning (especially Restricted Boltzmann Machines) [5] and the description of hierarchical systems [6].

In a dually flat manifold M there are two local coordinate systems

θ = θ (p)

and

η = η (p)

which describe the same point

p \in M

. The transition maps

η \circ θ^{- 1}

and

θ \circ η^{- 1}

are related by a Legendre transform so that

η (θ) = \nabla ψ (θ)

with inverse

θ (η) = \nabla φ (η)

where ∇ denotes the gradient and

φ, ψ

are called potentials. In the case of an exponential manifold, there is a natural statistical mechanic interpretation of

θ

as control variables and

η

as effect variables, see Section 4. The tangents of the above introduced gradient maps give the linear variation

d η

in the effect variables due to a variation

d θ

in control variables and vice-versa.

The phenomenon of inferential scattering arises when particular kinds of variations

d η

and

d θ

are considered. In this work it is shown that its underlying geometric structure can be understood in terms of horizontal lift with respect to a couple of integrable connections (dually orthogonal foliations) which are characteristics of the dually flat manifold structure. The notion of inferential scattering comes into play because the variation

d η

in effect variables can be written as an infinite sum of contributions. According to Jaynes [1] this perturbative expansion can be interpreted using the multiple scattering phenomenon of a wave emanating from a point source A in presence of an obstacle C: the intensity of the wave at a point B in space is the sum of the contribution of the wave arriving from A plus the contribution of the wave reflected by C plus the one of the wave reflected from C back in A and then in B and so on with the higher order multiple reflections (Born series, see [10]). The main interest of this analysis of Jaynes’ result is to show that the scattering phenomenon is present also in this abstract setting of a dually flat manifold, where it establishes a relation between the dual coordinate systems

θ

and

η

.

We conclude with the plan of the paper. In Section 2 we give a brief overview of the differential-geometric notions of dually flat manifold and mutually dual foliations. In Section 3 we describe the paradigmatic case of the exponential family manifold. In Section 4 we investigate the geometric structure of direct and inverse control, and in Section 5 that of inferential scattering.

2. Dually Flat Manifolds

We briefly recall the notions of differential geometry useful for introducing the definition of a dually flat manifold, a central object in information geometry. We keep the exposition to a minimum, without proofs, essentially following [3]. Other important introductory references are [5,7,9].

Let

(M, g)

be an n-dimensional smooth manifold with g a Riemannian metric. In a local chart

ζ

the coordinates of a point

p \in M

are

ζ (p) = (x_{1}, \dots x_{n})

and the canonical basis is

\partial_{i} = \frac{\partial}{\partial x_{i}}

. We equip M with a linear connection ∇ whose expression in the local chart is given by the Christoffel symbols

Γ_{i j}^{k}

as

\nabla_{\partial_{i}} \partial_{j} = Γ_{i j}^{k} \partial_{k}, Γ_{i j, k} = g (\nabla_{\partial_{i}} \partial_{j}, \partial_{k}) = g_{l k} Γ_{i j}^{l} .

Throughout the paper we adopt the convention that repeated indices are summed. The torsion

τ

of the connection ∇ is defined as

τ (X, Y) = \nabla_{X} Y - \nabla_{Y} X - [X, Y]

where

X, Y \in X (M)

are vector fields on M and

[X, Y]

is the Lie bracket. The connection ∇ is said to be compatible with the metric g or simply metric if

X \cdot g (Y, Z) = g (\nabla_{X} Y, Z) + g (Y, \nabla_{X} Z) \forall X, Y, Z \in X (M) .

(1)

If the connection is metric, the parallel transport defined by ∇ is an isometry, preserving the length and the angle between vectors. A celebrated theorem in differential geometry states that in a Riemannian manifold there exists a unique metric connection which is torsion-free

τ = 0

, the so-called Levi-Civita connection

\hat{\nabla}

. For later use we introduce the curvature defined by the connection ∇

R (X, Y, Z) = \nabla_{X} (\nabla_{Y} Z) - \nabla_{Y} (\nabla_{X} Z) - \nabla_{[X, Y]} Z .

The connection ∇ is flat if the associated curvature is identically zero. This is a coordinate-free condition; it does not require the Christoffel symbols to vanish in every coordinate chart. Rather, if ∇ is flat and torsion-free, there exists a local affine coordinate system in which the Christoffel symbols vanish, with the property that every canonical vector

\partial / \partial x_{i}

of the basis can be parallel transported along the coordinate line

x_{i}

so each coordinate line is a geodesic.

The above notion (1) of compatibility between the metric g and a connection ∇ is generalized in Information Geometry using the following

Definition 1.

The connections ∇ and

\nabla^{*}

are called conjugate or dual with respect to the metric in

(M, g)

if

X \cdot g (Y, Z) = g (\nabla_{X} Y, Z) + g (Y, \nabla_{X}^{*} Z) \forall X, Y, Z \in X (M) .

One can show [3] that the connection

\nabla^{*}

dual to ∇ is unique. Moreover, in [3] it is proved that if

\nabla, \nabla^{*}

are dual and both torsion-free, then

R = 0

if and only if

R^{*} = 0

. This motivates the following

Definition 2.

A Riemannian manifold

(M, g, \nabla, \nabla^{*})

equipped with two connections which are dual and torsion-free is called dually flat when

R = R^{*} = 0

, i.e. both connections are flat.

As a consequence, there exist on M two affine local coordinate systems

θ

and

η

with the property that the vectors

\partial^{i} = \frac{\partial}{\partial η^{i}}

of the basis are ∇-parallel transported along the coordinate lines

η^{i}

and the vectors

\partial_{i} = \frac{\partial}{\partial θ_{i}}

are

\nabla^{*}

-parallel transported along the coordinate lines

θ_{i}

. The coordinates

θ

and

η

are called dual or bi-orthogonal because it holds that

g (\partial_{i}, \partial^{j}) = δ_{i}^{j}

where

δ

is the Kronecker symbol. Note that the two bases are mutually orthogonal, but the vectors within each basis need not be orthogonal to one another. In more detail, if we consider the representation of the metric g in the two bases

g^{i j} = g (\partial^{i}, \partial^{j}), g_{i j} = g (\partial_{i}, \partial_{j})

it holds that the two matrices are in the relation

g^{i j} g_{j k} = δ_{k}^{i}

. Finally, let

θ = θ (η)

and

η = η (θ)

be the transition maps connecting the representation of a point

p \in M

in the two coordinate charts. Note that they are one the inverse of the other. Using the orthogonality condition

g^{i j} g_{j k} = δ_{k}^{i}

one can prove that their tangent maps coincide with the representation of the metric g in the two coordinate basis, that is

g^{i j} (η) = \frac{\partial θ_{i} (η)}{\partial η^{j}}, g_{i j} (θ) = \frac{\partial η^{i} (θ)}{\partial θ_{j}} .

(2)

We now show that

Proposition 1.

The change of coordinate

η = η (θ)

and

θ = θ (η)

can be represented using the so-called potential scalar functions

φ (η)

and

ψ (θ)

in the sense that

θ_{i} (η) = \frac{\partial φ (η)}{\partial η^{i}}, η^{i} (θ) = \frac{\partial ψ (θ)}{\partial θ_{i}}, i = 1, \dots, n .

(3)

Proof.

We prove the first relation of (3), the proof for the second follows the same pattern. We need to prove that the one form

ω = θ_{i} (η) d η^{i}

is exact, that is there exists a scalar function

φ (η)

such that

d φ = ω

. A necessary condition (which is also sufficient if the manifold is simply connected) is that

d ω = 0

. Then we need to check that

d ω = d (θ_{i} (η) d η^{i}) = \frac{\partial θ_{i}}{\partial η^{j}} d η^{j} \land d η^{i} = (\frac{\partial θ_{i}}{\partial η^{j}} - \frac{\partial θ_{j}}{\partial η^{i}}) d η^{j} \otimes d η^{i} = 0

that is

\frac{\partial θ_{i}}{\partial η^{j}} - \frac{\partial θ_{j}}{\partial η^{i}} = 0 \forall i, j = 1, \dots n

Now, using (2), the above relation becomes

g^{i j} = g^{j i}

which is true. □

Note that the relations (2) and (3) together imply that the potentials

φ (η)

and

ψ (θ)

are convex functions. Indeed we have

\frac{\partial^{2} φ (η)}{\partial η^{j} \partial η^{i}} = \frac{\partial θ_{i}}{\partial η^{j}} = g^{i j} \in \mathrm{Sym}^{+} (n)

i.e. that the Hessian matrix of

φ

is positive definite. Since the metric g is the Hessian of a function, the dually flat manifold is an example of a Hessian manifold. We do not pursue this line of investigation; see [9] for further details.

Note that the existence of a dual coordinate system induces stringent conditions on the metric g: indeed, by differentiating (2) and using Schwarz's theorem on the symmetry of mixed second-order derivatives we have that

\frac{\partial g^{i j}}{\partial η^{k}} = \frac{\partial g^{i k}}{\partial η^{j}} and \frac{\partial g_{i j}}{\partial θ_{k}} = \frac{\partial g_{i k}}{\partial θ_{j}} .

Another remarkable consequence is the following classical result of Legendre transform theory: since the change of coordinate maps are gradient maps, their inverses are also gradient maps and they are related by the Legendre condition

φ (η) = θ (η) \cdot η - ψ (θ (η)) .

Indeed, using the one-form notation, we have

d φ (η) = d θ \cdot η + θ \cdot d η - d ψ = d θ \cdot η + θ \cdot d η - η \cdot d θ = θ \cdot d η .
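The Legendre condition can be checked numerically on a one-dimensional case. The following is a minimal sketch, assuming the illustrative potential ψ(θ) = ln(1 + e^θ), which is our choice for the check and does not appear in the text; its gradient map is the logistic function and its dual potential is φ(η) = η ln η + (1 − η) ln(1 − η). All function names are hypothetical.

```python
import math

def psi(theta):
    """Illustrative potential psi(theta) = ln(1 + e^theta) (an assumption)."""
    return math.log1p(math.exp(theta))

def eta_of_theta(theta):
    """Gradient map eta = d psi / d theta (logistic function)."""
    return 1.0 / (1.0 + math.exp(-theta))

def phi(eta):
    """Dual potential phi(eta) = eta ln eta + (1 - eta) ln(1 - eta)."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def theta_of_eta(eta):
    """Inverse gradient map theta = d phi / d eta (logit function)."""
    return math.log(eta / (1.0 - eta))

theta = 0.7
eta = eta_of_theta(theta)

# Legendre condition: phi(eta) = theta * eta - psi(theta)
legendre_gap = phi(eta) - (theta * eta - psi(theta))

# The two gradient maps are inverse to each other
roundtrip_gap = theta_of_eta(eta) - theta

print(legendre_gap, roundtrip_gap)
```

Both gaps should vanish up to floating-point rounding, which illustrates that the two transition maps are mutually inverse gradient maps.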

2.1. Mutually Dual Foliations of a Dually Flat Space

A remarkable fact about dually flat manifolds is that they admit a pair of orthogonal foliations. Let us consider an n-dimensional dually flat manifold

(M, g, \nabla, \nabla^{*})

and let

η (p) = η

,

θ (p) = θ

,

p \in M

, be a couple of bi-orthogonal affine local coordinate systems. Given a k with

1 \leq k < n

we partition the coordinates in two subsets

η = (η^{A}, η^{B}) = (η^{1}, \dots, η^{k}, η^{k + 1}, \dots, η^{n}) and θ = (θ_{A}, θ_{B}) = (θ_{1}, \dots, θ_{k}, θ_{k + 1}, \dots, θ_{n})

and we consider the coordinate projections

π_{A} (η) = η^{A}

and

π_{B} (θ) = θ_{B}

. For every

{\bar{η}}^{A}

and

{\bar{θ}}_{B}

we define the submanifolds of M

E ({\bar{η}}^{A}) = \{p \in M : π_{A} \circ η (p) = {\bar{η}}^{A}\}, \quad Q ({\bar{θ}}_{B}) = \{p \in M : π_{B} \circ θ (p) = {\bar{θ}}_{B}\}

(4)

In [3] it is proved that since M is dually flat and the subsets

E ({\bar{η}}^{A})

and

Q ({\bar{θ}}_{B})

are defined as affine subspaces of an affine coordinate system, they are totally geodesic submanifolds with respect to the ∇ and

\nabla^{*}

connections respectively; that is, a ∇-geodesic (respectively

\nabla^{*}

-geodesic) which is tangent to the submanifold at a point remains tangent to the submanifold. Moreover since their tangent spaces can be described as

T_{p} E ({\bar{η}}^{A}) = \langle \partial^{k + 1}, \dots, \partial^{n} \rangle, \quad T_{p} Q ({\bar{θ}}_{B}) = \langle \partial_{1}, \dots, \partial_{k} \rangle

the two submanifolds are mutually orthogonal by the condition

g (\partial_{i}, \partial^{j}) = δ_{i}^{j}

and complementary in the sense that their direct sum of subspaces gives the tangent space

T_{p} M

. See Figure 1 below. If we introduce the so called mixed coordinate system

m (p) = (η^{A} (p), θ_{B} (p)) = (π_{A} (η (p)), π_{B} (θ (p)))

the metric g admits the block decomposition

g (m (p)) = g^{a b} d η^{a} \otimes d η^{b} + g_{c d} d θ_{c} \otimes d θ_{d}, a, b \in {1, \dots, k}, c, d \in {k + 1, \dots, n} .

So the complementary projections

π_{A}

and

π_{B}

define a pair of mutually dual foliations of M whose leaves are orthogonal submanifolds of constant dimension (respectively

\dim E ({\bar{η}}^{A}) = n - k

, and

\dim Q ({\bar{θ}}_{B}) = k

) and totally geodesic with respect to the ∇ and

\nabla^{*}

connections. This partition of the space is called a k-cut of a dual coordinate system in [5]. The notion of mutually dual foliations has important applications in Information Geometry due to the fact that a version of the Pythagorean Theorem applies (see again [3]).

3. Exponential Family Manifold

We describe an example of dually flat manifold which occupies a central place in information geometry. We start with the definition of statistical model. Let

(X, B, d μ)

be a measure space, where the set X can be countable or not,

B

is a

σ

-algebra over X and

d μ

is a measure on X. The infinite dimensional space of all the positive probability densities over X is

P (X) = \{p : p (x) \geq 0 \ \text{a.e. on } X \text{ and } \int_{X} p (x) d μ = 1\} .

Given an open convex set

U \subset R^{n}

with coordinates

θ = (θ_{1}, \dots, θ_{n})

we consider the subset of

P (X)

of probability densities parameterized by the parameters

θ \in U

S = {p_{θ} = p (x, θ), θ \in U} \subset P (X) .

The set S has the structure of an n-dimensional sub-manifold of

P (X)

if: 1) the map

θ \mapsto p_{θ}

is injective and 2) the n base functions

p_{i} = \partial p (x, θ) / \partial θ_{i}

are linearly independent as functions over X. In this case we call S an n-dimensional statistical model. Usually, the linear independence condition is checked using the equivalent basis of score functions

l_{i} = \partial l / \partial θ_{i}

, where

l = ln p

is the log-likelihood of p.

An exponential family is the n-dimensional statistical manifold

E = {p (x, θ) = e^{θ \cdot h (x) + k (x) - ψ (θ)}, θ \in U} \subset P (X)

(5)

where

h : X \to R^{n}

is a map of maximum rank n whose components

h_{i}

,

i = 1, \dots, n

are called feature functions or observables, k is a positive function such that

e^{k (x)} d μ

is a probability density over X and the function

ψ (θ)

ψ (θ) = ln \int_{X} e^{θ \cdot h (x) + k (x)} d μ

is the cumulant generating function (the logarithm of the moment generating function), called free energy in statistical mechanics. A direct computation shows that the averages of

h_{i}

can be written as

η^{i} = E_{p_{θ}} (h_{i}) = \frac{\partial ψ (θ)}{\partial θ_{i}}, i = 1, \dots, n

(6)

and that the covariances are

\mathrm{cov} (h_{i}, h_{j}) = E_{p_{θ}} (h_{i} h_{j}) - E_{p_{θ}} (h_{i}) E_{p_{θ}} (h_{j}) = \frac{\partial^{2} ψ (θ)}{\partial θ_{i} \partial θ_{j}} .
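The relations between ψ and the first two moments can be verified by finite differences. The sketch below assumes a hypothetical one-parameter family on X = {0, 1, 2, 3} with counting measure, feature h(x) = x and constant k(x) = −ln 4, so that e^{k} is the uniform density; these choices and all names are ours, made only for the check.

```python
import math

X = [0, 1, 2, 3]          # hypothetical finite sample space
K = -math.log(len(X))     # constant k(x): e^{k} is the uniform density

def h(x):
    """Single feature function (illustrative choice)."""
    return float(x)

def psi(theta):
    """Cumulant generating function psi(theta) = ln sum_x e^{theta h(x) + k}."""
    return math.log(sum(math.exp(theta * h(x) + K) for x in X))

def p(x, theta):
    """Exponential family density p(x, theta)."""
    return math.exp(theta * h(x) + K - psi(theta))

def mean_h(theta):
    return sum(h(x) * p(x, theta) for x in X)

def var_h(theta):
    m = mean_h(theta)
    return sum((h(x) - m) ** 2 * p(x, theta) for x in X)

theta, step = 0.3, 1e-4
# Central finite differences for the first and second derivative of psi
dpsi = (psi(theta + step) - psi(theta - step)) / (2 * step)
d2psi = (psi(theta + step) - 2 * psi(theta) + psi(theta - step)) / step ** 2

print(dpsi, mean_h(theta))   # first derivative vs. expectation of h
print(d2psi, var_h(theta))   # second derivative vs. variance of h
```

The two printed pairs should agree up to the finite-difference error, consistent with (6) and with the covariance formula.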

Using the score vectors

l_{i} = h_{i} - \partial ψ / \partial θ_{i}

associated to the exponential model one can show that the Hessian of the cumulant function coincides with the Fisher Riemannian metric

g_{i j}

[3] that is

g_{i j} (θ) = E_{p_{θ}} (l_{i} l_{j}) = \frac{\partial^{2} ψ (θ)}{\partial θ_{i} \partial θ_{j}} (θ),

which is positive definite by the maximum rank hypothesis on h. So

(E, g)

is a Riemannian manifold. To show that

E

is a dually flat manifold one needs to introduce the Amari

α

-connection with Christoffel symbols

Γ_{i j, k}^{(α)} = E [(\partial_{i} \partial_{j} l + \frac{1 - α}{2} \partial_{i} l \partial_{j} l) \partial_{k} l], α \in [- 1, 1]

where

l = ln p (x, θ)

. By direct computation one shows that

Γ^{(1)}

(resp.

Γ^{(- 1)}

) vanish in

θ

(resp.

η)

coordinates, so

θ

and

η

are global affine charts and the curvatures

R^{(1)}

and

R^{(- 1)}

both vanish, making

E

dually flat. Note that a connection can be locally flat (zero curvature) without admitting a global affine coordinate system. For an exponential family manifold the Christoffel symbols vanish on the entire domains of the

θ

and

η

coordinates, which are convex open sets, so the flat structure is global. For more details see [3].

4. Controlled Evolution in a Dually Flat Manifold

In this Section we show how the dual coordinate system

η, θ

can be used to describe the controlled evolution of a point on a dually flat manifold. It is well known that the statistical description of a thermodynamic system is given by an exponential family (5) whose feature functions

h (x) \in R^{n}

are the physical observables of interest for the macroscopic description of the system given in terms of their average values

η \in R^{n}

. The associated natural parameters

θ = θ (η)

are interpreted as generalized inverse temperatures. This is a generalization of the fact that if

h (x) \in R

represents the energy of the system in the microscopic phase-space configuration

x \in X

, its average

η

is the internal energy of the system in contact with a heat bath at the inverse temperature

θ (η)

. The heat bath allows one to control the (inverse) temperature

θ

of the system. This brief review of equilibrium statistical mechanics terminology motivates the name of (thermal) control or input coordinates for

θ \in R^{n}

and of effect or output coordinates for

η \in R^{n}

.

The equilibrium statistical mechanics formalism is an example of an exponential family manifold which is an instance of a dually flat manifold; we will be interested in the following two problems formulated for a dually flat n-dimensional Riemannian manifold

(M, g)

equipped with dual coordinate systems

(η, θ)

:

1) forward or direct problem: given a path in control variables

θ (t) \in R^{n}

to determine the corresponding evolution

η (t) \in R^{n}

in effect space;

2) backward or inverse problem: given an observed evolution

η (t)

in effect space, to determine the corresponding control or cause path

θ (t)

.

Since

η

and

θ

are local affine coordinates of a dually flat manifold, the above problems have a solution using (3)

direct problem : given θ (t), the evolution is η (t) = \nabla ψ (θ (t))

(7)

and

inverse problem : given η (t), the evolution is θ (t) = \nabla φ (η (t)) .

(8)

In the following we will consider two important classes of direct and inverse problems.

Direct Problem: System in a Heat Bath

In [1] the following instance of a direct problem for a control path referred to a k-cut of a mutually dual foliation (see Section 2.1) is considered: let

t \mapsto (θ_{A} (t), {\bar{θ}}_{B})

be given, that is, the system is contained in a heat bath which realizes the constraint

θ_{B} (t) = {\bar{θ}}_{B}

. Using (7) we have, adopting for simplicity the shorthand block notation

ψ_{A B} = \partial_{A B} ψ

\{\begin{matrix} {\dot{η}}^{A} (t) = \frac{d}{d t} (\partial_{A} ψ (θ_{A} (t), {\bar{θ}}_{B})) = \partial_{A A} ψ {\dot{θ}}_{A} = ψ_{A A} {\dot{θ}}_{A}, \\ {\dot{η}}^{B} (t) = \frac{d}{d t} (\partial_{B} ψ (θ_{A} (t), {\bar{θ}}_{B})) = ψ_{B A} {\dot{θ}}_{A} \end{matrix}

and since

ψ_{A A} \in \mathrm{Mat} (k)

is invertible, we can write

{\dot{θ}}_{A} = ψ_{A A}^{- 1} {\dot{η}}^{A}

and obtain the result

{\dot{η}}^{B} = ψ_{B A} ψ_{A A}^{- 1} {\dot{η}}^{A} = Γ_{B A} (θ_{A} (t), {\bar{θ}}_{B}) {\dot{η}}^{A}, {\dot{η}}^{A} = ψ_{A A} {\dot{θ}}_{A} .

(9)

So, if the system is contained in a B-type heat bath

θ_{B} (t) = {\bar{θ}}_{B}

, then the evolution in the

η^{B}

effect variables can be reconstructed from the evolution of the

η^{A}

variables using (9)₁.

This result has an interesting geometrical interpretation: if we consider the local fibration

π_{A} \circ η (p) = η^{A}

, with fibres

E ({\bar{η}}^{A})

see (4), the orthogonal submanifold

Q ({\bar{θ}}_{B})

defines an orthogonal complement

H (π_{A} \circ η)

to the tangent space to the fibres

V (π_{A} \circ η) = \ker T (π_{A} \circ η)

and hence

Q ({\bar{θ}}_{B})

is an integrable orthogonal Ehresmann connection, i.e. a section of the fibre map

π_{A} \circ η

and the

Γ_{B A}

are the coefficients of the Ehresmann connection, see e.g. [11,12]. Therefore (9) can be seen as the horizontal lift of the path

η^{A} (t)

in base space

{\dot{η}}^{B} = \mathrm{hor} ({\dot{η}}^{A} (t)) = Γ_{B A} (θ (t)) {\dot{η}}^{A} (t) \otimes \partial^{B}

where

\mathrm{hor} : R^{k} \to H (π_{A} \circ η) = T Q ({\bar{θ}}_{B}) \subset T_{p} M

is the horizontal lift operator.

In the case of an exponential family manifold, the connection coefficients

Γ_{B A}

have an important interpretation in statistical theory

Γ_{B A} (θ) = ψ_{B A} ψ_{A A}^{- 1} = \mathrm{cov} (h_{B}, h_{A}) \, \mathrm{cov} {(h_{A}, h_{A})}^{- 1}

where

\mathrm{cov} (h_{B}, h_{A})

is the matrix of covariances between the

h_{B}

and

h_{A}

observables and

\mathrm{cov} (h_{A}, h_{A})

is the square matrix of variances and covariances of the

h_{A}

type observables. This shows that when the

θ_{B}

parameters are kept constant, the inference on the

η^{B}

variables given the evolution of the

η^{A}

variables is the linear predictor of the B random variables in terms of the A random variables.

The dual of the above system evolution is given by the following:

Inverse Problem: System Insensitive to the Control in the A Variables

Given an evolution in the effect space of the type

η (t) = ({\bar{η}}^{A}, η^{B} (t))

, to determine the corresponding time evolution in the cause parameter

θ (t)

. Using the same argument as above we have from (7)

\{\begin{matrix} {\dot{η}}^{A} (t) = \frac{d}{d t} (\partial_{A} ψ (θ_{A} (t), θ_{B} (t))) = ψ_{A A} {\dot{θ}}_{A} + ψ_{A B} {\dot{θ}}_{B} = 0, \\ {\dot{η}}^{B} (t) = \frac{d}{d t} (\partial_{B} ψ (θ_{A} (t), θ_{B} (t))) = ψ_{B A} {\dot{θ}}_{A} + ψ_{B B} {\dot{θ}}_{B} . \end{matrix}

Therefore the constraint in effect space induces the following constraint in control space

{\dot{θ}}_{A} = - ψ_{A A}^{- 1} ψ_{A B} {\dot{θ}}_{B} = - Γ_{A B}^{T} (θ_{A} (t), θ_{B} (t)) {\dot{θ}}_{B}

(10)

where

Γ_{A B}^{T}

is the transpose of the matrix

Γ_{B A}

. Note that if

{\dot{θ}}_{B} (t) \neq 0

the above equation (10) can be given the simpler form of an ODE for the function

θ_{A} = θ_{A} (θ_{B})

as

\frac{d θ_{A}}{d θ_{B}} = - Γ_{A B}^{T} (θ_{A}, θ_{B}) .

For the sake of completeness we derive the above result (10) using the inverse problem formulation in (8). Adopting the shorthand notation

\partial^{A B} φ = φ^{A B}

we have

\{\begin{matrix} {\dot{θ}}_{A} (t) = \frac{d}{d t} (\partial^{A} φ ({\bar{η}}^{A}, η^{B} (t))) = \partial^{A B} φ {\dot{η}}^{B} = φ^{A B} {\dot{η}}^{B}, \\ {\dot{θ}}_{B} (t) = \frac{d}{d t} (\partial^{B} φ ({\bar{η}}^{A}, η^{B} (t))) = φ^{B B} {\dot{η}}^{B} \end{matrix}

and since

φ^{B B} \in \mathrm{Mat} (n - k)

is invertible, we have that

{\dot{θ}}_{A} = φ^{A B} {(φ^{B B})}^{- 1} {\dot{θ}}_{B} = Γ^{A B} {\dot{θ}}_{B}

which coincides with the above result (10) because

Γ^{A B} = φ^{A B} {(φ^{B B})}^{- 1} = - ψ_{A A}^{- 1} ψ_{A B} = - Γ_{A B}^{T} .

This last equality can be easily checked by expanding in block representation the identity

g^{i j} g_{j k} = φ^{i j} ψ_{j k} = δ_{k}^{i} .
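The block identity used above can be illustrated in the simplest case n = 2, k = 1, where all blocks are scalars. The following sketch assumes a hypothetical symmetric positive definite Hessian of ψ (the numerical values are arbitrary) and builds the Hessian of φ as its inverse.

```python
# Hypothetical symmetric positive definite Hessian of psi, scalar blocks (n = 2, k = 1)
psi_AA, psi_AB, psi_BB = 2.0, 0.5, 1.5
psi_BA = psi_AB

# Hessian of phi is the inverse matrix of the Hessian of psi (2x2 inverse formula)
det = psi_AA * psi_BB - psi_AB * psi_BA
phi_AA = psi_BB / det
phi_AB = -psi_AB / det
phi_BA = -psi_BA / det
phi_BB = psi_AA / det

# First row of the product phi * psi, which should be (1, 0)
identity_AA = phi_AA * psi_AA + phi_AB * psi_BA
identity_AB = phi_AA * psi_AB + phi_AB * psi_BB

# Coefficient of (10) computed from phi and from psi
gamma_from_phi = phi_AB / phi_BB
gamma_from_psi = -psi_AB / psi_AA

print(gamma_from_phi, gamma_from_psi)
```

The two coefficients coincide, so in this scalar case the inverse-problem formulation (8) reproduces the constraint (10).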

Example: Direct Problem for the Gaussian Exponential Manifold

Before turning to the mixed control problem we give a simple example of the theory developed so far. It is well known that the Gaussian probability density is the maximum entropy density under the constraint of fixed values of the first two moments. Therefore, we can consider for

x \in X = R

the feature functions

h_{1} (x) = x

and

h_{2} (x) = x^{2}

. The probability density defines an exponential family of the form

p (θ_{1}, θ_{2}, x) = e^{θ_{1} x + θ_{2} x^{2} - ψ (θ_{1}, θ_{2})}

where

ψ (θ_{1}, θ_{2}) = ln \int_{R} e^{θ_{1} x + θ_{2} x^{2}} d x = - \frac{θ_{1}^{2}}{4 θ_{2}} - \frac{1}{2} ln (- θ_{2}) + \frac{1}{2} ln π, θ_{2} < 0 .

We consider the following forward control problem: to determine the evolution of

η^{B} = η^{2} = E_{p_{θ}} [x^{2}]

given the evolution

η^{A} = η^{1} = E_{p_{θ}} [x]

under the control path

(θ_{A} (t), {\bar{θ}}_{B}) = (θ_{1} (t), {\bar{θ}}_{2})

. This can be done using (9) by computing the Ehresmann connection coefficient

Γ_{B A} = Γ_{21} = \frac{ψ_{12}}{ψ_{11}} = \frac{ψ_{θ_{1} θ_{2}}}{ψ_{θ_{1} θ_{1}}} = - \frac{θ_{1}}{θ_{2}} = - \frac{θ_{1} (t)}{{\bar{θ}}_{2}}

therefore

{\dot{η}}^{2} (t) = Γ_{21} (t) {\dot{η}}^{1} (t) = - \frac{θ_{1} (t)}{{\bar{θ}}_{2}} {\dot{η}}^{1} (t)

(11)

which can be directly integrated

η^{2} (t) = - \frac{1}{{\bar{θ}}_{2}} \int θ_{1} (t) {\dot{η}}^{1} (t) d t + const .

If

{\dot{η}}^{1} (t) \neq 0

,

η^{1} (t)

can be locally inverted in

t = t (η^{1})

and we can express the horizontal lift (11) in the form

\frac{d η^{2}}{d η^{1}} = Γ_{21} (t) = Γ_{21} (t (η^{1}))

so that

η^{2} = \int Γ_{21} (t (η^{1})) d η^{1} = - \frac{1}{{\bar{θ}}_{2}} \int θ_{1} (t (η^{1})) d η^{1} + const .

As an example, consider the evolution

η^{1} (t) = t

, with inverse

t (η^{1}) = η^{1}

. We want to determine the control path

θ_{1} (t (η^{1})) = θ_{1} (η^{1})

so that the variance is a given function

\mathrm{var} (h_{1}) = η^{2} - {(η^{1})}^{2} = f (η^{1})

. We have the functional equation

f (η^{1}) = η^{2} - {(η^{1})}^{2} = - \frac{1}{{\bar{θ}}_{2}} \int θ_{1} (η^{1}) d η^{1} + const - {(η^{1})}^{2}

which gives by differentiation

θ_{1} (η^{1}) = (- {\bar{θ}}_{2}) (2 η^{1} + f^{'} (η^{1})) .
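The horizontal lift (11) can be checked against the closed-form Gaussian moments. The sketch below uses the standard expressions η^1 = −θ_1/(2θ_2) and η^2 = (η^1)^2 − 1/(2θ_2), which follow from ψ; the function names and numerical values are illustrative assumptions. Note that for fixed θ_2 the Gaussian variance −1/(2θ_2) is constant, so in this check f' = 0 and the control path reduces to θ_1 = −2 θ̄_2 η^1.

```python
def eta1(t1, t2):
    """Mean E[x] of the Gaussian family with natural parameters (t1, t2), t2 < 0."""
    return -t1 / (2.0 * t2)

def eta2(t1, t2):
    """Second moment E[x^2] = variance + mean^2."""
    return -1.0 / (2.0 * t2) + eta1(t1, t2) ** 2

def gamma21(t1, t2):
    """Connection coefficient psi_21 / psi_11 = -t1 / t2."""
    return -t1 / t2

t1, t2_bar, dt = 0.8, -0.5, 1e-4   # hypothetical values, theta_2 held fixed

# Finite-difference slope d eta^2 / d eta^1 along the control path theta_1(t)
d_eta1 = eta1(t1 + dt, t2_bar) - eta1(t1 - dt, t2_bar)
d_eta2 = eta2(t1 + dt, t2_bar) - eta2(t1 - dt, t2_bar)
slope = d_eta2 / d_eta1

# Control path recovered from the mean when the variance is constant (f' = 0)
theta1_from_eta = -t2_bar * 2.0 * eta1(t1, t2_bar)

print(slope, gamma21(t1, t2_bar))
print(theta1_from_eta, t1)
```

The finite-difference slope should match Γ_21 = −θ_1/θ̄_2, and the recovered control should match the θ_1 used to generate the mean.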

5. Mixed Control Problem and Inferential Scattering

The notion of Inferential scattering was introduced in two papers by E. T. Jaynes [1,2]. Jaynes’ original exposition refers to equilibrium statistical mechanics. In this setting, the results of the previous Section describe the inference on the evolution of the average value of a group of random variables (B) given the evolution of the average value of another group (A) of random variables, under the hypothesis that the random variables are described by an exponential family and a constraint is in force.

To generalize the notion of inferential scattering we need to introduce the notion of

(k, l)

-cut in an n-dimensional dual coordinate system of a dually flat manifold: let

l, k

be integers with

1 \leq l < k < n

and consider the splitting of the coordinates

η = (η^{a}, η^{c}, η^{B}), θ = (θ_{a}, θ_{c}, θ_{B}), a = 1, \dots, l, c = l + 1, \dots, k, B = k + 1, \dots, n .

In practice, we split the A block variables in

A = (a, c)

. Now we can consider the following instance of mixed control problem

Mixed Control Problem for a System in a B-Type Heat Bath and Insensitive to the Control in the c Variables

This is a combination of the two above cases. We want to determine the evolution in the

η^{B}

variables under the controlled evolution

θ_{a} (t)

knowing that we have the constraint

η^{c} (t) = {\bar{η}}^{c}

in the effect space and

θ_{B} (t) = {\bar{θ}}_{B}

in cause space. Reasoning as above we consider the system of evolution equations

\{\begin{matrix} {\dot{η}}^{a} = \frac{d}{d t} (\partial_{a} ψ (θ_{a} (t), θ_{c} (t), {\bar{θ}}_{B})) = ψ_{a a} {\dot{θ}}_{a} + ψ_{a c} {\dot{θ}}_{c}, \\ {\dot{η}}^{c} = \frac{d}{d t} (\partial_{c} ψ (θ_{a} (t), θ_{c} (t), {\bar{θ}}_{B})) = ψ_{c a} {\dot{θ}}_{a} + ψ_{c c} {\dot{θ}}_{c} = 0, \\ {\dot{η}}^{B} = \frac{d}{d t} (\partial_{B} ψ (θ_{a} (t), θ_{c} (t), {\bar{θ}}_{B})) = ψ_{B a} {\dot{θ}}_{a} + ψ_{B c} {\dot{θ}}_{c} . \end{matrix}

From the second equation we get

{\dot{θ}}_{c} = - ψ_{c c}^{- 1} ψ_{c a} {\dot{θ}}_{a}

(12)

and we substitute

{\dot{θ}}_{c}

in the first and third equation to obtain

\{\begin{matrix} {\dot{η}}^{a} = ψ_{a a} {\dot{θ}}_{a} - ψ_{a c} ψ_{c c}^{- 1} ψ_{c a} {\dot{θ}}_{a} = ψ_{a a} (I - ψ_{a a}^{- 1} ψ_{a c} ψ_{c c}^{- 1} ψ_{c a}) {\dot{θ}}_{a} = ψ_{a a} (I - R (a, c)) {\dot{θ}}_{a} \\ {\dot{η}}^{B} = ψ_{B a} {\dot{θ}}_{a} - ψ_{B c} ψ_{c c}^{- 1} ψ_{c a} {\dot{θ}}_{a} = (ψ_{B a} - ψ_{B c} ψ_{c c}^{- 1} ψ_{c a}) {\dot{θ}}_{a} \end{matrix}

(13)

where

I

is the identity matrix and we have introduced the correlation coefficient matrix

R (a, c) = ψ_{a a}^{- 1} ψ_{a c} ψ_{c c}^{- 1} ψ_{c a} .

(14)

The last equality in (13)₁ can be inverted

{\dot{θ}}_{a} = {(I - R (a, c))}^{- 1} ψ_{a a}^{- 1} {\dot{η}}^{a}

and substituted in (13)₂ to finally obtain

{\dot{η}}^{B} = (ψ_{B a} - ψ_{B c} ψ_{c c}^{- 1} ψ_{c a}) {(I - R (a, c))}^{- 1} ψ_{a a}^{- 1} {\dot{η}}^{a} = G_{B a} {\dot{η}}^{a} .

(15)

This is our inference result on the evolution of the

η^{B}

variables given the mixed (forward and backward) control. Note that

G_{B a} = G_{B a} (θ_{a} (t), θ_{c} (t), {\bar{θ}}_{B})

but the evolution in the controlled variables

θ_{a}, θ_{c}

is subject to the equation (12) which enforces the constraint in effect space

η^{c} = {\bar{η}}^{c}

.

5.1. The Inferential Scattering Formula

Since

R (a, c)

in (14) is the analogue of the squared correlation coefficient of two random variables

ρ (A, B) = \frac{\mathrm{cov} {(A, B)}^{2}}{\mathrm{var} (A) \mathrm{var} (B)}, 0 \leq ρ (A, B) \leq 1

it is reasonable to assume that the matrix

R (a, c)

has spectral radius less than one. Under this assumption, the following geometric (Neumann) series expansion holds

{(I - R (a, c))}^{- 1} = I + R (a, c) + R {(a, c)}^{2} + R {(a, c)}^{3} + \dots

(16)

If we substitute (16) in (15) we readily obtain the following expansion

{\dot{η}}^{B} = (Γ_{B a} - Γ_{B c} Γ_{c a} + Γ_{B a} Γ_{a c} Γ_{c a} - Γ_{B c} Γ_{c a} Γ_{a c} Γ_{c a} + \dots) {\dot{η}}^{a}

(17)

where we have set

Γ_{B a} = ψ_{B a} ψ_{a a}^{- 1}, Γ_{B c} = ψ_{B c} ψ_{c c}^{- 1}, Γ_{c a} = ψ_{c a} ψ_{a a}^{- 1}, Γ_{a c} = ψ_{a c} ψ_{c c}^{- 1} .

The above expansion formula (17) has the following interpretation, see [1]: the term

Γ_{B a} d η^{a}

is the inferred (infinitesimal) variation of the B observable induced by a variation

d η^{a}

of the a observables. The term

Γ_{B c} Γ_{c a} d η^{a}

is the inferred variation in B as induced by a variation

d η^{c} = Γ_{c a} d η^{a}

of the c observables which is in turn induced by the variation

d η^{a}

. The other terms in (17) have a similar interpretation. This result has been called Inferential Scattering formula [2] by analogy with the Born series expansion of the operator which describes the multiple scattering phenomenon [10]: a wave can reach a point B in space directly from source a (

Γ_{B a}

term) or after having been reflected by an obstacle in c (

Γ_{B c} Γ_{c a}

term) or after having been reflected back in a from c and then arriving at B (

Γ_{B a} Γ_{a c} Γ_{c a}

term). See diagram (18) below. Note that the first term in (17)

Γ_{B a}

coincides with the one derived before in (9) in case of forward control. Of course the size of the higher order correction terms after

Γ_{B a}

depends on the norm of the correlation matrix

R (a, c)

.

(18)
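The convergence of the scattering series (17) to the closed form (15) can be illustrated with scalar blocks. The sketch below assumes hypothetical entries of a symmetric positive definite Hessian of ψ; the numbers are arbitrary and the helper name `scattering_terms` is ours. With scalar blocks, R(a, c) reduces to Γ_ac Γ_ca.

```python
# Hypothetical entries of a symmetric positive definite Hessian of psi (scalar blocks)
psi_aa, psi_cc = 2.0, 1.5
psi_ac = psi_ca = 0.6
psi_Ba, psi_Bc = 0.4, 0.3

# Connection coefficients as defined after (17)
gamma_Ba = psi_Ba / psi_aa
gamma_Bc = psi_Bc / psi_cc
gamma_ca = psi_ca / psi_aa
gamma_ac = psi_ac / psi_cc

# Correlation coefficient (14) and closed form (15)
R = psi_ac * psi_ca / (psi_aa * psi_cc)
closed_form = (psi_Ba - psi_Bc * psi_ca / psi_cc) / ((1.0 - R) * psi_aa)

def scattering_terms(n_terms):
    """First n_terms of (17): direct paths a -> B and paths via c, with a-c-a loops."""
    loop = gamma_ac * gamma_ca          # one round trip a -> c -> a
    terms = []
    for i in range(n_terms):
        head = gamma_Ba if i % 2 == 0 else -gamma_Bc * gamma_ca
        terms.append(head * loop ** (i // 2))
    return terms

for n in (1, 2, 4, 8, 40):
    print(n, sum(scattering_terms(n)), closed_form)
```

The partial sums approach the closed form geometrically, at a rate governed by R, in line with the remark that the size of the higher order corrections depends on the norm of R(a, c).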

5.2. A Mixed Control Example: Neural Network of Two Neurons

The following toy model (see e.g. [5]) describes the neural activity of a set of two neurons using two discrete random variables which can take the value 1 (active) or 0 (silent). The state space of the network is

X = {(0, 0), (0, 1), (1, 0), (1, 1)}

. Given

x = (x_{1}, x_{2}) \in X

, let

p (x)

be the probability that the network is in the state x. As observables we consider

h_{i} (x) = x_{i}

,

i = 1, 2

which describe the state of the two neurons and

h_{3} (x) = x_{1} x_{2}

. The exponential manifold is comprised of discrete probability distributions of the form

p (x) = e^{θ \cdot h - ψ (θ)} = e^{θ_{1} x_{1} + θ_{2} x_{2} + θ_{3} x_{1} x_{2} - ψ (θ)}

where

ψ (θ) = ln (\sum_{x \in X} e^{θ_{1} x_{1} + θ_{2} x_{2} + θ_{3} x_{1} x_{2}}) = ln (1 + e^{θ_{1}} + e^{θ_{2}} + e^{θ_{1} + θ_{2} + θ_{3}}) = ln Z (θ)

The associated effect space coordinates

η^{i} = E_{p_{θ}} [x_{i}] = \frac{\partial ψ}{\partial θ_{i}}, i = 1, 2, \qquad η^{3} = E_{p_{θ}} [x_{1} x_{2}] = \frac{\partial ψ}{\partial θ_{3}}

are called the firing rates of neurons

i = 1, 2

and the joint firing rate, respectively. A direct computation shows that

η^{i} = \frac{e^{θ_{i}} + e^{θ_{1} + θ_{2} + θ_{3}}}{Z (θ)} = Prob (x_{i} = 1), i = 1, 2, η^{3} = \frac{e^{θ_{1} + θ_{2} + θ_{3}}}{Z (θ)} = Prob (x_{1} x_{2} = 1) .

The Riemannian metric matrix

g \in Mat (3)

can also be computed directly as

g_{i j} (θ) = \frac{\partial^{2} ψ (θ)}{\partial θ_{i} \partial θ_{j}} .
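The quantities ψ, η and g of this toy model are small enough to compute by enumerating the four network states. The following Python sketch does so; the function names are illustrative choices, and the metric is computed as the covariance matrix of the observables, using the standard fact that for an exponential family this equals the Hessian of ψ.

```python
import numpy as np

# The four network states and the observables h = (x1, x2, x1*x2).
STATES = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)
H = np.column_stack([STATES[:, 0], STATES[:, 1], STATES[:, 0] * STATES[:, 1]])

def psi(theta):
    """Log-partition function ln Z(theta)."""
    return np.log(np.sum(np.exp(H @ theta)))

def probabilities(theta):
    """p(x) = exp(theta . h(x) - psi(theta)) for the four states."""
    return np.exp(H @ theta - psi(theta))

def eta(theta):
    """Effect-space coordinates: the two firing rates and the joint firing rate."""
    return probabilities(theta) @ H

def metric(theta):
    """g_ij = covariance of (h_i, h_j), equal to the Hessian of psi."""
    p = probabilities(theta)
    mean = p @ H
    return (H * p[:, None]).T @ H - np.outer(mean, mean)

theta = np.array([0.3, -0.7, 1.2])   # hypothetical sample point
print("eta:", eta(theta))
print("g:\n", metric(theta))
```

Comparing `eta` and `metric` with finite differences of `psi` is a simple way to confirm the relations η^{i} = \partial ψ / \partial θ_{i} and g_{i j} = \partial^{2} ψ / \partial θ_{i} \partial θ_{j} at any chosen point.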

We investigate the mixed control problem (see Section 5) for this system. We consider the

(2, 1)

-cut in the 3-dimensional dual coordinate system defined by the following identification

(η^{1}, η^{2}, η^{3}) = (η^{a}, η^{c}, η^{B}), (θ_{1}, θ_{2}, θ_{3}) = (θ_{a}, θ_{c}, θ_{B}) .

We want to determine the evolution in the

η^{B} = η^{3}

variable (joint firing rate of neurons 1 and 2) in terms of the evolution of

η^{a} = η^{1}

(firing rate of neuron 1) under the control

θ_{B} (t) = θ_{3} (t)

knowing that we have the constraint

η^{c} (t) = η^{2} (t) = {\bar{η}}^{2}

(firing rate of neuron 2 is insensitive to the control) in the effect space. To this end we can use the perturbation expansion (17)

d η^{3} = (Γ_{31} - Γ_{32} Γ_{21} + Γ_{31} Γ_{12} Γ_{21} - Γ_{32} Γ_{21} Γ_{12} Γ_{21} + \dots) d η^{1}

where

Γ_{31} = \frac{ψ_{31}}{ψ_{11}}, Γ_{32} = \frac{ψ_{32}}{ψ_{22}}, Γ_{21} = \frac{ψ_{21}}{ψ_{11}}, Γ_{12} = \frac{ψ_{12}}{ψ_{22}}

and

R (1, 2) = Γ_{12} Γ_{21} = \frac{ψ_{12} ψ_{21}}{ψ_{11} ψ_{22}} = \frac{ψ_{12}^{2}}{ψ_{11} ψ_{22}} = \frac{{cov (x_{1}, x_{2})}^{2}}{var (x_{1}) var (x_{2})} = ρ^{2} (x_{1}, x_{2})

where

ρ (x_{1}, x_{2}) \in [- 1, 1]

is Pearson’s correlation coefficient, a measure of the strength of the linear correlation (normalized covariance) between the two variables. So the perturbation parameter in the Born series expansion (16) is the squared correlation coefficient, which lies in [0, 1]. The explicit expression (setting

θ_{1} = x

,

θ_{2} = y

and

θ_{3} = z

for simplicity) is

R (1, 2) (x, y, z) = (\frac{1}{1 + e^{x}} - \frac{1}{1 + e^{x + z}}) (\frac{1}{1 + e^{y}} - \frac{1}{1 + e^{y + z}}) .
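The identities above lend themselves to a direct numerical check. The sketch below, a minimal illustration rather than the author's computation, evaluates the Hessian of ψ as a covariance matrix, compares R (1, 2) with ρ^{2} and with the product form in which each factor reads 1/(1 + e^{x}) - 1/(1 + e^{x + z}) (and likewise in y), and sums the scalar version of expansion (17). The sample point and all function names are hypothetical.

```python
import numpy as np

# Observables (x1, x2, x1*x2) on the states (0,0), (0,1), (1,0), (1,1).
H = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]], dtype=float)

def hessian_psi(x, y, z):
    """Matrix psi_ij at theta = (x, y, z), computed as a covariance matrix."""
    w = np.exp(H @ np.array([x, y, z]))
    p = w / w.sum()
    m = p @ H
    return (H * p[:, None]).T @ H - np.outer(m, m)

def r_closed_form(x, y, z):
    """Product form Gamma_12(x, z) * Gamma_21(y, z)."""
    g12 = 1 / (1 + np.exp(x)) - 1 / (1 + np.exp(x + z))
    g21 = 1 / (1 + np.exp(y)) - 1 / (1 + np.exp(y + z))
    return g12 * g21

def scattering_series(psi, n_pairs):
    """Partial sum of the scalar expansion (17) and its geometric-series limit."""
    g31, g21 = psi[2, 0] / psi[0, 0], psi[1, 0] / psi[0, 0]
    g32, g12 = psi[2, 1] / psi[1, 1], psi[0, 1] / psi[1, 1]
    r = g12 * g21
    partial = sum((g31 - g32 * g21) * r**k for k in range(n_pairs))
    limit = (g31 - g32 * g21) / (1 - r)
    return partial, limit, r

x, y, z = 0.4, -0.3, 1.5              # hypothetical sample point
psi = hessian_psi(x, y, z)
partial, limit, r = scattering_series(psi, n_pairs=30)
rho_squared = psi[0, 1] ** 2 / (psi[0, 0] * psi[1, 1])
print("R(1,2):", r, " rho^2:", rho_squared, " product form:", r_closed_form(x, y, z))
print("series:", partial, " limit:", limit)
```

Because R (1, 2) is a squared correlation, the scalar series is geometric with ratio in [0, 1), and weakly correlated neurons need only the first one or two terms.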

Note that the two control variables

(x, y)

are not completely free because relation (12), which enforces the constraint

η^{2} = η^{c} = {\bar{η}}^{c}

, must hold. In our notation, after a straightforward computation, (12) reads

\dot{y} = - \frac{ψ_{12}}{ψ_{22}} \dot{x} = - (\frac{1}{1 + e^{x}} - \frac{1}{1 + e^{x + \bar{z}}}) \dot{x} = - Γ_{12} (x, \bar{z}) \dot{x}

where we have set

z = θ_{3} = {\bar{θ}}_{3} = \bar{z}

. If

\dot{x} \neq 0

the above equation can be readily integrated

\frac{d y}{d x} = - Γ_{12} (x, \bar{z}) hence y (x) = - \int Γ_{12} (x, \bar{z}) d x + const .

to give

y (x, \bar{z}) = ln \frac{1 + e^{x}}{1 + e^{x + \bar{z}}} + const .

Therefore, in the mixed control case, the correlation coefficient

R (1, 2)

is a function of

θ_{1} = x

only, of the form

R_{\bar{z}} (1, 2) (x) = R (1, 2) (x, y (x, \bar{z}), \bar{z}) .
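The constraint curve and the restricted parameter can be explored numerically. The sketch below rests on an assumption derived here rather than quoted from the text: solving the condition that the firing rate of neuron 2 stays constant directly for θ_{2} gives y = logit ({\bar{η}}^{2}) + ln (1 + e^{x}) - ln (1 + e^{x + \bar{z}}). The value of {\bar{η}}^{2} is hypothetical, while \bar{z} = 5 matches the value used for the plot; the code checks that the firing rate is indeed constant along the curve and that its slope equals - Γ_{12}.

```python
import numpy as np

def eta2(x, y, z):
    """Firing rate of neuron 2, Prob(x2 = 1), at theta = (x, y, z)."""
    Z = 1 + np.exp(x) + np.exp(y) + np.exp(x + y + z)
    return (np.exp(y) + np.exp(x + y + z)) / Z

def gamma12(x, z):
    return 1 / (1 + np.exp(x)) - 1 / (1 + np.exp(x + z))

def gamma21(y, z):
    return 1 / (1 + np.exp(y)) - 1 / (1 + np.exp(y + z))

def y_on_constraint(x, z_bar, eta2_bar):
    """theta_2 keeping eta^2 = eta2_bar while theta_3 = z_bar (assumed closed form)."""
    logit = np.log(eta2_bar / (1 - eta2_bar))
    return logit + np.log1p(np.exp(x)) - np.log1p(np.exp(x + z_bar))

z_bar, eta2_bar = 5.0, 0.5            # eta2_bar is a hypothetical value
xs = np.linspace(-5.0, 5.0, 201)
ys = y_on_constraint(xs, z_bar, eta2_bar)

# Expansion parameter restricted to the constraint curve, as a function of x only.
R_along_curve = gamma12(xs, z_bar) * gamma21(ys, z_bar)

# Numerical slope dy/dx, to be compared with -Gamma_12(x, z_bar).
slope = np.gradient(ys, xs)
print("max R along the curve:", R_along_curve.max())
```

Evaluating `R_along_curve` over a range of x gives a one-dimensional profile of the expansion parameter, which complements the two-dimensional surface plot.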

A plot of

R (1, 2)

for

\bar{z} = {\bar{θ}}_{3} = 5

is shown in Figure 2. The figure helps to visualize the size of the perturbation expansion parameter

R (1, 2)

. This simple example is only suggestive of the possible applications of the notion of Inferential Scattering.

6. Discussion

The primary aim of this paper was to embed the notion of Inferential Scattering, as presented by E.T. Jaynes for a thermodynamical system, into the language and concepts of Information Geometry. The key geometrical object turns out to be a dually flat manifold with its coupled bi-orthogonal coordinate systems

η

and

θ

. These coordinates have a simple and appealing interpretation in the statistical mechanics framework as (thermal) control and effect space coordinates.

We have also shown that the forward and backward control problems can be seen as a horizontal lift (parallel transport) with respect to an integrable orthogonal Ehresmann connection. In this way the controlled evolution of the system in effect space is globally defined, which generalizes the original formulation for infinitesimal displacements contained in Jaynes' papers. We stress that this notion of controlled evolution has a potentially wide range of applications: being a form of inference, it is applicable to any statistical system with insufficient information. The geometrical analysis presented here can be developed further, and we plan to do so in a subsequent paper; for example, its formulation in the larger setting of Hessian manifolds, and possible developments using a weaker setting for the Legendre transform, remain to be explored. We hope that the material presented here can renew the interest of the scientific community in the original contribution of E.T. Jaynes and its potential applications.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] Jaynes, E.T. Generalized scattering. In Maximum–Entropy and Bayesian Methods in Inverse Problems; Ray Smith, C., Grandy, W.T. Jr., Eds.; D. Reidel Publishing Company: Dordrecht, 1985; pp. 377–398.
[2] Jaynes, E.T. Inferential scattering. 1993. Available online: https://bayes.wustl.edu/etj/articles/cinfscat.pdf (accessed on 03/03/2026).
[3] Amari, S.; Nagaoka, H. Methods of Information Geometry. In AMS Translations of Mathematical Monographs; Harada, D., Translator; AMS and Oxford University Press: New York, 2000; p. 191.
[4] Amari, S. I. Information geometry of multiple spike trains. In Analysis of Parallel Spike Trains; Springer: Boston, MA, USA, 2010; pp. 221–252.
[5] Amari, S. Information Geometry and Its Applications; Springer, 2016.
[6] Amari, S. I. Information geometry on hierarchy of probability distributions. IEEE Transactions on Information Theory 2002, 47(5), 1701–1711.
[7] Amari, S. Information geometry and manifolds of neural networks. In From Statistical Physics to Statistical Inference and Back; Springer: Dordrecht, The Netherlands, 1994; pp. 113–138.
[8] Amari, S.; Kurata, K.; Nagaoka, H. Information Geometry of Boltzmann machines. IEEE Trans. on Neural Networks 1992, 3(2), 260–271.
[9] Nielsen, F. An elementary introduction to information geometry. Entropy 2020, 22(10), 1100.
[10] Newton, Roger G. Scattering Theory of Waves and Particles; Springer Science & Business Media, 2013.
[11] Marsden, Jerrold E.; Montgomery, R.; Rațiu, Tudor S. Reduction, Symmetry, and Phases in Mechanics; American Mathematical Soc., 1990; Vol. 436.
[12] Favretti, M. Geometry and control of thermodynamic systems described by generalized exponential families. Journal of Geometry and Physics 2022, 176, 104497.

Figure 1. Cartoon picture of a mutually dual foliation in a dually flat space.

Figure 2. Plot of the correlation coefficient

R (1, 2)

as a function of

x = θ_{1}

and

y = θ_{2}

for

(θ_{1}, θ_{2}) \in {[- 5, 5]}^{2}

and

\bar{z} = {\bar{θ}}_{3} = 5

(forward control). The blue line gives the allowed values of

θ_{2} = θ_{2} (θ_{1})

as a consequence of the constraint in effect space (backward control)

η^{2} = {\bar{η}}^{2}

. We have plotted

R (1, 2) \times 5

to enhance visibility in the plots. (a) : side view. (b) : top view.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
