Preprint
Article

This version is not peer-reviewed.

Variations on the Expectation Due to Changes in the Probability Measure

A peer-reviewed article of this preprint also exists.

Submitted: 24 July 2025

Posted: 25 July 2025


Abstract
In this paper, closed-form expressions for the variation of the expectation of a given function due to changes in the probability measure (probability distribution drifts) are presented. They unveil interesting connections with Gibbs probability measures, mutual information, and lautum information.

1. Introduction

Let m be a positive integer and denote by $\triangle(\mathbb{R}^m)$ the set of all probability measures on the measurable space $\big(\mathbb{R}^m, \mathscr{B}(\mathbb{R}^m)\big)$, with $\mathscr{B}(\mathbb{R}^m)$ being the Borel σ-algebra on $\mathbb{R}^m$. Given a Borel measurable function $h: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, consider the functional
\[
G_h : \mathbb{R}^n \times \triangle(\mathbb{R}^m) \times \triangle(\mathbb{R}^m) \to \mathbb{R}, \qquad G_h(x, P_1, P_2) = \int h(x,y)\, \mathrm{d}P_1(y) - \int h(x,y)\, \mathrm{d}P_2(y), \tag{1}
\]
which quantifies the variation of the expectation of the measurable function h due to changing the probability measure from $P_2$ to $P_1$. These variations are often referred to as probability distribution drifts in some application areas; see for instance [1,2] and [3]. The functional $G_h$ is defined when both integrals exist and are finite.
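As a concrete illustration of (1), the following minimal Python sketch (our own, with illustrative names only, not from the paper) evaluates $G_h$ when Y takes finitely many values, so that each probability measure reduces to a list of point masses and each integral to a finite sum.

```python
# Hypothetical discrete sketch: each probability measure is a pmf over the
# points in ys, and G_h(x, P1, P2) is a difference of two finite sums.
def G(h, x, p1, p2, ys):
    """Variation of the expectation of h(x, .) when the probability
    measure over ys changes from p2 to p1."""
    e1 = sum(p * h(x, y) for p, y in zip(p1, ys))
    e2 = sum(p * h(x, y) for p, y in zip(p2, ys))
    return e1 - e2

ys = [0.0, 1.0, 2.0]
h = lambda x, y: (x - y) ** 2      # any Borel measurable h works here
p1 = [0.2, 0.5, 0.3]               # measure after the drift
p2 = [0.6, 0.3, 0.1]               # measure before the drift
print(G(h, 1.0, p1, p2, ys))       # negative: the drift lowered E[h(1, Y)]
```

The sign of the output shows whether the drift from p2 to p1 increased or decreased the expectation of h.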
In order to define the expectation of $G_h(x, P_1, P_2)$ when x is obtained by sampling a probability measure in $\triangle(\mathbb{R}^n)$, the structure formalized below is required.
Definition 1. 
A family $P_{Y|X} \triangleq \big(P_{Y|X=x}\big)_{x \in \mathbb{R}^n}$ of elements of $\triangle(\mathbb{R}^m)$ indexed by $\mathbb{R}^n$ is said to be a conditional probability measure if, for all sets $A \in \mathscr{B}(\mathbb{R}^m)$, the map
\[
\mathbb{R}^n \to [0,1], \qquad x \mapsto P_{Y|X=x}(A), \tag{3}
\]
is Borel measurable. The set of all such conditional probability measures is denoted by $\triangle(\mathbb{R}^m | \mathbb{R}^n)$.
In this setting, consider the functional
\[
\bar{G}_h : \triangle(\mathbb{R}^m|\mathbb{R}^n) \times \triangle(\mathbb{R}^m|\mathbb{R}^n) \times \triangle(\mathbb{R}^n) \to \mathbb{R}, \qquad \bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \int G_h\big(x, P_{Y|X=x}^{(1)}, P_{Y|X=x}^{(2)}\big)\, \mathrm{d}P_X(x). \tag{2}
\]
This quantity can be interpreted as the variation of the integral (expectation) of the function h when the probability measure changes from the joint probability measure $P_{Y|X}^{(1)} P_X$ to another joint probability measure $P_{Y|X}^{(2)} P_X$, both on $\mathbb{R}^m \times \mathbb{R}^n$. This follows from (2) by observing that
\[
\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \int h(x,y)\, \mathrm{d}\big(P_{Y|X}^{(1)} P_X\big)(y,x) - \int h(x,y)\, \mathrm{d}\big(P_{Y|X}^{(2)} P_X\big)(y,x). \tag{4}
\]
Special attention is given to the quantity $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$, for some $P_{Y|X} \in \triangle(\mathbb{R}^m|\mathbb{R}^n)$, with $P_Y$ being the marginal of the joint probability measure $P_{Y|X} P_X$. That is, for all sets $A \in \mathscr{B}(\mathbb{R}^m)$,
\[
P_Y(A) = \int P_{Y|X=x}(A)\, \mathrm{d}P_X(x). \tag{5}
\]
Its relevance stems from the fact that it captures the variation of the expectation of the function h when the probability measure changes from the joint probability measure $P_{Y|X} P_X$ to the product of its marginals $P_Y P_X$. That is,
\[
\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big) = \int \Big( \int h(x,y)\, \mathrm{d}P_Y(y) - \int h(x,y)\, \mathrm{d}P_{Y|X=x}(y) \Big)\, \mathrm{d}P_X(x) \tag{6}
\]
\[
= \int h(x,y)\, \mathrm{d}\big(P_Y P_X\big)(y,x) - \int h(x,y)\, \mathrm{d}\big(P_{Y|X} P_X\big)(y,x). \tag{7}
\]
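The equality between the two expressions above can be checked numerically in the finite case. The following sketch (our own, with illustrative names) builds the marginal $P_Y$ from the conditional measure and $P_X$, and verifies that integrating $G_h$ against $P_X$ matches the difference of expectations under the two joint measures.

```python
# A small discrete sketch of the variation from the joint measure
# P_{Y|X} P_X to the product of its marginals P_Y P_X.
xs = [0, 1]
ys = [0, 1, 2]
pX = [0.4, 0.6]
pY_given_X = {0: [0.7, 0.2, 0.1], 1: [0.1, 0.3, 0.6]}  # rows of a channel

def h(x, y):
    return float(x * y)

# Marginal P_Y: P_Y(A) = integral of P_{Y|X=x}(A) against P_X.
pY = [sum(pX[i] * pY_given_X[x][j] for i, x in enumerate(xs))
      for j in range(len(ys))]

# First form: integrate G_h(x, P_Y, P_{Y|X=x}) against P_X.
bar_G = sum(pX[i] * (sum(pY[j] * h(x, ys[j]) for j in range(len(ys)))
                     - sum(pY_given_X[x][j] * h(x, ys[j]) for j in range(len(ys))))
            for i, x in enumerate(xs))

# Second form: the same number as a difference of expectations under the
# product measure P_Y P_X and the joint measure P_{Y|X} P_X.
e_product = sum(pX[i] * pY[j] * h(x, ys[j])
                for i, x in enumerate(xs) for j in range(len(ys)))
e_joint = sum(pX[i] * pY_given_X[x][j] * h(x, ys[j])
              for i, x in enumerate(xs) for j in range(len(ys)))
assert abs(bar_G - (e_product - e_joint)) < 1e-12
print(bar_G)
```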

1.1. Novelty and Contributions

This work makes two key contributions. First, it provides a closed-form expression for the variation $G_h(x, P_1, P_2)$ in (1) for a fixed $x \in \mathbb{R}^n$ and two arbitrary probability measures $P_1$ and $P_2$, formulated explicitly in terms of relative entropies. Second, it derives a closed-form expression for the expected variation $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ in (2), again in terms of information measures, for arbitrary conditional probability measures $P_{Y|X}^{(1)}$ and $P_{Y|X}^{(2)}$, and an arbitrary probability measure $P_X$.
A further contribution of this work is the derivation of specific closed-form expressions for $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6), which reveal deep connections with both mutual information [4,5] and lautum information [6]. Notably, when $P_{Y|X}$ is a Gibbs conditional probability measure, this variation simplifies (up to a constant factor) to the sum of the mutual and lautum information induced by the joint distribution $P_{Y|X} P_X$.
Although these results were originally discovered in the analysis of generalization error of machine learning algorithms, see for instance [7,8,9,10,11], where the function h in (1) was assumed to represent an empirical risk, this paper presents such results in a comprehensive and general setting that is no longer tied to such assumptions. Also, strong connections with information projections and Pythagorean identities [12,13] are discussed. This new general presentation not only unifies previously scattered insights but also makes the results applicable across a broad range of domains in which changes in the expectation due to variations of the underlying probability measures are relevant.

1.2. Applications

The study of the variation of the integral (expectation) of h (for some fixed $x \in \mathbb{R}^n$) due to a measure change from $P_2$ to $P_1$, i.e., the value $G_h(x, P_1, P_2)$ in (1), plays a central role in the definition of integral probability metrics (IPMs) [14,15]. Using the notation in (1), an IPM results from the optimization problem
\[
\sup_{h \in \mathcal{H}} \big| G_h(x, P_1, P_2) \big|,
\]
for some fixed $x \in \mathbb{R}^n$ and a particular class of functions $\mathcal{H}$. Note for instance that the maximum mean discrepancy is an IPM [16], as is the Wasserstein distance of order one [17,18,19,20].
Other areas of mathematics in which the variation $G_h(x, P_1, P_2)$ in (1) plays a central role are distributionally robust optimization (DRO) [21,22] and optimization with relative entropy regularization [8,9]; see for instance [7,23]. Variations of the form $G_h(x, P_1, P_2)$ in (1) have also been studied in [10] and [11] in the particular case of statistical machine learning for the analysis of the generalization error. The central observation is that the generalization error of machine learning algorithms can be written in the form $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6). This observation is the main building block of the method of gaps introduced in [11], which leads to a number of closed-form expressions for the generalization error involving mutual information and lautum information, among other information measures.

2. Preliminaries

The main results presented in this work involve Gibbs conditional probability measures. Such measures are parametrized by a Borel measurable function $h: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$; a σ-finite measure Q on $\mathbb{R}^m$; and a vector $x \in \mathbb{R}^n$. Note that the variable x will remain inactive until Section 4. Although it is introduced now for consistency, it could be removed altogether from all results presented in this section and in Section 3.
Consider the following function:
\[
K_{h,Q,x} : \mathbb{R} \to \mathbb{R}, \qquad K_{h,Q,x}(t) = \log \int \exp\big( t\, h(x,y) \big)\, \mathrm{d}Q(y). \tag{8}
\]
Under the assumption that Q is a probability measure, the function $K_{h,Q,x}$ in (8) is the cumulant generating function of the random variable $h(x,Y)$, for some fixed $x \in \mathbb{R}^n$ and $Y \sim Q$. Using this notation, the definition of the Gibbs conditional probability measure is presented hereunder.
Definition 2 
(Gibbs Conditional Probability Measure). Given a Borel measurable function $h: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$; a σ-finite measure Q on $\mathbb{R}^m$; and a $\lambda \in \mathbb{R}$, the probability measure $P_{Y|X}^{(h,Q,\lambda)} \in \triangle(\mathbb{R}^m|\mathbb{R}^n)$ is said to be an $(h,Q,\lambda)$-Gibbs conditional probability measure if
\[
\forall x \in \mathcal{X}, \quad K_{h,Q,x}(-\lambda) < +\infty, \tag{9}
\]
for some set $\mathcal{X} \subseteq \mathbb{R}^n$; and for all $(x,y) \in \mathcal{X} \times \mathrm{supp}\, Q$,
\[
\frac{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}{\mathrm{d}Q}(y) = \exp\Big( -\lambda\, h(x,y) - K_{h,Q,x}(-\lambda) \Big), \tag{10}
\]
where the function $K_{h,Q,x}$ is defined in (8); $\mathrm{supp}\, Q$ stands for the support of the σ-finite measure Q; and the function $\frac{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}{\mathrm{d}Q}$ is the Radon-Nikodym derivative [24,25] of the probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ with respect to Q.
Note that, while $P_{Y|X}^{(h,Q,\lambda)}$ is an $(h,Q,\lambda)$-Gibbs conditional probability measure, the measure $P_{Y|X=x}^{(h,Q,\lambda)}$, obtained by conditioning it upon a given vector $x \in \mathcal{X}$, is referred to as an $(h,Q,\lambda)$-Gibbs probability measure.
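For finite supports, an $(h,Q,\lambda)$-Gibbs probability measure can be computed directly. The sketch below is our own (illustrative names); it represents the σ-finite reference measure Q as nonnegative weights that need not sum to one, and follows the convention in (10), namely $\mathrm{d}P/\mathrm{d}Q(y) = \exp(-\lambda h(x,y) - K_{h,Q,x}(-\lambda))$.

```python
import math

# Discrete sketch of the (h,Q,lambda)-Gibbs probability measure. Here Q is a
# sigma-finite measure on a finite set, given as unnormalized weights.
def K(h, q, x, t, ys):
    """K_{h,Q,x}(t) = log sum_y q(y) exp(t*h(x,y)), the function in (8)."""
    return math.log(sum(qy * math.exp(t * h(x, y)) for qy, y in zip(q, ys)))

def gibbs_pmf(h, q, x, lam, ys):
    """Gibbs pmf with density exp(-lam*h(x,y) - K_{h,Q,x}(-lam)) w.r.t. Q."""
    k = K(h, q, x, -lam, ys)
    return [qy * math.exp(-lam * h(x, y) - k) for qy, y in zip(q, ys)]

ys = [0.0, 1.0, 2.0]
q = [1.0, 1.0, 1.0]                 # counting measure: total mass 3, not 1
h = lambda x, y: (x - y) ** 2
p = gibbs_pmf(h, q, x=0.0, lam=2.0, ys=ys)
print(p)                            # a pmf concentrating where h(0, .) is small
```

For $\lambda > 0$ the resulting pmf places most mass where $h(x,\cdot)$ is small, in line with the minimization problem discussed next.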
The condition in (9) is easily met under certain assumptions. For instance, if h is a nonnegative function and Q is a finite measure, then it holds for all $\lambda \in (0, +\infty)$. Let $\triangle_Q(\mathbb{R}^m) \triangleq \big\{ P \in \triangle(\mathbb{R}^m) : P \ll Q \big\}$, with $P \ll Q$ standing for "P absolutely continuous with respect to Q". The relevance of $(h,Q,\lambda)$-Gibbs probability measures relies on the fact that, under some conditions, they are the unique solutions to problems of the form
\[
\min_{P \in \triangle_Q(\mathbb{R}^m)} \int h(x,y)\, \mathrm{d}P(y) + \frac{1}{\lambda} D(P\|Q), \quad \text{and} \tag{11}
\]
\[
\max_{P \in \triangle_Q(\mathbb{R}^m)} \int h(x,y)\, \mathrm{d}P(y) + \frac{1}{\lambda} D(P\|Q), \tag{12}
\]
where $\lambda \in \mathbb{R} \setminus \{0\}$, $x \in \mathbb{R}^n$, and $D(P\|Q)$ denotes the relative entropy (or Kullback-Leibler divergence) of P with respect to Q.
Definition 3 
(Relative Entropy). Given two σ-finite measures P and Q on the same measurable space, such that P is absolutely continuous with respect to Q, the relative entropy of P with respect to Q is
\[
D(P\|Q) = \int \frac{\mathrm{d}P}{\mathrm{d}Q}(x) \log\Big( \frac{\mathrm{d}P}{\mathrm{d}Q}(x) \Big)\, \mathrm{d}Q(x), \tag{13}
\]
where the function $\frac{\mathrm{d}P}{\mathrm{d}Q}$ is the Radon-Nikodym derivative of P with respect to Q.
The connection between the optimization problems (11) and (12) and the Gibbs probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ in (10) has been pointed out by several authors. See, for instance, [8] and [26,27,28,29,30,31,32,33,34,35] for the former; and [10] together with [36,37,38] for the latter. In these references, a variety of assumptions and proof techniques have been used to establish such connections. A general and unified statement of these observations is presented hereunder.
Lemma 1. 
Assume that the optimization problem in (11) (respectively, in (12)) admits a solution. Then, if $\lambda > 0$ (respectively, if $\lambda < 0$), the probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ in (10) is the unique solution.
Proof
For the case in which λ > 0 , the proof follows the same approach as the proof of [8]. Alternatively, for the case in which λ < 0 , the proof follows along the lines of the proof of [10]. □
The following lemma highlights a key property of ( h , Q , λ ) -Gibbs probability measures.
Lemma 2. 
Given an $(h,Q,\lambda)$-Gibbs probability measure, denoted by $P_{Y|X=x}^{(h,Q,\lambda)}$, with $x \in \mathbb{R}^n$,
\[
-\frac{1}{\lambda} K_{h,Q,x}(-\lambda) = \int h(x,y)\, \mathrm{d}Q(y) - \frac{1}{\lambda} D\big(Q \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) \tag{14}
\]
\[
= \int h(x,y)\, \mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}(y) + \frac{1}{\lambda} D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\big\|\, Q\big); \tag{15}
\]
moreover, if $\lambda > 0$,
\[
-\frac{1}{\lambda} K_{h,Q,x}(-\lambda) = \min_{P \in \triangle_Q(\mathbb{R}^m)} \int h(x,y)\, \mathrm{d}P(y) + \frac{1}{\lambda} D(P\|Q); \tag{16}
\]
alternatively, if $\lambda < 0$,
\[
-\frac{1}{\lambda} K_{h,Q,x}(-\lambda) = \max_{P \in \triangle_Q(\mathbb{R}^m)} \int h(x,y)\, \mathrm{d}P(y) + \frac{1}{\lambda} D(P\|Q), \tag{17}
\]
where the function $K_{h,Q,x}$ is defined in (8).
Proof
The proof of (15) follows from taking the logarithm of both sides of (10) and integrating with respect to $P_{Y|X=x}^{(h,Q,\lambda)}$. As for the proof of (14), it follows by noticing that for all $(x,y) \in \mathbb{R}^n \times \mathrm{supp}\, Q$, the Radon-Nikodym derivative $\frac{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}{\mathrm{d}Q}(y)$ in (10) is strictly positive. Thus, from [39], it holds that $\frac{\mathrm{d}Q}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) = \Big( \frac{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}{\mathrm{d}Q}(y) \Big)^{-1}$. Hence, taking the negative logarithm of both sides of (10) and integrating with respect to Q leads to (14). Finally, the equalities in (16) and (17) follow from Lemma 1 and (15). □
Lemma 2, or at least the equalities (15), (16), and (17), can be seen as an immediate restatement of the Donsker-Varadhan variational representation of the relative entropy [40]. Alternative interesting proofs of (14) have been presented by several authors, including [10] and [35]. A proof of (15) appears in [29] in the specific case of $\lambda > 0$.
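Equality (15) can be verified numerically on a finite set. In the hedged sketch below (all names are ours), Q is an unnormalized reference measure, and both sides of (15) are evaluated explicitly under the convention $\mathrm{d}P/\mathrm{d}Q = \exp(-\lambda h - K_{h,Q,x}(-\lambda))$ of (10).

```python
import math

# Numerical check of (15): -(1/lam) K_{h,Q,x}(-lam) equals
# E_P[h(x,Y)] + (1/lam) D(P||Q) for the Gibbs measure P.
def K(h, q, x, t, ys):
    return math.log(sum(qy * math.exp(t * h(x, y)) for qy, y in zip(q, ys)))

def gibbs_pmf(h, q, x, lam, ys):
    k = K(h, q, x, -lam, ys)
    return [qy * math.exp(-lam * h(x, y) - k) for qy, y in zip(q, ys)]

def kl(p, q):
    """Relative entropy D(P||Q); Q may be an unnormalized weight vector."""
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

ys = [0.0, 1.0, 2.0]
q = [0.5, 1.0, 1.5]                 # a finite (unnormalized) reference measure
h = lambda x, y: abs(x - y)
x, lam = 0.5, 3.0

p = gibbs_pmf(h, q, x, lam, ys)
lhs = -K(h, q, x, -lam, ys) / lam
rhs = sum(py * h(x, y) for py, y in zip(p, ys)) + kl(p, q) / lam
assert abs(lhs - rhs) < 1e-9
print(lhs)
```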
The following lemma introduces the main building block of this work, which is a characterization of the variation from the probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ in (10) to an arbitrary measure $P \in \triangle_Q(\mathbb{R}^m)$, i.e., $G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big)$. Such a result appeared for the first time in [7] for the case in which $\lambda > 0$; and in [10] for the case in which $\lambda < 0$, in different contexts of statistical machine learning. A general and unified statement of such results is presented hereunder.
Lemma 3. 
Consider an $(h,Q,\lambda)$-Gibbs probability measure, denoted by $P_{Y|X=x}^{(h,Q,\lambda)} \in \triangle(\mathbb{R}^m)$, with $\lambda \neq 0$ and $x \in \mathbb{R}^n$. For all $P \in \triangle_Q(\mathbb{R}^m)$,
\[
G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) = \frac{1}{\lambda}\Big( D\big(P \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) + D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\big\|\, Q\big) - D(P\|Q) \Big). \tag{18}
\]
Proof
The proof follows along the lines of the proofs in [7] for the case in which $\lambda > 0$; and in [10] for the case in which $\lambda < 0$. A unified proof is presented hereunder by noticing that for all $P \in \triangle_Q(\mathbb{R}^m)$,
\[
D\big(P \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) = \int \log\Big( \frac{\mathrm{d}P}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \Big)\, \mathrm{d}P(y) \tag{19}
\]
\[
= \int \log\Big( \frac{\mathrm{d}Q}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y)\, \frac{\mathrm{d}P}{\mathrm{d}Q}(y) \Big)\, \mathrm{d}P(y) \tag{20}
\]
\[
= \int \log\Big( \frac{\mathrm{d}Q}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \Big)\, \mathrm{d}P(y) + D(P\|Q) \tag{21}
\]
\[
= \lambda \int h(x,y)\, \mathrm{d}P(y) + K_{h,Q,x}(-\lambda) + D(P\|Q) \tag{22}
\]
\[
= \lambda\, G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\big\|\, Q\big) + D(P\|Q), \tag{23}
\]
where (20) follows from [39]; (22) follows from [39] and (10); and (23) follows from (15). □
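A quick numerical sanity check of Lemma 3 on a finite set (our own sketch, with illustrative names): for an arbitrary pmf $P \ll Q$, the direct difference of expectations coincides with the relative-entropy expression in (18), for either sign of λ.

```python
import math

# Numerical check of Lemma 3: G_h(x, P, P_gibbs) equals
# (1/lam) * (D(P||P_gibbs) + D(P_gibbs||Q) - D(P||Q)).
def K(h, q, x, t, ys):
    return math.log(sum(qy * math.exp(t * h(x, y)) for qy, y in zip(q, ys)))

def gibbs_pmf(h, q, x, lam, ys):
    k = K(h, q, x, -lam, ys)
    return [qy * math.exp(-lam * h(x, y) - k) for qy, y in zip(q, ys)]

def kl(p, q):
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

ys = [0.0, 1.0, 2.0, 3.0]
q = [1.0, 1.0, 1.0, 1.0]
h = lambda x, y: (x - y) ** 2
x, lam = 1.0, -0.7                  # lam may be negative, as in the lemma

pg = gibbs_pmf(h, q, x, lam, ys)
p = [0.1, 0.2, 0.3, 0.4]            # an arbitrary measure P << Q
direct = (sum(py * h(x, y) for py, y in zip(p, ys))
          - sum(py * h(x, y) for py, y in zip(pg, ys)))
via_kl = (kl(p, pg) + kl(pg, q) - kl(p, q)) / lam
assert abs(direct - via_kl) < 1e-9
print(direct)
```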
It is interesting to highlight that $G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big)$ in (18) characterizes the variation of the expectation of the function $h(x, \cdot): \mathbb{R}^m \to \mathbb{R}$, when $\lambda > 0$ (resp. $\lambda < 0$) and the probability measure changes from the solution to the optimization problem (11) (resp. (12)) to an alternative measure P. This result takes another perspective if it is seen in the context of information projections [13]. Let Q be a probability measure and let $S \subseteq \triangle_Q(\mathbb{R}^m)$ be a convex set. From [13], it holds that for all measures $P \in S$,
\[
D(P\|Q) \geq D\big(P \,\big\|\, P^\star\big) + D\big(P^\star \,\big\|\, Q\big), \tag{24}
\]
where $P^\star$ satisfies
\[
P^\star \in \arg\min_{P \in S} D(P\|Q). \tag{25}
\]
In the particular case in which the set S in (24) satisfies
\[
S \triangleq \Big\{ P \in \triangle_Q(\mathbb{R}^m) : \int h(x,y)\, \mathrm{d}P(y) = c \Big\}, \tag{26}
\]
for some real c, with the vector x and the function h defined in Lemma 3, the optimal measure $P^\star$ in (25) is the Gibbs probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ in (10), with $\lambda > 0$ chosen to satisfy
\[
\int h(x,y)\, \mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}(y) = c. \tag{27}
\]
The case in which the measure Q in (25) is a σ-finite measure (for instance, either the Lebesgue measure or the counting measure) respectively leads to the classical frameworks of differential entropy maximization and discrete entropy maximization, which have been studied under particular assumptions on the set S in [36,37] and [38].
When the reference measure Q is a probability measure, under the assumption that (27) holds, it follows from [13] that for all $P \in S$, with S in (26),
\[
D(P\|Q) = D\big(P \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) + D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\big\|\, Q\big), \tag{28}
\]
which is known as the Pythagorean theorem for relative entropy. Such a geometric interpretation follows from admitting relative entropy as an analog of squared Euclidean distance. The first appearance of such a "Pythagorean theorem" was in [12], and it was later revisited in [13]. Interestingly, the same result can be obtained from Lemma 3 by noticing that for all $P \in S$, with S in (26),
\[
G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) = 0. \tag{29}
\]
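The Pythagorean theorem for relative entropy can be illustrated numerically. The sketch below is our own construction: Q uniform on {0, 1, 2}, h(x, y) = y (with x inactive), the Gibbs measure with λ = 1, and a mean-preserving perturbation that stays in the constraint set.

```python
import math

# Illustration of the Pythagorean theorem for relative entropy: any pmf P
# with the same mean as the Gibbs measure P* satisfies
# D(P||Q) = D(P||P*) + D(P*||Q).
def kl(p, q):
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

ys = [0.0, 1.0, 2.0]
q = [1 / 3, 1 / 3, 1 / 3]
lam = 1.0
w = [math.exp(-lam * y) / 3 for y in ys]        # unnormalized Gibbs weights
z = sum(w)
p_star = [wy / z for wy in w]                   # the Gibbs measure

# The perturbation (t, -2t, t) keeps E[Y] fixed, so the perturbed pmf stays
# in the mean-constraint set with c = E_{P*}[Y].
t = 0.05
p = [p_star[0] + t, p_star[1] - 2 * t, p_star[2] + t]

lhs = kl(p, q)
rhs = kl(p, p_star) + kl(p_star, q)
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```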
The converse of the Pythagorean theorem [41], together with Lemma 3, leads to the geometric construction shown in Figure 1. A similar interpretation was also presented in [11] and [11] in the context of the generalization error of machine learning algorithms. The former considers $\lambda > 0$, while the latter considers $\lambda < 0$. Nonetheless, the interpretation in Figure 1 is general and independent of such an application.
The relevance of Lemma 3, with respect to this body of literature on information projections, follows from the fact that Q might be a σ-finite measure, a class of measures that includes the class of probability measures; thus, Lemma 3 unifies the results separately obtained in the realm of maximum-entropy methods and information-projection methods. More importantly, when $P \in S$, with S in (26), it might hold that $G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) < 0$ or $G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) > 0$, with $G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big)$ in (18), which resonates with the fact that $(h,Q,\lambda)$-Gibbs conditional probability measures are also related to another class of optimization problems, as shown by the following lemma.
Lemma 4. 
Assume that the following optimization problems possess at least one solution for some $x \in \mathbb{R}^n$:
\[
\min_{P \in \triangle_Q(\mathbb{R}^m)} \int h(x,y)\, \mathrm{d}P(y) \tag{30}
\]
\[
\text{s.t. } D(P\|Q) \leq \rho; \tag{31}
\]
and
\[
\max_{P \in \triangle_Q(\mathbb{R}^m)} \int h(x,y)\, \mathrm{d}P(y) \tag{32}
\]
\[
\text{s.t. } D(P\|Q) \leq \rho. \tag{33}
\]
Consider the $(h,Q,\lambda)$-Gibbs probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ in (10), with $\lambda \in \mathbb{R} \setminus \{0\}$ such that $\rho = D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\|\, Q\big)$. Then, the $(h,Q,\lambda)$-Gibbs probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$ is a solution to (30) if $\lambda > 0$; or to (32) if $\lambda < 0$.
Proof
Note that if $\lambda > 0$, then $\frac{1}{\lambda} D\big(P \,\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) \geq 0$. Hence, from Lemma 3, it holds that for all probability measures P such that $D(P\|Q) \leq \rho$,
\[
G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) \geq \frac{1}{\lambda}\Big( D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\big\|\, Q\big) - D(P\|Q) \Big) = \frac{1}{\lambda}\big( \rho - D(P\|Q) \big) \geq 0,
\]
with equality if $D\big(P \,\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) = 0$. This implies that $P_{Y|X=x}^{(h,Q,\lambda)}$ is a solution to (30). Note also that if $\lambda < 0$, from Lemma 3, it holds that for all probability measures P such that $D(P\|Q) \leq \rho$,
\[
G_h\big(x, P, P_{Y|X=x}^{(h,Q,\lambda)}\big) \leq \frac{1}{\lambda}\Big( D\big(P_{Y|X=x}^{(h,Q,\lambda)} \,\big\|\, Q\big) - D(P\|Q) \Big) = \frac{1}{\lambda}\big( \rho - D(P\|Q) \big) \leq 0,
\]
with equality if $D\big(P \,\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) = 0$. This implies that $P_{Y|X=x}^{(h,Q,\lambda)}$ is a solution to (32). □

3. Characterization of $G_h(x, P_1, P_2)$ in (1)

The main result of this section is the following theorem.
Theorem 1. 
For all probability measures $P_1$ and $P_2$, both absolutely continuous with respect to a given σ-finite measure Q on $\mathbb{R}^m$, the variation $G_h(x, P_1, P_2)$ in (1) satisfies
\[
G_h(x, P_1, P_2) = \frac{1}{\lambda}\Big( D\big(P_1 \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_2 \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) + D(P_2\|Q) - D(P_1\|Q) \Big), \tag{38}
\]
where the probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$, is an $(h,Q,\lambda)$-Gibbs probability measure.
Proof
The proof follows from Lemma 3 and by observing that
\[
G_h(x, P_1, P_2) = G_h\big(x, P_1, P_{Y|X=x}^{(h,Q,\lambda)}\big) - G_h\big(x, P_2, P_{Y|X=x}^{(h,Q,\lambda)}\big). \qquad \square
\]
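Theorem 1 can be checked numerically on a finite set; in particular, the right-hand side of (38) is invariant to the auxiliary pair (Q, λ) used to build the Gibbs measure. The sketch below (our own names, not from the paper) verifies this for two different choices of (Q, λ).

```python
import math

# Numerical check of Theorem 1: the relative-entropy expression matches the
# direct difference of expectations, for any auxiliary (Q, lam).
def K(h, q, x, t, ys):
    return math.log(sum(qy * math.exp(t * h(x, y)) for qy, y in zip(q, ys)))

def gibbs_pmf(h, q, x, lam, ys):
    k = K(h, q, x, -lam, ys)
    return [qy * math.exp(-lam * h(x, y) - k) for qy, y in zip(q, ys)]

def kl(p, q):
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

ys = [0.0, 1.0, 2.0]
h = lambda x, y: math.sin(x + y)
x = 0.3
p1 = [0.5, 0.25, 0.25]
p2 = [0.2, 0.2, 0.6]
direct = (sum(p * h(x, y) for p, y in zip(p1, ys))
          - sum(p * h(x, y) for p, y in zip(p2, ys)))

for q, lam in [([1.0, 1.0, 1.0], 1.0), ([0.2, 0.5, 2.0], -3.0)]:
    pg = gibbs_pmf(h, q, x, lam, ys)
    rhs = (kl(p1, pg) - kl(p2, pg) + kl(p2, q) - kl(p1, q)) / lam
    assert abs(direct - rhs) < 1e-9
print(direct)
```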
Theorem 1 might be particularly simplified in the case in which the reference measure Q is a probability measure. Consider for instance the case in which $P_1 \ll P_2$ (or $P_2 \ll P_1$). In such a case, the reference measure might be chosen as $P_2$ (or $P_1$), as shown hereunder.
Corollary 1. 
Consider the variation $G_h(x, P_1, P_2)$ in (1). If the probability measure $P_1$ is absolutely continuous with respect to $P_2$, then
\[
G_h(x, P_1, P_2) = \frac{1}{\lambda}\Big( D\big(P_1 \,\big\|\, P_{Y|X=x}^{(h,P_2,\lambda)}\big) - D\big(P_2 \,\big\|\, P_{Y|X=x}^{(h,P_2,\lambda)}\big) - D(P_1\|P_2) \Big). \tag{39}
\]
Alternatively, if the probability measure $P_2$ is absolutely continuous with respect to $P_1$, then
\[
G_h(x, P_1, P_2) = \frac{1}{\lambda}\Big( D\big(P_1 \,\big\|\, P_{Y|X=x}^{(h,P_1,\lambda)}\big) - D\big(P_2 \,\big\|\, P_{Y|X=x}^{(h,P_1,\lambda)}\big) + D(P_2\|P_1) \Big), \tag{40}
\]
where the probability measures $P_{Y|X=x}^{(h,P_1,\lambda)}$ and $P_{Y|X=x}^{(h,P_2,\lambda)}$ are respectively $(h,P_1,\lambda)$- and $(h,P_2,\lambda)$-Gibbs probability measures, with $\lambda \neq 0$.
In the case in which neither $P_1$ is absolutely continuous with respect to $P_2$, nor $P_2$ is absolutely continuous with respect to $P_1$, the reference measure Q in Theorem 1 can always be chosen as a convex combination of $P_1$ and $P_2$. That is, for all Borel sets $A \in \mathscr{B}(\mathbb{R}^m)$, $Q(A) = \alpha P_1(A) + (1-\alpha) P_2(A)$, with $\alpha \in (0,1)$.
Theorem 1 can be specialized to the specific cases in which Q is the Lebesgue or the counting measure.
If Q is the Lebesgue measure, the probability measures $P_1$ and $P_2$ in (38) admit probability density functions $f_1$ and $f_2$, respectively. Moreover, the terms $D(P_1\|Q)$ and $D(P_2\|Q)$ are the negatives of Shannon's differential entropies [4] induced by $P_1$ and $P_2$, denoted by $h(P_1)$ and $h(P_2)$, respectively. That is, for all $i \in \{1,2\}$,
\[
h(P_i) \triangleq -\int f_i(x) \log f_i(x)\, \mathrm{d}x. \tag{41}
\]
The probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$, $x \in \mathbb{R}^n$, and Q the Lebesgue measure, possesses a probability density function, denoted by $f_{Y|X=x}^{(h,Q,\lambda)}: \mathbb{R}^m \to (0, +\infty)$, which satisfies
\[
f_{Y|X=x}^{(h,Q,\lambda)}(y) = \frac{\exp\big( -\lambda\, h(x,y) \big)}{\int \exp\big( -\lambda\, h(x,u) \big)\, \mathrm{d}u}. \tag{42}
\]
If Q is the counting measure, the probability measures $P_1$ and $P_2$ in (38) admit probability mass functions $p_1: \mathcal{Y} \to [0,1]$ and $p_2: \mathcal{Y} \to [0,1]$, with $\mathcal{Y}$ a countable subset of $\mathbb{R}^m$. Moreover, $D(P_1\|Q)$ and $D(P_2\|Q)$ are the negatives of Shannon's discrete entropies [4] induced by $P_1$ and $P_2$, denoted by $H(P_1)$ and $H(P_2)$, respectively. That is, for all $i \in \{1,2\}$,
\[
H(P_i) \triangleq -\sum_{y \in \mathcal{Y}} p_i(y) \log p_i(y). \tag{43}
\]
The probability measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$ and Q the counting measure, possesses a conditional probability mass function, denoted by $p_{Y|X=x}^{(h,Q,\lambda)}: \mathcal{Y} \to (0, +\infty)$, which satisfies
\[
p_{Y|X=x}^{(h,Q,\lambda)}(y) = \frac{\exp\big( -\lambda\, h(x,y) \big)}{\sum_{u \in \mathcal{Y}} \exp\big( -\lambda\, h(x,u) \big)}. \tag{44}
\]
These observations lead to the following corollary of Theorem 1.
Corollary 2. 
Given two probability measures $P_1$ and $P_2$, with probability density functions $f_1$ and $f_2$, respectively, the variation $G_h(x, P_1, P_2)$ in (1) satisfies
\[
G_h(x, P_1, P_2) = \frac{1}{\lambda}\Big( D\big(P_1 \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_2 \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - h(P_2) + h(P_1) \Big), \tag{45}
\]
where the probability density function of the measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$ and Q the Lebesgue measure, is defined in (42); and the entropy functional h is defined in (41). Alternatively, given two probability measures $P_1$ and $P_2$, with probability mass functions $p_1$ and $p_2$, respectively, the variation $G_h(x, P_1, P_2)$ in (1) satisfies
\[
G_h(x, P_1, P_2) = \frac{1}{\lambda}\Big( D\big(P_1 \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_2 \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - H(P_2) + H(P_1) \Big), \tag{46}
\]
where the probability mass function of the measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$ and Q the counting measure, is defined in (44); and the entropy functional H is defined in (43).

4. Characterizations of $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ in (2)

The main result of this section is a characterization of $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ in (2).
Theorem 2. 
Consider the variation $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ in (2) and assume that for all $x \in \mathrm{supp}\, P_X$, the probability measures $P_{Y|X=x}^{(1)}$ and $P_{Y|X=x}^{(2)}$ are both absolutely continuous with respect to a σ-finite measure Q. Then,
\[
\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \frac{1}{\lambda} \int \Big( D\big(P_{Y|X=x}^{(1)} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x}^{(2)} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) + D\big(P_{Y|X=x}^{(2)} \,\big\|\, Q\big) - D\big(P_{Y|X=x}^{(1)} \,\big\|\, Q\big) \Big)\, \mathrm{d}P_X(x), \tag{47}
\]
where the probability measure $P_{Y|X}^{(h,Q,\lambda)}$, with $\lambda \neq 0$, is an $(h,Q,\lambda)$-Gibbs conditional probability measure.
Proof
The proof follows from (2) and Theorem 1. □
Two special cases are particularly noteworthy. When the reference measure Q is the Lebesgue measure, both $\int D\big(P_{Y|X=x}^{(1)} \,\|\, Q\big)\, \mathrm{d}P_X(x)$ and $\int D\big(P_{Y|X=x}^{(2)} \,\|\, Q\big)\, \mathrm{d}P_X(x)$ in (47) become the negatives of Shannon's conditional differential entropies, denoted by $h\big(P_{Y|X}^{(1)} \big| P_X\big)$ and $h\big(P_{Y|X}^{(2)} \big| P_X\big)$, respectively. That is, for all $i \in \{1,2\}$,
\[
h\big(P_{Y|X}^{(i)} \big| P_X\big) \triangleq \int h\big(P_{Y|X=x}^{(i)}\big)\, \mathrm{d}P_X(x), \tag{48}
\]
where h is the entropy functional in (41). When the reference measure Q is the counting measure, both integrals become the negatives of Shannon's conditional (discrete) entropies, denoted by $H\big(P_{Y|X}^{(1)} \big| P_X\big)$ and $H\big(P_{Y|X}^{(2)} \big| P_X\big)$, respectively. That is, for all $i \in \{1,2\}$,
\[
H\big(P_{Y|X}^{(i)} \big| P_X\big) \triangleq \int H\big(P_{Y|X=x}^{(i)}\big)\, \mathrm{d}P_X(x), \tag{49}
\]
where H is the entropy functional in (43).
These observations lead to the following corollary of Theorem 2.
Corollary 3. 
Consider the variation $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ in (2) and assume that for all $x \in \mathrm{supp}\, P_X$, the probability measures $P_{Y|X=x}^{(1)}$ and $P_{Y|X=x}^{(2)}$ possess probability density functions. Then,
\[
\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \frac{1}{\lambda} \int \Big( D\big(P_{Y|X=x}^{(1)} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x}^{(2)} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) \Big)\, \mathrm{d}P_X(x) - \frac{1}{\lambda} h\big(P_{Y|X}^{(2)} \big| P_X\big) + \frac{1}{\lambda} h\big(P_{Y|X}^{(1)} \big| P_X\big), \tag{50}
\]
where the probability density function of the measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$ and Q the Lebesgue measure, is defined in (42); and for all $i \in \{1,2\}$, the conditional entropy $h\big(P_{Y|X}^{(i)} \big| P_X\big)$ is defined in (48). Alternatively, assume that for all $x \in \mathrm{supp}\, P_X$, the probability measures $P_{Y|X=x}^{(1)}$ and $P_{Y|X=x}^{(2)}$ possess probability mass functions. Then,
\[
\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \frac{1}{\lambda} \int \Big( D\big(P_{Y|X=x}^{(1)} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x}^{(2)} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) \Big)\, \mathrm{d}P_X(x) - \frac{1}{\lambda} H\big(P_{Y|X}^{(2)} \big| P_X\big) + \frac{1}{\lambda} H\big(P_{Y|X}^{(1)} \big| P_X\big), \tag{51}
\]
where the probability mass function of the measure $P_{Y|X=x}^{(h,Q,\lambda)}$, with $\lambda \neq 0$ and Q the counting measure, is defined in (44); and for all $i \in \{1,2\}$, the conditional entropy $H\big(P_{Y|X}^{(i)} \big| P_X\big)$ is defined in (49).
Note that, from (2), it follows that the general expression for the expected variation $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ might be simplified according to Corollary 1. For instance, if for all $x \in \mathrm{supp}\, P_X$, the probability measure $P_{Y|X=x}^{(1)}$ is absolutely continuous with respect to $P_{Y|X=x}^{(2)}$, then the measure $P_{Y|X=x}^{(2)}$ can be chosen to be the reference measure in the calculation of $G_h\big(x, P_{Y|X=x}^{(1)}, P_{Y|X=x}^{(2)}\big)$ in (2). This observation leads to the following corollary of Theorem 2.
Corollary 4. 
Consider the variation $\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big)$ in (2) and assume that for all $x \in \mathrm{supp}\, P_X$, $P_{Y|X=x}^{(1)} \ll P_{Y|X=x}^{(2)}$. Then,
\[
\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \frac{1}{\lambda} \int \Big( D\big(P_{Y|X=x}^{(1)} \,\big\|\, P_{Y|X=x}^{(h, P_{Y|X=x}^{(2)}, \lambda)}\big) - D\big(P_{Y|X=x}^{(2)} \,\big\|\, P_{Y|X=x}^{(h, P_{Y|X=x}^{(2)}, \lambda)}\big) - D\big(P_{Y|X=x}^{(1)} \,\big\|\, P_{Y|X=x}^{(2)}\big) \Big)\, \mathrm{d}P_X(x). \tag{52}
\]
Alternatively, if for all $x \in \mathrm{supp}\, P_X$, the probability measure $P_{Y|X=x}^{(2)}$ is absolutely continuous with respect to $P_{Y|X=x}^{(1)}$, then
\[
\bar{G}_h\big(P_{Y|X}^{(1)}, P_{Y|X}^{(2)}, P_X\big) = \frac{1}{\lambda} \int \Big( D\big(P_{Y|X=x}^{(1)} \,\big\|\, P_{Y|X=x}^{(h, P_{Y|X=x}^{(1)}, \lambda)}\big) - D\big(P_{Y|X=x}^{(2)} \,\big\|\, P_{Y|X=x}^{(h, P_{Y|X=x}^{(1)}, \lambda)}\big) + D\big(P_{Y|X=x}^{(2)} \,\big\|\, P_{Y|X=x}^{(1)}\big) \Big)\, \mathrm{d}P_X(x), \tag{53}
\]
where the measures $P_{Y|X=x}^{(h, P_{Y|X=x}^{(1)}, \lambda)}$ and $P_{Y|X=x}^{(h, P_{Y|X=x}^{(2)}, \lambda)}$ are respectively $\big(h, P_{Y|X=x}^{(1)}, \lambda\big)$- and $\big(h, P_{Y|X=x}^{(2)}, \lambda\big)$-Gibbs probability measures.
The Gibbs probability measures $P_{Y|X=x}^{(h, P_{Y|X=x}^{(1)}, \lambda)}$ and $P_{Y|X=x}^{(h, P_{Y|X=x}^{(2)}, \lambda)}$ in Corollary 4 are particularly interesting, as their reference measures depend on x. Gibbs measures of this form appear, for instance, in [8].

5. Characterizations of $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6)

The main result of this section is a characterization of $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6), which describes the variation of the expectation of the function h when the probability measure changes from the joint probability measure $P_{Y|X} P_X$ to the product of its marginals $P_Y P_X$. This result is presented hereunder and involves the mutual information $I\big(P_{Y|X}; P_X\big)$ and the lautum information $L\big(P_{Y|X}; P_X\big)$, defined as follows:
\[
I\big(P_{Y|X}; P_X\big) \triangleq \int D\big(P_{Y|X=x} \,\big\|\, P_Y\big)\, \mathrm{d}P_X(x); \quad \text{and} \tag{54}
\]
\[
L\big(P_{Y|X}; P_X\big) \triangleq \int D\big(P_Y \,\big\|\, P_{Y|X=x}\big)\, \mathrm{d}P_X(x). \tag{55}
\]
Theorem 3. 
Consider the expected variation $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6) and assume that, for all $x \in \mathrm{supp}\, P_X$:

(a) the probability measures $P_Y$ and $P_{Y|X=x}$ are both absolutely continuous with respect to a given σ-finite measure Q; and

(b) the probability measures $P_Y$ and $P_{Y|X=x}$ are mutually absolutely continuous.

Then, it follows that
\[
\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big) = \frac{1}{\lambda}\bigg( I\big(P_{Y|X}; P_X\big) + L\big(P_{Y|X}; P_X\big) + \int\!\!\int \log\Big( \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \Big)\, \mathrm{d}P_Y(y)\, \mathrm{d}P_X(x) - \int\!\!\int \log\Big( \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \Big)\, \mathrm{d}P_{Y|X=x}(y)\, \mathrm{d}P_X(x) \bigg), \tag{56}
\]
where the probability measure $P_{Y|X}^{(h,Q,\lambda)}$, with $\lambda \neq 0$, is an $(h,Q,\lambda)$-Gibbs conditional probability measure.
Proof
The proof is presented in Appendix A. □
An alternative expression for $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6), involving only relative entropies, is presented by the following theorem.
Theorem 4. 
Consider the expected variation $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6) and assume that, for all $x \in \mathrm{supp}\, P_X$, the probability measure $P_{Y|X=x}$ is absolutely continuous with respect to a given σ-finite measure Q. Then, it follows that
\[
\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big) = \frac{1}{\lambda} \int\!\!\int \Big( D\big(P_{Y|X=x_1} \,\big\|\, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x_2} \,\big\|\, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) \Big)\, \mathrm{d}P_X(x_1)\, \mathrm{d}P_X(x_2), \tag{57}
\]
where $P_{Y|X}^{(h,Q,\lambda)}$, with $\lambda \neq 0$, is an $(h,Q,\lambda)$-Gibbs conditional probability measure.
Proof
The proof is presented in Appendix B. □
Theorem 4 expresses the variation $\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big)$ in (6) as the expectation (with respect to $P_X P_X$) of a comparison, via relative entropy, of the conditional probability measure $P_{Y|X}$ with a Gibbs conditional probability measure $P_{Y|X}^{(h,Q,\lambda)}$. More specifically, the expression consists of the expectation of the difference of two relative entropies. The former compares $P_{Y|X=x_1}$ with $P_{Y|X=x_2}^{(h,Q,\lambda)}$, where $(x_1, x_2)$ are independently sampled from the same probability measure $P_X$. The latter compares these two conditional measures conditioned on the same element; that is, it compares $P_{Y|X=x_2}$ with $P_{Y|X=x_2}^{(h,Q,\lambda)}$.
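Theorem 4 can be verified numerically with finitely many x and y values. The sketch below (our own, with illustrative names) compares the direct computation of the variation with the double average of relative entropies in (57).

```python
import math

# Numerical check of Theorem 4: the variation from the joint P_{Y|X} P_X to
# P_Y P_X equals the double average over (x1, x2) of
# D(P_{Y|X=x1} || PG_{x2}) - D(P_{Y|X=x2} || PG_{x2}), PG the Gibbs kernel.
def K(h, q, x, t, ys):
    return math.log(sum(qy * math.exp(t * h(x, y)) for qy, y in zip(q, ys)))

def gibbs_pmf(h, q, x, lam, ys):
    k = K(h, q, x, -lam, ys)
    return [qy * math.exp(-lam * h(x, y) - k) for qy, y in zip(q, ys)]

def kl(p, q):
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

xs = [0.0, 1.0]
ys = [0.0, 1.0, 2.0]
pX = [0.4, 0.6]
pYgX = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
h = lambda x, y: (x - y) ** 2
q, lam = [1.0, 1.0, 1.0], 0.8

pY = [sum(pX[i] * pYgX[i][j] for i in range(2)) for j in range(3)]
direct = sum(pX[i] * (sum(pY[j] * h(xs[i], ys[j]) for j in range(3))
                      - sum(pYgX[i][j] * h(xs[i], ys[j]) for j in range(3)))
             for i in range(2))

pg = [gibbs_pmf(h, q, x, lam, ys) for x in xs]
via_kl = sum(pX[i1] * pX[i2] * (kl(pYgX[i1], pg[i2]) - kl(pYgX[i2], pg[i2]))
             for i1 in range(2) for i2 in range(2)) / lam
assert abs(direct - via_kl) < 1e-9
print(direct)
```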
An interesting observation from Theorem 3 and Theorem 4 is that the last two terms on the right-hand side of (56) are both zero in the case in which $P_{Y|X}$ is an $(h,Q,\lambda)$-Gibbs conditional probability measure. Similarly, in such a case, the second term on the right-hand side of (57) is also zero. This observation is highlighted by the following corollary.
Corollary 5. 
Consider an $(h,Q,\lambda)$-Gibbs conditional probability measure, denoted by $P_{Y|X}^{(h,Q,\lambda)} \in \triangle(\mathbb{R}^m|\mathbb{R}^n)$, with $\lambda \neq 0$; and a probability measure $P_X \in \triangle(\mathbb{R}^n)$. Let the measure $P_Y^{(h,Q,\lambda)} \in \triangle(\mathbb{R}^m)$ be such that for all sets $A \in \mathscr{B}(\mathbb{R}^m)$,
\[
P_Y^{(h,Q,\lambda)}(A) = \int P_{Y|X=x}^{(h,Q,\lambda)}(A)\, \mathrm{d}P_X(x). \tag{58}
\]
Then,
\[
\bar{G}_h\big(P_Y^{(h,Q,\lambda)}, P_{Y|X}^{(h,Q,\lambda)}, P_X\big) = \frac{1}{\lambda}\Big( I\big(P_{Y|X}^{(h,Q,\lambda)}; P_X\big) + L\big(P_{Y|X}^{(h,Q,\lambda)}; P_X\big) \Big) \tag{59}
\]
\[
= \frac{1}{\lambda} \int\!\!\int D\big(P_{Y|X=x_1}^{(h,Q,\lambda)} \,\big\|\, P_{Y|X=x_2}^{(h,Q,\lambda)}\big)\, \mathrm{d}P_X(x_1)\, \mathrm{d}P_X(x_2). \tag{60}
\]
Note that mutual information and lautum information are both nonnegative information measures, which, from Corollary 5, implies that $\bar{G}_h\big(P_Y^{(h,Q,\lambda)}, P_{Y|X}^{(h,Q,\lambda)}, P_X\big)$ in (59) might be either positive or negative depending exclusively on the sign of the parameter λ. The following corollary exploits such an observation to present a property of Gibbs conditional probability measures and their corresponding marginal probability measures.
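The identity in Corollary 5 and the sign behavior noted above can be checked numerically. In the sketch below (our own names), the conditional measure is itself the Gibbs kernel and λ < 0, so the variation equals $(1/\lambda)$ times the sum of the mutual and lautum information, and is therefore nonpositive.

```python
import math

# Numerical check of Corollary 5: with the Gibbs conditional kernel, the
# variation reduces to (1/lam) * (mutual information + lautum information).
def gibbs_pmf(h, q, x, lam, ys):
    w = [qy * math.exp(-lam * h(x, y)) for qy, y in zip(q, ys)]
    z = sum(w)
    return [wy / z for wy in w]

def kl(p, q):
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0]
pX = [0.3, 0.3, 0.4]
h = lambda x, y: (x - y) ** 2
q, lam = [1.0, 1.0], -1.5                      # lam < 0 flips the sign below

pg = [gibbs_pmf(h, q, x, lam, ys) for x in xs]
pY = [sum(pX[i] * pg[i][j] for i in range(3)) for j in range(2)]

direct = sum(pX[i] * (sum(pY[j] * h(xs[i], ys[j]) for j in range(2))
                      - sum(pg[i][j] * h(xs[i], ys[j]) for j in range(2)))
             for i in range(3))
mi = sum(pX[i] * kl(pg[i], pY) for i in range(3))       # mutual information
lautum = sum(pX[i] * kl(pY, pg[i]) for i in range(3))   # lautum information
assert abs(direct - (mi + lautum) / lam) < 1e-9
assert direct <= 0.0      # lam < 0 makes the variation nonpositive
print(direct, mi, lautum)
```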
Corollary 6. 
Given a probability measure $P_X \in \triangle(\mathbb{R}^n)$, the $(h,Q,\lambda)$-Gibbs conditional probability measure $P_{Y|X}^{(h,Q,\lambda)}$ in (10) and the probability measure $P_Y^{(h,Q,\lambda)}$ in (58) satisfy
\[
\int\!\!\int h(x,y)\, \mathrm{d}P_Y^{(h,Q,\lambda)}(y)\, \mathrm{d}P_X(x) \geq \int\!\!\int h(x,y)\, \mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}(y)\, \mathrm{d}P_X(x), \quad \text{if } \lambda > 0; \tag{61}
\]
or
\[
\int\!\!\int h(x,y)\, \mathrm{d}P_Y^{(h,Q,\lambda)}(y)\, \mathrm{d}P_X(x) \leq \int\!\!\int h(x,y)\, \mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}(y)\, \mathrm{d}P_X(x), \quad \text{if } \lambda < 0. \tag{62}
\]
Corollary 6 highlights the fact that a deviation from the joint probability measure $P_{Y|X}^{(h,Q,\lambda)} P_X$ to the product of its marginals $P_Y^{(h,Q,\lambda)} P_X$ might increase or decrease the expectation of the function h depending on the sign of λ.

6. Final Remarks

A simple reformulation of the Donsker-Varadhan variational representation of relative entropy (Lemma 2) has been harnessed to provide an explicit expression for the variation of the expectation of a multi-dimensional real function when the probability measure changes from a Gibbs probability measure to an arbitrary measure (Lemma 3). This result reveals strong connections with information-projection methods, Pythagorean identities involving relative entropy, and optimization of expectations constrained to a neighborhood around a reference measure, where the neighborhood is defined via an upper bound on the relative entropy with respect to the reference measure (Lemma 4). An algebraic manipulation of Lemma 3 leads to an explicit expression for the variation of the expectation under study when the probability measure changes from one arbitrary measure to another (Theorem 1). The astonishing simplicity of the proof, which is straightforward from Lemma 3, contrasts with the generality of the result. In particular, the only assumption is that both measures, before and after the variation, are absolutely continuous with respect to a common reference measure. The underlying observation is the central role played by Gibbs probability measures in this analysis. In particular, such a variation is expressed in terms of comparisons, via relative entropy, of the initial and final probability measures with respect to the Gibbs probability measure specifically built for the function under study. Interestingly, the reference measure of such Gibbs probability measures can be freely chosen beyond probability measures. When such a reference is a σ-finite measure, e.g., the Lebesgue measure or a counting measure, the expressions mentioned above involve Shannon's fundamental information measures, e.g., entropy and conditional entropy (Corollary 2).
Using these initial results, the variations of expectations of multi-dimensional functions due to variations of joint probability measures have been studied. In this case, the focus has been on two particular measure changes, which unveil connections with both mutual and lautum information. First, one of the marginal probability measures remains the same after the change (Theorem 2); and second, the joint probability measure changes to the product of its marginals (Theorem 3 and Theorem 4). In the case of Gibbs joint probability measures, these expressions involve exclusively well-known information measures: mutual information, lautum information, and relative entropy. These expressions reveal general connections between the variation in the expectation of arbitrary functions, induced by changes in the underlying probability measure, and both mutual and lautum information. These connections extend beyond those previously established in the analysis of generalization error in machine learning algorithms; see, for instance, [26] and [8].

Author Contributions

All authors contributed equally to this research.

Funding

This research was supported in part by the European Commission through the H2020-MSCA-RISE-2019 project 872172; the French National Agency for Research (ANR) through the project ANR-21-CE25-0013 and the project ANR-22-PEFT-0010 of the France 2030 program PEPR Réseaux du Futur; and in part by the Agence de l'innovation de défense (AID) through the project UK-FR 2024352.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Theorem 3

The proof follows from Theorem 2, which holds under assumption (a) and leads to
\[
\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big) = \frac{1}{\lambda} \int \Big( D\big(P_Y \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x} \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) + D\big(P_{Y|X=x} \,\big\|\, Q\big) - D\big(P_Y \,\big\|\, Q\big) \Big) \,\mathrm{d}P_X(x). \tag{A1}
\]
The proof continues by noticing that
\[
\int D\big(P_{Y|X=x} \,\big\|\, Q\big) \,\mathrm{d}P_X(x) = \iint \log\frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}Q}(y) \,\mathrm{d}P_{Y|X=x}(y) \,\mathrm{d}P_X(x)
\]
\[
= \iint \log\Big( \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_Y}(y) \, \frac{\mathrm{d}P_Y}{\mathrm{d}Q}(y) \Big) \,\mathrm{d}P_{Y|X=x}(y) \,\mathrm{d}P_X(x)
\]
\[
= \iint \log\frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_Y}(y) \,\mathrm{d}P_{Y|X=x}(y) \,\mathrm{d}P_X(x) + \iint \log\frac{\mathrm{d}P_Y}{\mathrm{d}Q}(y) \, \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_Y}(y) \,\mathrm{d}P_Y(y) \,\mathrm{d}P_X(x)
\]
\[
= I\big(P_{Y|X}; P_X\big) + \int \log\frac{\mathrm{d}P_Y}{\mathrm{d}Q}(y) \,\mathrm{d}P_Y(y) = I\big(P_{Y|X}; P_X\big) + D\big(P_Y \,\big\|\, Q\big),
\]
where the second equality follows from the chain rule for Radon–Nikodym derivatives [39]; the third from [39]; and the fourth from exchanging the order of integration [39], which implies that for all \(y \in \mathbb{R}^m\), \(\int \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_Y}(y) \,\mathrm{d}P_X(x) = 1\).
Note also that
\[
\int D\big(P_Y \,\big\|\, P_{Y|X=x}^{(h,Q,\lambda)}\big) \,\mathrm{d}P_X(x) = \iint \log\frac{\mathrm{d}P_Y}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \,\mathrm{d}P_Y(y) \,\mathrm{d}P_X(x)
\]
\[
= \iint \log\Big( \frac{\mathrm{d}P_Y}{\mathrm{d}P_{Y|X=x}}(y) \, \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \Big) \,\mathrm{d}P_Y(y) \,\mathrm{d}P_X(x)
\]
\[
= L\big(P_{Y|X}; P_X\big) + \iint \log\frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}}(y) \,\mathrm{d}P_Y(y) \,\mathrm{d}P_X(x),
\]
where the second equality follows from the chain rule for Radon–Nikodym derivatives [39], and the last one from the definition of lautum information. Finally, using these two identities in (A1) yields (56), which completes the proof.
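The key identity of this appendix, \(\int D(P_{Y|X=x} \,\|\, Q)\,\mathrm{d}P_X(x) = I(P_{Y|X}; P_X) + D(P_Y \,\|\, Q)\), can be verified numerically. The following Python sketch (illustrative only; finite alphabets stand in for the general measurable spaces, and the names `pygx` and `kl` are ad hoc) checks it for randomly drawn measures.

```python
import numpy as np

rng = np.random.default_rng(2)
nx, ny = 3, 4
px = rng.dirichlet(np.ones(nx))              # P_X
pygx = rng.dirichlet(np.ones(ny), size=nx)   # rows: conditional P_{Y|X=x}
q = rng.dirichlet(np.ones(ny))               # reference measure Q
py = px @ pygx                               # marginal P_Y

def kl(a, b):
    """Relative entropy D(a || b) on a finite alphabet."""
    return (a * np.log(a / b)).sum()

lhs = sum(px[i] * kl(pygx[i], q) for i in range(nx))
mutual = sum(px[i] * kl(pygx[i], py) for i in range(nx))  # I(P_{Y|X}; P_X)
print(lhs, mutual + kl(py, q))
```

The two printed values coincide up to floating-point error, mirroring the chain-rule decomposition of the Radon–Nikodym derivative used in the proof.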

Appendix B. Proof of Theorem 4

The proof follows by observing that the functional \(\bar{G}_h\) in (6) satisfies
\[
\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big) = \iint \Big( \int h(x_2, y) \,\mathrm{d}P_{Y|X=x_1}(y) - \int h(x_1, y) \,\mathrm{d}P_{Y|X=x_1}(y) \Big) \,\mathrm{d}P_X(x_2) \,\mathrm{d}P_X(x_1). \tag{A13}
\]
Using the functional \(G_h\) in (1), the terms above can be written as follows:
\[
\int h(x_1, y) \,\mathrm{d}P_{Y|X=x_1}(y) = G_h\big(x_1, P_{Y|X=x_1}, P_{Y|X=x_1}^{(h,Q,\lambda)}\big) + \int h(x_1, y) \,\mathrm{d}P_{Y|X=x_1}^{(h,Q,\lambda)}(y), \tag{A14}
\]
and
\[
\int h(x_2, y) \,\mathrm{d}P_{Y|X=x_1}(y) = G_h\big(x_2, P_{Y|X=x_1}, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) + \int h(x_2, y) \,\mathrm{d}P_{Y|X=x_2}^{(h,Q,\lambda)}(y).
\]
Substituting (A14) and the expression above into (A13) yields
\[
\bar{G}_h\big(P_Y, P_{Y|X}, P_X\big) = \iint \Big( G_h\big(x_2, P_{Y|X=x_1}, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) + \int h(x_2, y) \,\mathrm{d}P_{Y|X=x_2}^{(h,Q,\lambda)}(y) - G_h\big(x_1, P_{Y|X=x_1}, P_{Y|X=x_1}^{(h,Q,\lambda)}\big) - \int h(x_1, y) \,\mathrm{d}P_{Y|X=x_1}^{(h,Q,\lambda)}(y) \Big) \,\mathrm{d}P_X(x_2) \,\mathrm{d}P_X(x_1)
\]
\[
= \iint \Big( G_h\big(x_2, P_{Y|X=x_1}, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) - G_h\big(x_1, P_{Y|X=x_1}, P_{Y|X=x_1}^{(h,Q,\lambda)}\big) \Big) \,\mathrm{d}P_X(x_2) \,\mathrm{d}P_X(x_1)
\]
\[
= \frac{1}{\lambda} \iint \Big( D\big(P_{Y|X=x_1} \,\big\|\, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) - D\big(P_{Y|X=x_2} \,\big\|\, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) \Big) \,\mathrm{d}P_X(x_1) \,\mathrm{d}P_X(x_2),
\]
where the second equality holds because the integrals of h with respect to the Gibbs measures coincide after integration with respect to \(P_X\), and the last equality holds from Lemma 3, which implies
\[
\iint G_h\big(x_2, P_{Y|X=x_1}, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) \,\mathrm{d}P_X(x_2) \,\mathrm{d}P_X(x_1) = \frac{1}{\lambda} \iint D\big(P_{Y|X=x_1} \,\big\|\, P_{Y|X=x_2}^{(h,Q,\lambda)}\big) \,\mathrm{d}P_X(x_2) \,\mathrm{d}P_X(x_1) + \frac{1}{\lambda} \int D\big(P_{Y|X=x_2}^{(h,Q,\lambda)} \,\big\|\, Q\big) \,\mathrm{d}P_X(x_2) - \frac{1}{\lambda} \int D\big(P_{Y|X=x_1} \,\big\|\, Q\big) \,\mathrm{d}P_X(x_1),
\]
and
\[
\iint G_h\big(x_1, P_{Y|X=x_1}, P_{Y|X=x_1}^{(h,Q,\lambda)}\big) \,\mathrm{d}P_X(x_2) \,\mathrm{d}P_X(x_1) = \int G_h\big(x_1, P_{Y|X=x_1}, P_{Y|X=x_1}^{(h,Q,\lambda)}\big) \,\mathrm{d}P_X(x_1) = \frac{1}{\lambda} \int D\big(P_{Y|X=x_1} \,\big\|\, P_{Y|X=x_1}^{(h,Q,\lambda)}\big) \,\mathrm{d}P_X(x_1) + \frac{1}{\lambda} \int D\big(P_{Y|X=x_1}^{(h,Q,\lambda)} \,\big\|\, Q\big) \,\mathrm{d}P_X(x_1) - \frac{1}{\lambda} \int D\big(P_{Y|X=x_1} \,\big\|\, Q\big) \,\mathrm{d}P_X(x_1),
\]
whose difference, after noting that the relative-entropy terms involving Q cancel and relabeling the variable of integration, completes the proof.
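The final expression of this appendix can also be checked numerically. The following Python sketch (illustrative only; finite alphabets stand in for the general spaces) verifies, for an arbitrary conditional measure, that the variation of the expectation when the joint measure changes to the product of its marginals equals the double integral of relative-entropy differences above. The Gibbs convention \(\mathrm{d}P_{Y|X=x}^{(h,Q,\lambda)}/\mathrm{d}Q \propto \exp(-\lambda h(x,\cdot))\) is an assumption consistent with the \(1/\lambda\) factors in the derivation.

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny, lam = 3, 4, 0.7
h = rng.normal(size=(nx, ny))                # arbitrary function h(x, y)
px = rng.dirichlet(np.ones(nx))              # P_X
pygx = rng.dirichlet(np.ones(ny), size=nx)   # rows: arbitrary P_{Y|X=x}
q = rng.dirichlet(np.ones(ny))               # reference measure Q
py = px @ pygx                               # marginal P_Y

g = q * np.exp(-lam * h)
g /= g.sum(axis=1, keepdims=True)            # rows: Gibbs P^(h,Q,lam)_{Y|X=x}

def kl(a, b):
    """Relative entropy D(a || b) on a finite alphabet."""
    return (a * np.log(a / b)).sum()

e_joint = (px[:, None] * pygx * h).sum()               # joint expectation
e_prod = (px[:, None] * py[None, :] * h).sum()         # product-of-marginals

rhs = sum(px[i] * px[j] * (kl(pygx[i], g[j]) - kl(pygx[j], g[j]))
          for i in range(nx) for j in range(nx)) / lam
print(e_prod - e_joint, rhs)
```

The two printed values coincide up to floating-point error, confirming the cancellation of the reference-measure terms exploited in the proof.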

References

  1. Gama, J.; Medas, P.; Castillo, G.; Rodrigues, P. Learning with drift detection. In Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, Sao Luis, Maranhao, Brazil, Oct. 2004; pp. 286–295.
  2. Webb, G.I.; Lee, L.K.; Goethals, B.; Petitjean, F. Analyzing concept drift and shift from sample data. Data Mining and Knowledge Discovery 2018, 32, 1179–1199. [Google Scholar] [CrossRef]
  3. Oliveira, G.H.F.M.; Minku, L.L.; Oliveira, A.L. Tackling virtual and real concept drifts: An adaptive Gaussian mixture model approach. IEEE Transactions on Knowledge and Data Engineering 2021, 35, 2048–2060. [Google Scholar] [CrossRef]
  4. Shannon, C.E. A mathematical theory of communication. The Bell System Technical Journal 1948, 27, 379–423. [Google Scholar] [CrossRef]
  5. Shannon, C.E. A mathematical theory of communication. The Bell System Technical Journal 1948, 27, 623–656. [Google Scholar] [CrossRef]
  6. Palomar, D.P.; Verdú, S. Lautum information. IEEE Transactions on Information Theory 2008, 54, 964–975. [Google Scholar] [CrossRef]
  7. Perlaza, S.M.; Esnaola, I.; Bisson, G.; Poor, H.V. On the Validation of Gibbs Algorithms: Training Datasets, Test Datasets and their Aggregation. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, Jun. 2023.
  8. Perlaza, S.M.; Bisson, G.; Esnaola, I.; Jean-Marie, A.; Rini, S. Empirical Risk Minimization with Relative Entropy Regularization. IEEE Transactions on Information Theory 2024, 70, 5122–5161. [Google Scholar] [CrossRef]
  9. Zou, X.; Perlaza, S.M.; Esnaola, I.; Altman, E. Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, Feb.
  10. Zou, X.; Perlaza, S.M.; Esnaola, I.; Altman, E.; Poor, H.V. The Worst-Case Data-Generating Probability Measure in Statistical Learning. IEEE Journal on Selected Areas in Information Theory 2024, 5, 175–189. [Google Scholar] [CrossRef]
  11. Perlaza, S.M.; Zou, X. The Generalization Error of Machine Learning Algorithms. arXiv 2024. [Google Scholar] [CrossRef]
  12. Chentsov, N.N. Nonsymmetrical distance between probability distributions, entropy and the theorem of Pythagoras. Mathematical notes of the Academy of Sciences of the USSR 1968, 4, 686–691. [Google Scholar] [CrossRef]
  13. Csiszár, I.; Matus, F. Information projections revisited. IEEE Transactions on Information Theory 2003, 49, 1474–1490. [Google Scholar] [CrossRef]
  14. Müller, A. Integral probability metrics and their generating classes of functions. Advances in applied probability 1997, 29, 429–443. [Google Scholar] [CrossRef]
  15. Zolotarev, V.M. Probability metrics. Teoriya Veroyatnostei i ee Primeneniya 1983, 28, 264–287. [Google Scholar] [CrossRef]
  16. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A Kernel Two-Sample Test. Journal of Machine Learning Research 2012, 13, 723–773. [Google Scholar]
  17. Villani, C. Optimal transport: Old and new, first ed.; Springer: Berlin, Germany, 2009. [Google Scholar]
  18. Liu, W.; Yu, G.; Wang, L.; Liao, R. An Information-Theoretic Framework for Out-of-Distribution Generalization with Applications to Stochastic Gradient Langevin Dynamics. arXiv 2024, arXiv:2403.19895. [Google Scholar]
  19. Liu, W.; Yu, G.; Wang, L.; Liao, R. An Information-Theoretic Framework for Out-of-Distribution Generalization. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Athens, Greece, July 2024; pp. 2670–2675.
  20. Agrawal, R.; Horel, T. Optimal Bounds between f-Divergences and Integral Probability Metrics. Journal of Machine Learning Research 2021, 22, 1–59. [Google Scholar]
  21. Rahimian, H.; Mehrotra, S. Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization 2022, 3, 1–85. [Google Scholar] [CrossRef]
  22. Xu, C.; Lee, J.; Cheng, X.; Xie, Y. Flow-based distributionally robust optimization. IEEE Journal on Selected Areas in Information Theory 2024, 5, 62–77. [Google Scholar] [CrossRef]
  23. Hu, Z.; Hong, L.J. Kullback-Leibler divergence constrained distributionally robust optimization. Optimization Online 2013, 1, 9. [Google Scholar]
  24. Radon, J. Theorie und Anwendungen der absolut additiven Mengenfunktionen, first ed.; Hölder: Vienna, Austria, 1913. [Google Scholar]
  25. Nikodym, O. Sur une généralisation des intégrales de MJ Radon. Fundamenta Mathematicae 1930, 15, 131–179. [Google Scholar] [CrossRef]
  26. Aminian, G.; Bu, Y.; Toni, L.; Rodrigues, M.; Wornell, G. An Exact Characterization of the Generalization Error for the Gibbs Algorithm. Advances in Neural Information Processing Systems 2021, 34, 8106–8118. [Google Scholar]
  27. Perlaza, S.M.; Bisson, G.; Esnaola, I.; Jean-Marie, A.; Rini, S. Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, Jul. 2022; pp. 684–689.
  28. Jiang, W.; Tanner, M.A. Gibbs posterior for variable selection in high-dimensional classification and data mining. The Annals of Statistics 2008, 36, 2207–2231. [Google Scholar] [CrossRef]
  29. Perlaza, S.M.; Esnaola, I.; Bisson, G.; Poor, H.V. On the Validation of Gibbs Algorithms: Training Datasets, Test Datasets and their Aggregation. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, Jun. 2023.
  30. Alquier, P.; Ridgway, J.; Chopin, N. On the properties of variational approximations of Gibbs posteriors. Journal of Machine Learning Research 2016, 17, 8374–8414. [Google Scholar]
  31. Bu, Y.; Aminian, G.; Toni, L.; Wornell, G.W.; Rodrigues, M. Characterizing and understanding the generalization error of transfer learning with Gibbs algorithm. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual Conference, Mar. 2022; pp. 8673–8699.
  32. Raginsky, M.; Rakhlin, A.; Tsao, M.; Wu, Y.; Xu, A. Information-theoretic analysis of stability and bias of learning algorithms. In Proceedings of the IEEE Information Theory Workshop (ITW), Cambridge, UK, Sep. 2016; pp. 26–30.
  33. Zou, B.; Li, L.; Xu, Z. The Generalization Performance of ERM algorithm with Strongly Mixing Observations. Machine Learning 2009, 75, 275–295. [Google Scholar] [CrossRef]
  34. He, H.; Aminian, G.; Bu, Y.; Rodrigues, M.; Tan, V.Y. How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm? In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain, Apr. 2023; pp. 8494–8520. [Google Scholar]
  35. Hellström, F.; Durisi, G.; Guedj, B.; Raginsky, M. Generalization Bounds: Perspectives from Information Theory and PAC-Bayes. Foundations and Trends® in Machine Learning 2025, 18, 1–223. [Google Scholar] [CrossRef]
  36. Jaynes, E.T. Information Theory and Statistical Mechanics I. Physical Review Journals 1957, 106, 620–630. [Google Scholar] [CrossRef]
  37. Jaynes, E.T. Information Theory and Statistical Mechanics II. Physical Review Journals 1957, 108, 171–190. [Google Scholar] [CrossRef]
  38. Kapur, J.N. Maximum Entropy Models in Science and Engineering, first ed.; Wiley: New York, NY, USA, 1989. [Google Scholar]
  39. Bermudez, Y.; Bisson, G.; Esnaola, I.; Perlaza, S.M. Proofs for Folklore Theorems on the Radon-Nikodym Derivative. Technical Report RR-9591, INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, 2025.
  40. Donsker, M.D.; Varadhan, S.S. Asymptotic evaluation of certain Markov process expectations for large time, I. Communications on pure and applied mathematics 1975, 28, 1–47. [Google Scholar] [CrossRef]
  41. Heath, T.L. The Thirteen Books of Euclid's Elements, 2nd revised ed.; Dover Publications, Inc.: New York, 1956. [Google Scholar]
Figure 1. Geometric interpretation of Lemma 3, with Q a probability measure.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.