1. Introduction
Let us consider a finite set of $n$ situations that may occur according to a distribution $p = (p_1, \dots, p_n)$. The probability of each situation is generally determined by repeating the experiment independently and using Laplace’s operational definition to obtain the estimators
$$\hat{p}_k = \frac{t_k}{t}, \qquad k = 1, \dots, n,$$
where $t_k$ is the number of occurrences of situation $k$ among $t$ independent repetitions. In addition, the central limit theorem provides an estimate of the difference between the true value and the estimate [1].
This empirical method generally works quite well, but it poses a few problems. On the one hand, the estimator and its uncertainty depend on the number of repetitions, giving good results only when this number is sufficiently large, which is a rather vague criterion. On the other hand, this definition is difficult to generalize when the observation of the situation itself is unclear: in practice, a situation is often determined using a measuring instrument that produces a noisy result.
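As a concrete illustration (the numbers here are ours, chosen only for the example), take $n = 2$ situations and $t = 100$ repetitions, of which $t_1 = 37$ produced situation 1:

```latex
% Hypothetical numbers (ours), illustrating the frequency estimator and
% the CLT-based uncertainty that comes with it:
\[
  \hat{p}_1 = \frac{t_1}{t} = \frac{37}{100} = 0.37,
  \qquad
  \Delta\hat{p}_1 \approx \sqrt{\frac{\hat{p}_1\,(1-\hat{p}_1)}{t}}
  = \sqrt{\frac{0.37 \times 0.63}{100}} \approx 0.048 .
\]
```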
The aim of this work is to base the estimation of probabilities associated with events on Bayes’ theorem, starting from Jeffreys’ non-informative priors.
2. Ingredients
2.1. Elements of Bayesian Theory of Statistical Decision-Making
This theory first distinguishes between two spaces: the space of states $\lambda \in \Lambda$ and the space of observations $x \in X$ [3]. Each observation is assumed to be made in the presence of an unknown state. The measurement model is the stochastic relation linking the state to the result of the measurement. For a given state, we give a transition probability¹ $\mathrm{P}(x \mid \lambda)$, which gives the distribution of observations $x$ given $\lambda$. Given an a priori distribution $\pi(\lambda)$ over states and an observation $x$, Bayes’ theorem gives the a posteriori distribution over states:
$$\pi(\lambda \mid x) = \frac{\mathrm{P}(x \mid \lambda)\,\pi(\lambda)}{\int_{\Lambda} \mathrm{P}(x \mid \lambda')\,\pi(\lambda')\,d\lambda'}.$$
When the state space is a differentiable manifold and the measurement model is twice differentiable with respect to $\lambda$, we define a prior distribution called Jeffreys’ prior as follows [2]. First, we form the Fisher information matrix
$$I_{ij}(\lambda) = \mathrm{E}\!\left[\frac{\partial \log \mathrm{P}(x \mid \lambda)}{\partial \lambda_i}\,\frac{\partial \log \mathrm{P}(x \mid \lambda)}{\partial \lambda_j}\ \middle|\ \lambda\right].$$
Jeffreys’ prior is then defined as
$$\pi_J(\lambda) \propto \sqrt{\det I(\lambda)}.$$
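As a standard sanity check (this worked example is ours, not taken from the text), consider the Bernoulli measurement model $\mathrm{P}(x \mid p) = p^x (1-p)^{1-x}$ with $x \in \{0, 1\}$:

```latex
% Worked example (ours): Jeffreys' prior for a Bernoulli model.
\[
  I(p) = \mathrm{E}\!\left[\left(\frac{\partial}{\partial p}
         \log \mathrm{P}(x \mid p)\right)^{\!2}\right]
       = \frac{1}{p\,(1-p)},
  \qquad
  \pi_J(p) \propto \frac{1}{\sqrt{p\,(1-p)}},
\]
% i.e. the Beta(1/2, 1/2) distribution, which becomes uniform in the
% square-root coordinate \psi = \sqrt{p} used from Section 3 onward.
```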
A fundamental property of this distribution is its invariance. Indeed, it can be shown that $\pi_J$ does not depend on the choice of coordinate system.
2.2. Hyperspheres
The estimators constructed in this work make use of unit hyperspheres. Let us recall some of their properties [4].
In Euclidean space $\mathbb{R}^n$, the unit hypersphere of dimension $n - 1$ is the differentiable manifold
$$S^{n-1} = \{\psi \in \mathbb{R}^n : \psi_1^2 + \cdots + \psi_n^2 = 1\}.$$
The use of the letter $\psi$ will be justified below. The usual spherical coordinates are given by
$$\psi_1 = \cos\theta_1,\quad \psi_2 = \sin\theta_1\cos\theta_2,\quad \dots,\quad \psi_{n-1} = \sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1},\quad \psi_n = \sin\theta_1\cdots\sin\theta_{n-1},$$
and the normalized invariant measure under the orthogonal group $O(n)$ is given by
$$d\sigma = \frac{\Gamma(n/2)}{2\pi^{n/2}}\,\sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin\theta_{n-2}\;d\theta_1\cdots d\theta_{n-1}.$$
Let $(t_1, \dots, t_n)$ be a sequence of non-negative integers with total $t = t_1 + \cdots + t_n$. The even moments relative to this measure are given by
$$\int_{S^{n-1}} \psi_1^{2t_1} \cdots \psi_n^{2t_n}\; d\sigma(\psi) = H(t_1, \dots, t_n) = \frac{\Gamma(n/2)}{\Gamma(t + n/2)} \prod_{k=1}^{n} \frac{\Gamma(t_k + \frac{1}{2})}{\Gamma(\frac{1}{2})}.$$
We can show that the function $H$ above is a generalization of the beta function: for $n = 2$,
$$H(a, b) = \frac{\Gamma(a + \frac12)\,\Gamma(b + \frac12)}{\Gamma(a + b + 1)\,\Gamma(\frac12)^2} = \frac{B(a + \frac12,\, b + \frac12)}{B(\frac12, \frac12)}.$$
The normalization factor is chosen so that $H(0, \dots, 0) = 1$.
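A minimal sketch (ours, based on the reconstructed formula above) of evaluating $H$ numerically; working in log space with std::lgamma avoids overflow for long sequences:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch (ours): evaluate H(t_1,...,t_n) in log space, following
//   H(t) = Gamma(n/2) / Gamma(t + n/2) * prod_k Gamma(t_k + 1/2) / Gamma(1/2).
double logH(const std::vector<double>& t) {
    const double n = static_cast<double>(t.size());
    double total = 0.0, acc = 0.0;
    for (double tk : t) {
        total += tk;
        acc += std::lgamma(tk + 0.5) - std::lgamma(0.5);
    }
    return std::lgamma(n / 2.0) - std::lgamma(total + n / 2.0) + acc;
}

int main() {
    // Check the normalization H(0,...,0) = 1 for n = 3.
    std::printf("H(0,0,0) = %f\n", std::exp(logH({0, 0, 0})));  // 1.0
    std::printf("H(2,1,0) = %g\n", std::exp(logH({2, 1, 0})));
}
```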
3. Probability Estimators for Known Situations
The problem is estimating a distribution $p = (p_1, \dots, p_n)$. This distribution can be viewed as a point in the $(n-1)$-dimensional simplex
$$\Delta_{n-1} = \Big\{\, p \in \mathbb{R}^n : p_k \ge 0,\ \textstyle\sum_{k=1}^{n} p_k = 1 \,\Big\}.$$
This simplex will be the state space for a measurement model whose observations are the situations themselves. In a canonical way, the probability of obtaining situation $k$ given that the distribution is $p$ is obviously
$$\mathrm{P}(k \mid p) = p_k.$$
This canonical measurement model is twice differentiable with respect to $p$. We can therefore calculate the associated Jeffreys prior $\pi_J$. It was stated above that this distribution is independent of the parameterization of the simplex. Let us choose the new coordinates
$$\psi_k = \sqrt{p_k}, \qquad k = 1, \dots, n.$$
Since $\sum_k \psi_k^2 = 1$, we obtain a measurement model on a state space that is the positive part of the hypersphere,
$$S^{n-1}_{+} = \{\psi \in S^{n-1} : \psi_k \ge 0 \ \text{for all } k\}.$$
Now, it turns out that Jeffreys’ prior for this model is the invariant measure $\sigma$ mentioned above:
$$\pi_J(d\psi) \propto d\sigma(\psi) \quad \text{on } S^{n-1}_{+}.$$
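A quick check (ours) for $n = 2$: on the simplex, the Jeffreys prior of the canonical model is the arcsine density, and the square-root change of variables flattens it:

```latex
% Check (ours) for n = 2. With p_1 = \cos^2\theta, p_2 = \sin^2\theta,
% \theta \in [0, \pi/2], so that \psi = (\cos\theta, \sin\theta) \in S^1_+:
\[
  \pi_J(p_1)\,dp_1
  \propto \frac{dp_1}{\sqrt{p_1\,(1 - p_1)}}
  = \frac{2\cos\theta\,\sin\theta\,d\theta}{\cos\theta\,\sin\theta}
  = 2\,d\theta ,
\]
% i.e. the uniform (invariant) measure on the quarter circle.
```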
This result allows us to apply Bayes’ theorem to a sequence of independent observations. Let $s = (s_1, \dots, s_t)$ be such a sequence of length $t$; we obtain the posterior on the sphere given $s$:
$$\pi(d\psi \mid s) \propto \prod_{j=1}^{t} \psi_{s_j}^{2}\; d\sigma(\psi).$$
Introducing the function giving the multiplicity of an index $k$ in the sequence $s$,
$$t_k(s) = \#\{\, j : s_j = k \,\},$$
we obtain (implying the sequence $s$ to simplify the notation)
$$\pi(d\psi \mid s) = \frac{1}{H(t_1, \dots, t_n)} \prod_{k=1}^{n} \psi_k^{2 t_k}\; d\sigma(\psi).$$
The probability of occurrence of a situation $k$ can be estimated by the posterior mean,
$$\hat{p}_k = \int_{S^{n-1}_{+}} \psi_k^2 \;\pi(d\psi \mid s).$$
The associated uncertainties will be given by the covariance matrix
$$C_{kl} = \int_{S^{n-1}_{+}} \psi_k^2\, \psi_l^2 \;\pi(d\psi \mid s) \;-\; \hat{p}_k\, \hat{p}_l.$$
Let us define $\delta_k$ as the sequence of integers that is zero except for the $k$-th term, whose value is 1. It is easy to see that
$$\hat{p}_k = \frac{H(t + \delta_k)}{H(t)}, \qquad C_{kl} = \frac{H(t + \delta_k + \delta_l)}{H(t)} - \hat{p}_k\, \hat{p}_l.$$
The fact that $\Gamma(z + 1) = z\,\Gamma(z)$ and the properties of the Gamma function [5] immediately give
$$\hat{p}_k = \frac{t_k + \frac12}{t + \frac{n}{2}} \tag{15}$$
and, for the covariance matrix,
$$C_{kl} = \frac{(t_k + \frac12)\,(t_l + \frac12 + [k = l])}{(t + \frac{n}{2})\,(t + \frac{n}{2} + 1)} - \hat{p}_k\, \hat{p}_l,$$
where $[k = l]$ equals 1 if $k = l$ and 0 otherwise.
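A minimal sketch (ours) of these closed forms; the multiplicities below are made-up example values:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch (ours): closed-form Jeffreys estimators of Section 3.
//   p_k  = (t_k + 1/2) / (t + n/2)                             (Eq. 15)
//   C_kk = (t_k + 1/2)(t_k + 3/2) / ((t + n/2)(t + n/2 + 1)) - p_k^2
int main() {
    std::vector<double> t = {7, 2, 1};   // example multiplicities (ours)
    const double n = static_cast<double>(t.size());
    double total = 0.0;
    for (double tk : t) total += tk;

    for (std::size_t k = 0; k < t.size(); ++k) {
        double p = (t[k] + 0.5) / (total + n / 2.0);
        double v = (t[k] + 0.5) * (t[k] + 1.5)
                 / ((total + n / 2.0) * (total + n / 2.0 + 1.0))
                 - p * p;
        std::printf("p_%zu = %.4f +/- %.4f\n", k + 1, p, std::sqrt(v));
    }
}
```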
4. Self-Consistent Random Sequences
Let $s = (s_1, \dots, s_t)$ be a sequence of length $t$. Equation (15) therefore gives the Bayesian estimates for the $\hat{p}_k$ if we have observed the sequence $s$, which we will also denote by $\hat{p}_k(s)$.
At time $t$, let us randomly draw a new situation $s_{t+1}$ according to these $\hat{p}_k(s)$. Let us form a new sequence by concatenation: $s' = (s_1, \dots, s_t, s_{t+1})$. Nothing prevents us from repeating the process for this new sequence.
We say that a sequence is constructed in a self-consistent manner if $s_1$ is drawn uniformly from the $n$ situations and if, at each step, $s_{t+1}$ is drawn according to the probability $\hat{p}_{s_{t+1}}(s_1, \dots, s_t)$. We note that the probability of obtaining a sequence $s$ of length $t$ in this way is
$$\mathrm{P}(s) = \prod_{j=1}^{t} \frac{t_{s_j}(s_1, \dots, s_{j-1}) + \frac12}{(j - 1) + \frac{n}{2}}.$$
Now, it turns out that this probability is given by the function $H$:
$$\mathrm{P}(s) = H(t_1(s), \dots, t_n(s)) = H(s).$$
Since no additional assumptions are made, we will call this probability the a priori (non-informative) self-consistent probability on sequences of given length $t$. Sampling according to this prior will be useful later for estimation in uncertain situations.
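A minimal sketch (ours) of generating one self-consistent sequence with the rule above; the sizes are example values:

```cpp
#include <cstdio>
#include <random>
#include <vector>

// Sketch (ours): draw one self-consistent sequence of length T over n
// situations. At each step the next symbol k is drawn with probability
// (t_k + 1/2) / (t + n/2), which is uniform at t = 0.
int main() {
    const int n = 3, T = 20;                    // example sizes (ours)
    std::mt19937 gen(std::random_device{}());
    std::vector<double> counts(n, 0.0);

    for (int t = 0; t < T; ++t) {
        std::vector<double> w(n);
        for (int k = 0; k < n; ++k)
            w[k] = (counts[k] + 0.5) / (t + n / 2.0);
        std::discrete_distribution<int> draw(w.begin(), w.end());
        int k = draw(gen);
        ++counts[k];
        std::printf("%d ", k + 1);              // situations numbered 1..n
    }
    std::printf("\n");
}
```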
Note
This way of generating a random sequence of numbers provides a possible answer to the question: Given a sequence, what is its probability of occurrence? It can be easily seen that the sequences with maximum self-consistent probability are also those with minimum entropy.
Incidentally, for sequences of given length $t$, we have
$$\sum_{|s| = t} H(s) = \sum_{t_1 + \cdots + t_n = t} \binom{t}{t_1, \dots, t_n}\, H(t_1, \dots, t_n) = 1.$$
From relations (18) and (19), we also have
$$H(s \cdot k) = \hat{p}_k(s)\, H(s),$$
where $s \cdot k$ denotes the sequence $s$ extended by the situation $k$.
5. Generalization to Fuzzy Situations
In reality, situations are often measured by noisy instruments. Instead of knowing the situation at a stage $s$, the observer only has a measurement result $x_s$ from a measurement model $\mathrm{P}(x \mid k)$. For a given $x$, the vector $k \mapsto \mathrm{P}(x \mid k)$, whose indices are the situations, is often called the likelihood function. For a sequence of stages $1, \dots, t$, the observations² $x_1, \dots, x_t$ give a likelihood matrix, whose row index is the time and whose column index is the situation:
$$\ell_{sk} = \mathrm{P}(x_s \mid k), \qquad s = 1, \dots, t, \quad k = 1, \dots, n.$$
Let us return to the situation on the hypersphere. Suppose that at time $s$, the prior probability on $\psi$ is $\pi_s(d\psi)$. The appearance of a result $x_s$ for the measurement model, combined with the canonical model, gives, by Bayes’ theorem, an a posteriori distribution that will be taken as the a priori for the next step:
$$\pi_{s+1}(d\psi) \propto \Big(\sum_{k=1}^{n} \ell_{sk}\, \psi_k^2\Big)\, \pi_s(d\psi).$$
Let $\pi_1 = \sigma$. Expanding the product $\prod_{s=1}^{t}\big(\sum_k \ell_{sk}\,\psi_k^2\big)$ over all sequences $u = (u_1, \dots, u_t)$ of situations, with $\ell(u) = \prod_{s=1}^{t} \ell_{s u_s}$, and using the definition of the function $H$, we obtain the estimators at time $t$:
$$\hat{p}_k = \frac{\sum_{u} \ell(u)\, H(u \cdot k)}{\sum_{u} \ell(u)\, H(u)},$$
which can also be written as
$$\hat{p}_k = \frac{\sum_{u} \hat{p}_k(u)\, \ell(u)\, H(u)}{\sum_{u} \ell(u)\, H(u)}.$$
The preceding equations suggest a new use for Bayes’ theorem. Indeed, given the self-consistent prior $H$ on sequences of length $t$ and the product measurement model giving the likelihood function $\ell$, we can form the posterior on the sequences,
$$\mathrm{P}(u \mid x_1, \dots, x_t) = \frac{\ell(u)\, H(u)}{\sum_{u'} \ell(u')\, H(u')},$$
and we obtain
$$\hat{p}_k = \sum_{u} \hat{p}_k(u)\, \mathrm{P}(u \mid x_1, \dots, x_t), \tag{30}$$
$$C_{kl} = \sum_{u} \frac{H(u \cdot k \cdot l)}{H(u)}\, \mathrm{P}(u \mid x_1, \dots, x_t) \;-\; \hat{p}_k\, \hat{p}_l. \tag{31}$$
Equations (30) and (31) are sums that can be calculated using Monte Carlo (MC). It is possible to sample the posterior as follows:
- draw $u_1$ according to the weights $\ell_{1k}\, \hat{p}_k(\varnothing)$;
- draw $u_2$ according to the weights $\ell_{2k}\, \hat{p}_k(u_1)$;
- …
- draw $u_t$ according to the weights $\ell_{tk}\, \hat{p}_k(u_1, \dots, u_{t-1})$.
A sample of $r$ sequences $u^{(1)}, \dots, u^{(r)}$ of this type gives the MC estimators
$$\hat{p}_k \approx \frac{1}{r} \sum_{i=1}^{r} \hat{p}_k\big(u^{(i)}\big),$$
as well as the MC uncertainties
$$\Delta \hat{p}_k \approx \sqrt{\frac{1}{r\,(r - 1)} \sum_{i=1}^{r} \Big(\hat{p}_k\big(u^{(i)}\big) - \hat{p}_k\Big)^{2}}.$$
Note
Simulating this sampling is very easy, for example in C++, and gives excellent results for likelihood matrices with 3 situations and 10,000 observation times, with $r$ of around one million. The advantage of sampling from the posterior is that it directly locates the sequences that contribute significantly to the overall sum. Recall that the sums (30) and (31) contain $n^t$ terms. A minimal sketch of such a simulation follows.
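The sketch below is ours, not the authors’ code; the likelihood matrix, sizes, and seed are made-up example values. It combines the sequential draw above with the MC estimators:

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Sketch (ours): sequential sampling of sequences and MC estimation of
// the p_k, as described above. ell[s][k] = P(x_s | k) is a made-up
// likelihood matrix for n = 3 situations and T = 4 observation times.
int main() {
    const int n = 3, T = 4, r = 100000;         // example sizes (ours)
    std::vector<std::vector<double>> ell = {
        {0.8, 0.1, 0.1}, {0.2, 0.6, 0.2}, {0.7, 0.2, 0.1}, {0.3, 0.3, 0.4}};

    std::mt19937 gen(12345);
    std::vector<double> sum(n, 0.0), sum2(n, 0.0);

    for (int i = 0; i < r; ++i) {
        std::vector<double> counts(n, 0.0);
        for (int s = 0; s < T; ++s) {           // draw u_{s+1} with weights
            std::vector<double> w(n);           // ell[s][k] * p_k(u_1..u_s)
            for (int k = 0; k < n; ++k)
                w[k] = ell[s][k] * (counts[k] + 0.5) / (s + n / 2.0);
            std::discrete_distribution<int> d(w.begin(), w.end());
            ++counts[d(gen)];
        }
        for (int k = 0; k < n; ++k) {           // accumulate p_k(u)
            double pk = (counts[k] + 0.5) / (T + n / 2.0);
            sum[k] += pk;
            sum2[k] += pk * pk;
        }
    }
    for (int k = 0; k < n; ++k) {
        double mean = sum[k] / r;
        double err = std::sqrt((sum2[k] / r - mean * mean) / (r - 1));
        std::printf("p_%d = %.4f +/- %.4f\n", k + 1, mean, err);
    }
}
```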
6. Conclusions
Replacing the coordinates $p_k$ with $\psi_k = \sqrt{p_k}$ makes Jeffreys’ prior uniform. In general, probabilistic quantities can be parameterized by their square root, which is their natural expression. This phenomenon is found in quantum mechanics [6], where probability is expressed as the squared modulus of a wave function, hence the choice of the Greek letter $\psi$ for the points of the hypersphere.
The generalization of the beta function given by the function $H$ bridges the gap between continuous calculus on wave functions and sums over sequences, which are computable by MC.
An application of this work is under development for stationary Markov processes.
References
- Chung, K. L. A Course in Probability Theory; Academic Press, 1968; Chapter 7.
- Bernardo, J. M.; Smith, A. F. M. Bayesian Theory; Wiley, 1994; pp. 314, 358.
- Laedermann, J.-P. Théorie bayésienne de la décision statistique et mesure de la radioactivité. Doctoral thesis, UNIL, 2003.
- N-sphere (hyperspherical coordinates). Available online: http://en.wikipedia.org/wiki/N-sphere.
- Gradshteyn, I. S.; Ryzhik, I. M. Table of Integrals, Series, and Products, 6th ed.; Alan Jeffrey, Ed.; Academic Press, 2000.
- Piron, C. Quantum Mechanics; Presses polytechniques et universitaires romandes, 1998.
¹ The use of the Roman font denotes a generic probability, the underlying spaces being identifiable by the variables used.
² The measurement model may vary at each step.