The development of experimental biophotonics methods is accompanied by an urgent need for efficient tools and techniques to analyze the acquired data. This need stems from the increasing complexity of the data produced by novel experimental contributions to the field and from the limitations of current bioimaging analyses. Bringing novel and innovative tools and methodologies from other scientific fields into biological investigations has become the modern trend in analytical and statistical biophotonics.
Photonic methods allow the development of noninvasive examination tools that offer high precision measurements and can be used to monitor and analyze biological systems.
This section provides a concise overview of the most important issues in maximum entropy methods and a list of the types of problems to which these methods have been applied successfully.
Generalized maximum entropy (GME) methods grow out of the development, in the 1950s, of the principle of maximum entropy. According to this principle, given prior knowledge expressed as expected values for a collection of macroscopic properties of a physical system, the probability distribution over its possible microstates that should be adopted is, among all distributions compatible with those expected values, the one with the greatest Shannon entropy (Shannon, 1948).
This result complements other fundamental results of statistical mechanics, but it also generalizes them considerably. While efficient measurements exist only for a limited subset of the properties that are describable in principle, and only for rare states of the system, there is potential information in the remaining possible microstates; that potential defines alternative, stable, and in general qualitatively good probability distributions, sometimes very different from the equilibrium one. Making use of that information is therefore a sound method for developing novel testable hypotheses and for checking statistical models and evaluating their confidence levels while avoiding undue bias. In addition, mutual information is an especially appealing association measure for high-throughput studies of intracellular networks because of its information-theoretic character, its more global nature, and its ability to tease out nonlinear relationships in the presence of noise. It has previously been used in experimental and computational studies of diverse systems, from simple ones to more complicated ones in which traditional methods can be misled by the problem of multiple hypotheses, particularly when experimental control samples are unavailable, the multiplicity of comparisons is limited, and the high dimensionality of the dataset produces strongly skewed feature distributions. Crucial requirements for a successful application of the mutual information measure are rapid and precise association estimation with acceptable false-association control and adequate robustness of the association detection to noise. It has been shown in numerous communication- and information-theoretic function approximations and limit theorems that the entropy functional (and the information functional in the conditional case) is self-averaging and approximately Gaussian in the large-sample-size limit. As the sample size increases, the dependence of the association estimates on the specific estimation algorithm tends to disappear.
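As a concrete illustration of these points, the following sketch (our own example, not taken from the source) estimates mutual information with a simple histogram plug-in estimator and uses a permutation null for false-association control; the bin count, sample size, and the quadratic test relationship are illustrative assumptions.

```python
# Illustrative sketch: plug-in (histogram) mutual information estimate between
# two signals, with a permutation-based null for false-association control.
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in MI estimate (in bits) from a 2-D histogram of the samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                      # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y
    nz = pxy > 0                               # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def permutation_null(x, y, n_perm=200, seed=None):
    """MI values under the no-association null, obtained by shuffling y."""
    rng = np.random.default_rng(seed)
    return np.array([mutual_information(x, rng.permutation(y)) for _ in range(n_perm)])

# Example: a noisy nonlinear (quadratic) relationship that a linear
# correlation coefficient would largely miss.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x**2 + 0.5 * rng.normal(size=5000)
mi = mutual_information(x, y)
null = permutation_null(x, y, seed=1)
print(f"MI = {mi:.3f} bits, null 95th percentile = {np.percentile(null, 95):.3f}")
```

Comparing the estimate against the permutation null illustrates the false-association control mentioned above; with larger samples the estimate becomes increasingly insensitive to the choice of estimator, in line with the self-averaging behavior described in the text.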
3.1. Overview of Constrained Entropy Methods
Maximum entropy modeling can be employed in cases where imperfect information is available, both to select natural features and to ensure an unambiguous posterior prediction (Golan et al., 2023; Golan, 2018; Golan, 2008; Golan et al., 1996; Bernardini Papalia et al., 2021; Bernardini Papalia et al., 2018). The theory of maximum entropy learning assumes a set of mappings from observed events to constraint features.
The problem of learning an unknown probability function p from a set of observed data X1, ..., Xn is of fundamental importance in statistics. We consider a relatively complex feature space and a highly non-parametric model.
Although prior knowledge regarding X, which we group as features f1, ..., fJ, may be available, the only information allowed in the GME model formulation is a list of constraints. Such information, if accurate, is clearly useful for learning a better posterior. Even when the observation space X is simple, effectively using such potentially relevant information without allowing the hypothesis class to overfit is a challenging problem.
Entropy is a measure of the uncertainty about the state of a random variable. For a continuous probability distribution, the entropy is defined by an integral (the differential entropy); the probability distributions that maximize it under appropriate constraints are the corresponding equilibrium solutions and have unique properties. The maximum entropy principle gives a consistent and objective approach to the construction of probability distributions based only on partial information and basic principles of probability. Maximum entropy utilizes all of the provided limited information and, in general, leads to more robust probability distributions and invariant densities than other methods.
Let Z be a random variable that lies in a certain state z with probability p(z), where Z ∈ [z_min, z_max]. It can be assumed that an approximate probability density function estimated from a given single data set, by methods such as histograms, kernel density estimators, or maximum likelihood estimation, can describe such a random variable Z. Although a derived p(z) yields a somewhat reduced entropy, note that for a model based on the data and an underlying probability, the model is completely determined and unique; all other models depend on an appropriate modification of its probability, which is the main result of this work. The entropy H[p(z)] of Z, as implied by p(z), can be formulated using the definition of Shannon's entropy with a normalization constant N, i.e., H[p(z)] = −N ∫ p(z) log_R p(z) dz, where R = 2 or R = 10 if the entropy is expressed using logarithms to base 2 or base 10, respectively.
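A minimal sketch (our own assumption, not code from the source) of estimating H[p(z)] from a single data set with a histogram density estimate, where the base R selects bits (R = 2) or decimal digits (R = 10):

```python
# Plug-in estimate of H[p(z)] from samples of Z via a histogram density estimate.
import numpy as np

def entropy_from_samples(z, bins=50, base=2.0, normalization=1.0):
    """Discrete approximation of H[p(z)] = -N * sum p log_base(p)."""
    counts, _ = np.histogram(z, bins=bins)
    p = counts / counts.sum()        # estimated probability of each bin
    p = p[p > 0]                     # drop empty bins to avoid log(0)
    return -normalization * np.sum(p * np.log(p)) / np.log(base)

z = np.random.default_rng(1).normal(loc=0.0, scale=1.0, size=10_000)
print(entropy_from_samples(z, base=2))    # entropy in bits (R = 2)
print(entropy_from_samples(z, base=10))   # entropy in decimal digits (R = 10)
```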
3.2. Applications in Data Analysis
One of the major challenges in the field of biophotonics is the systematic analysis of data and the extraction of knowledge from the generated data. To overcome this obstacle, general maximum entropy methods, often in the form of inverse Laplace transforms or other types of inversion methods, are proposed here for data obtained by light transmission at a given set of spatial sampling locations.
Furthermore, general maximum entropy methods, often formulated as weak-constraint minimization problems, can be used for space-time hyperspectral imaging of biological systems and, in conjunction with compressive sampling and a new class of nodes, to gather significantly more information about the labels from the same number of measured diffraction patterns than conventional methods can; that is, more than can be achieved via compressive sampling alone.
The constrained maximum entropy method is an approach to modeling problems in which the probabilities are known only on a countable subset of the sample space. The unknown information on the complement of this subset is replaced by assigning estimated probabilities derived from a regularized entropy. The counting probabilities act as a horizon toward the past, and they are the equilibrium distributions of maximum Boltzmann-Gibbs entropy. These estimated probabilities are obtained from entropy maximization within a multifractal modeling framework. The generalized maximum entropy method uses the principle of maximum entropy to solve probability models in which the probabilities are unknown on a countable set. To construct a maximum entropy probability model on a countable state space, an empirical probability measure computed from the experimental data is used and a class of probability measures is defined. Based on the empirical probability measure, an M-probability model is defined that is mainly concentrated on the empirical probability measure and maximizes its entropy. Differing from other methods of estimation and inference, which seek an estimated distribution from a maximum likelihood viewpoint, the maximum entropy method looks for the distribution whose divergence from the uniform distribution is minimal among all distributions that satisfy whatever partial information is available in the form of expectations.
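As a toy illustration of this viewpoint (our example, with an assumed six-state space and a single expectation constraint), the distribution of minimal divergence from the uniform prior has the familiar exponential-family form and can be computed by fitting one multiplier:

```python
# Maximum entropy distribution on states {1,...,6} subject to E[Z] = 4.5,
# i.e., the distribution of minimal KL divergence from the uniform prior
# among all distributions matching the given expectation.
import numpy as np
from scipy.optimize import brentq

z = np.arange(1, 7, dtype=float)      # countable (finite) state space
target_mean = 4.5                     # the only piece of partial information

def mean_at(lam):
    """Mean of the exponential-family solution p_k proportional to exp(lam * z_k)."""
    w = np.exp(lam * z)
    p = w / w.sum()
    return p @ z

lam = brentq(lambda l: mean_at(l) - target_mean, -5.0, 5.0)  # fit the multiplier
p = np.exp(lam * z)
p /= p.sum()
print(np.round(p, 4), p @ z)          # maximum entropy probabilities and their mean
```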
The main advantage of the constrained entropy method is its ability to deliver well-defined distributions from incomplete information and to give a suitable meaning to the system of constraints. The maximum entropy method also has some drawbacks associated with the particular form in which information must be supplied. A priori information, when available, can nevertheless be taken into account.
Analytical constrained entropy methods are very successful in many challenging data-interpretation problems where prior knowledge is available and fundamental constraints are known. The use of prior information in the form of a Shannon entropy-based functional, combined with a minimum information loss principle, makes maximum entropy techniques a potentially more powerful method. Traditional linear and polynomial fitted models describe data of low complexity, but such models are known to fail badly in high dimensions, with skewed data distributions, or when there is a lack of prior understanding of the data. In extremely noisy experiments where the acquisition time is limited and the signals are partial or truncated, the validity of traditional fitting is questionable.
CME methods are well suited to this intrinsic fuzziness and to the associated estimation variability of biophotonic data, and they appear to accommodate the quantum uncertainty present in the collection of biophotonic image data, so that the data modeling results and their estimation uncertainty can be described fully statistically.
The readiness and benefits of this additional methodological support supplement the deep dictionary learning pursued in advanced experiments and reduce the impedance mismatch between the data and the advanced analysis tools.
This method allows for a statistical approach to the restoration of quantum biological system images or pure quantum data within a matrix algebra formalism and has great potential for the research field in many real-life tasks in areas such as quantum biology, quantum information science, and biophotonics.
Given the available information, the Constrained Maximum Entropy (CME) method models the photon emission of a biological system as a frequency distribution over states and proceeds by ordering the frequency distributions that satisfy the constraints (the information used) by their Shannon information entropy or, when the available information suggests a non-uniform prior, by their relative entropy (the Kullback-Leibler divergence). With this information, the resulting maximum posterior probability distribution is the frequency distribution that satisfies the constraints, has the highest Shannon information entropy or the minimal Kullback-Leibler divergence, and is maximally noncommittal with regard to information not yet available. In addition, following Golan (2018) and Bernardini Fernandez (2021), this framework can be generalized to allow for noise in the constraints, given the uncertainty about the process under study and the count nature of the variable of interest.
In our context, a noise component for each count observation is included as a constraint in the CME formulation. In such a case, we assume that the observed elements are given by two sources: a signal plus a noise term e_i that reflects our uncertainty about the target variable. Each count y_i is treated as a discrete random variable that can take M different values. Defining a supporting vector z = (z_1, ..., z_M)' (for the sake of simplicity assumed common for all the y_i) that contains the M possible realizations of the targets, with unknown probabilities p_i = (p_i1, ..., p_iM)', y_i can be written as:

y_i = z'p_i = Σ_m z_m p_im.

The idea can be generalized in order to include an error term e_i and define each y_i as:

y_i = z'p_i + e_i.

We represent uncertainty about the realizations of the errors by treating each element e_i as a discrete random variable with J possible outcomes contained in a convex set v = (v_1, ..., v_J)', which for the sake of simplicity will be assumed common for all the e_i. We also assume that these possible realizations are symmetric around zero (v_1 = −v_J). The traditional way of fixing the upper and lower limits of this set is to apply the three-sigma rule (see Pukelsheim, 1994). Under these conditions, each e_i can be defined as:

e_i = v'w_i = Σ_j v_j w_ij,

where w_ij is the unknown probability of the outcome v_j for the count i. The model can thus be written in the following terms:

y_i = Σ_m z_m p_im + Σ_j v_j w_ij.
The solution to the estimation problem is given by minimizing the Kullback-Leibler divergence between the posterior distributions p_i, w_i and the a priori probabilities q_i = (q_i1, ..., q_iM)' and u_i = (u_i1, ..., u_iJ)'. Specifically, the constrained minimization problem can be written as:

min over p, w of  Σ_i Σ_m p_im ln(p_im / q_im) + Σ_i Σ_j w_ij ln(w_ij / u_ij)

subject to the data constraints

y_i = Σ_m z_m p_im + Σ_j v_j w_ij,   i = 1, ..., n,

and the normalization constraints

Σ_m p_im = 1,   Σ_j w_ij = 1,   i = 1, ..., n.

The normalization constraints simply force the probabilities to sum to one, whereas the data constraints reflect the observable information that we have on each count y_i. If we do not have an informative prior, the a priori distributions q_i and u_i are specified as uniform, which leads to the GME solution. The uniform distribution is usually set as the natural prior for the error terms.
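To illustrate how this program can be solved numerically, the following sketch generates illustrative count data, builds the supporting vectors z and v (the latter via the three-sigma rule), assumes uniform priors q and u, and solves the constrained minimization with a generic SciPy solver; the data, support sizes, and solver choice are our own assumptions, not the authors' implementation.

```python
# Sketch of the GCE/GME program: minimize the KL divergence of (p, w) from the
# priors (q, u) subject to the data and normalization constraints.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
y = rng.poisson(lam=20.0, size=8).astype(float)   # illustrative observed counts

M, J = 5, 3
z = np.linspace(0.0, 2 * y.max(), M)              # signal support, common to all i
sigma = y.std(ddof=1)
v = np.linspace(-3 * sigma, 3 * sigma, J)         # error support, three-sigma rule
n = y.size

q = np.full((n, M), 1.0 / M)                      # uniform priors -> GME solution
u = np.full((n, J), 1.0 / J)

def unpack(x):
    p = x[:n * M].reshape(n, M)
    w = x[n * M:].reshape(n, J)
    return p, w

def objective(x):
    p, w = unpack(x)
    return np.sum(p * np.log(p / q)) + np.sum(w * np.log(w / u))

constraints = [
    # data constraints: y_i = z . p_i + v . w_i
    {"type": "eq", "fun": lambda x: unpack(x)[0] @ z + unpack(x)[1] @ v - y},
    # normalization constraints: rows of p and w sum to one
    {"type": "eq", "fun": lambda x: unpack(x)[0].sum(axis=1) - 1.0},
    {"type": "eq", "fun": lambda x: unpack(x)[1].sum(axis=1) - 1.0},
]
x0 = np.concatenate([q.ravel(), u.ravel()])
res = minimize(objective, x0, method="SLSQP", constraints=constraints,
               bounds=[(1e-9, 1.0)] * x0.size)
p_hat, w_hat = unpack(res.x)
print("fitted counts: ", np.round(p_hat @ z + w_hat @ v, 2))
print("observed counts:", y)
```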
Following Golan et al. (1996), it is possible to introduce other data constraints in the CME formulation if additional information is available.
Once the respective supporting vectors and the a priori probability distributions are set, the estimation can be carried out with the same constrained cross-entropy program, now augmented with the additional data constraints. Both for the parameters and for the errors, the supporting vectors usually contain values symmetrically centered on zero. If all the a priori distributions q_i and u_i are specified as uniform, then the GCE solution reduces to the GME one.
To recover the probability vectors p_i and w_i, the Lagrangian function, written in matrix form, is:

L = Σ_i Σ_m p_im ln(p_im / q_im) + Σ_i Σ_j w_ij ln(w_ij / u_ij) + Σ_i λ_i (y_i − Σ_m z_m p_im − Σ_j v_j w_ij) + Σ_i μ_i (1 − Σ_m p_im) + Σ_i τ_i (1 − Σ_j w_ij),

with the first-order conditions:

∂L/∂p_im = ln(p_im / q_im) + 1 − λ_i z_m − μ_i = 0,
∂L/∂w_ij = ln(w_ij / u_ij) + 1 − λ_i v_j − τ_i = 0,

together with the constraints. The solution of this system of equations yields:

p̂_im = q_im exp(λ̂_i z_m) / Ω_i(λ̂_i),   with Ω_i(λ̂_i) = Σ_m q_im exp(λ̂_i z_m),
ŵ_ij = u_ij exp(λ̂_i v_j) / Ψ_i(λ̂_i),   with Ψ_i(λ̂_i) = Σ_j u_ij exp(λ̂_i v_j),

where Ω_i and Ψ_i are normalization factors and λ̂_i is the estimate of the Lagrange multiplier associated with the i-th data constraint. The constrained optimization problem can also be formulated in terms of the (unconstrained) dual function L(λ), which depends only on the parameters λ.
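A minimal sketch of this dual route, under the same illustrative setup as above and with uniform priors: each probability vector is recovered in closed form from the normalization factors Ω_i and Ψ_i once the multipliers λ are found by unconstrained minimization of the dual; data, supports, and the optimizer are again assumptions for illustration.

```python
# Sketch of the unconstrained dual of the GME problem: one multiplier per data
# constraint, with probabilities recovered in closed form from the multipliers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
y = rng.poisson(lam=20.0, size=8).astype(float)   # illustrative observed counts

M, J = 5, 3
z = np.linspace(0.0, 2 * y.max(), M)              # signal support
sigma = y.std(ddof=1)
v = np.linspace(-3 * sigma, 3 * sigma, J)         # error support (three-sigma rule)

def probs(lam):
    """Closed-form p and w for given multipliers (uniform priors assumed)."""
    ez = np.exp(np.outer(lam, z))                 # n x M
    ev = np.exp(np.outer(lam, v))                 # n x J
    omega, psi = ez.sum(axis=1), ev.sum(axis=1)   # normalization factors
    return ez / omega[:, None], ev / psi[:, None], omega, psi

def dual(lam):
    """Unconstrained dual objective, minimized over the multipliers."""
    _, _, omega, psi = probs(lam)
    return float(np.log(omega).sum() + np.log(psi).sum() - y @ lam)

res = minimize(dual, np.zeros(y.size), method="BFGS")
p_hat, w_hat, _, _ = probs(res.x)
print("fitted counts: ", np.round(p_hat @ z + w_hat @ v, 2))
print("observed counts:", y)
```

Working with the dual reduces the problem to one smooth, convex minimization over as many multipliers as there are data constraints, which is usually far cheaper than solving the primal program directly.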
Using the estimated distribution of previous temporal measurements as prior distributions, it is possible to empirically model the learning that occurs from repeated samples (Bernardini Papalia, 2024).
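A speculative sketch of this idea, in which the probabilities estimated at one time point serve as the prior for the next cross-entropy update; the support, the sequence of observed means, and the single-moment constraint are illustrative assumptions, not the authors' procedure.

```python
# Sequential cross-entropy updating: the posterior from one repeated sample
# becomes the prior for the next, empirically modeling the learning process.
import numpy as np
from scipy.optimize import brentq

z = np.arange(1, 7, dtype=float)            # common support of the variable
q = np.full(z.size, 1.0 / z.size)           # initial prior: uniform (GME case)

def tilt(prior, lam):
    """Minimum cross-entropy solution p_k proportional to prior_k * exp(lam * z_k)."""
    w = prior * np.exp(lam * z)
    return w / w.sum()

for t, observed_mean in enumerate([3.2, 3.8, 4.4]):      # repeated samples
    lam = brentq(lambda l: tilt(q, l) @ z - observed_mean, -10.0, 10.0)
    q = tilt(q, lam)                         # posterior becomes the next prior
    print(f"t={t}: mean={q @ z:.2f}, probabilities={np.round(q, 3)}")
```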