Introduction
The electrical activity in the brain reflects a combination of hidden internal states which, although not directly observable, can be inferred via the signals picked up by neuroimaging devices [1-3]. One way to describe these signals is in terms of probability distributions evolving in time. As conditions change in the brain, the probability distributions shift accordingly, reflecting an ongoing reorganization of internal representations. Understanding precisely how these distributions evolve, and what computational processes underlie their transformations among brain regions, remains a central challenge in computational neuroscience.
Changes in neural activity can be analysed by studying how specific functionals act on probability distributions. Two key examples of such functionals are entropy[4-6], which affects the variance, and expectation[7-9], which shifts the mean. Each functional is associated, via its gradient, with a specific flow across the space of probability densities. This geometric[10] perspective allows for a decomposition of transformations into interpretable information-theoretic components.
We specifically formulate how probability distributions change when the observation scale is altered. This perspective also aligns with established computational theories such as predictive coding[11,12] and efficient representation[13-15], in which internal states are continuously adjusted to balance precision and uncertainty. By framing representational change as a gradient flow, we model these adjustments as structured, directional transformations shaped by information-theoretic drives.
Previous work on neural signal transmission has largely focused on statistical dependencies between observed activation patterns[16,17] — whether scalar (e.g., firing rates) or vector-valued (e.g., population activity). Metrics such as mutual information[18,19] and Granger causality[20] quantify how strongly activity in one region predicts activity in another. However, these metrics do not capture how the full probability distributions over latent variables are transformed across regions. We address precisely this issue with a framework that quantifies the form of gradient flows over probability distributions. We then show that entropy and expectation form orthogonal flows in the case of Gaussian distributions.
We validate this framework in silico and then extract dominant flows linking regions within the murine visual cortex, captured using two-photon imaging. The visual cortex in mice is particularly well-suited to our study, given that adjacent cortical regions therein exhibit coordinated patterns of activity[21,22] across functionally specialized regions[23-25]. Beyond this specific application, our approach introduces a generalizable method for analysing any scenario in which distributions are transformed — not just among cortical regions, but also between measurement devices, or across spatiotemporal scales.
Methods
Our goal is to derive orthogonal gradient directions of a probability distribution with respect to changes in observational scale. These gradients indicate the directions along which the distribution undergoes maximal change. We begin with the following definitions:
$x$: the state of the system, represented by an $n$-dimensional variable.
$s$: a positive-valued parameter that controls the scale of observation.
$p(x \mid s)$: a probability density function over $x$, conditioned on the observation scale $s$, which remains normalized for all values of $s$, such that:
$$\int p(x \mid s)\, dx = 1$$
We define the space of all valid (smooth, positive, normalized) probability distributions as the information space $\mathcal{P}$:
$$\mathcal{P} = \Bigl\{\, p \;:\; p(x \mid s) > 0, \;\; \int p(x \mid s)\, dx = 1 \,\Bigr\}$$
This defines a nonlinear manifold of valid distributions within the space of all possible functions.
Power law generators: Due to the ubiquity of power laws in the analysis of neural systems[2], we investigate how the probability distribution $p(x \mid s)$ changes to a new distribution $p(x \mid \lambda s)$, where the observation scale is multiplied by a factor $\lambda$, via:
$$p(x \mid \lambda s) = \frac{p(x \mid s)^{1/\lambda}}{\int p(x' \mid s)^{1/\lambda}\, dx'}$$
where the partition function in the denominator ensures correct normalization of the new distribution for all values of $\lambda$.
We next analyse the form of this transformation for very small changes in scale. Specifically, we seek the associated generator[26] — i.e., the infinitesimal power law transformation associated with an increase in $s$. As motivated by Noether's theorem[27] and Lie theory[28], the derivation of a generator creates a powerful tool that allows for the recovery of arbitrary transformations.
To see how this applies to our case, we begin by defining the scale factor $\lambda$ in terms of an arbitrarily small constant $\epsilon$:
$$\lambda = 1 + \epsilon$$
thereby allowing for any scale parameter to be defined by the iterated application of $(1 + \epsilon)$.
Applying this definition to the power law transformation above, we obtain:
$$p(x \mid (1+\epsilon)s) \propto p(x \mid s)^{1/(1+\epsilon)}$$
Next, using the fact that $1/(1+\epsilon) \approx 1 - \epsilon$ for small $\epsilon$, we then expand $p(x \mid s)^{1-\epsilon}$ to first order in $\epsilon$ and use the identities $p^{-\epsilon} = e^{-\epsilon \ln p}$, and $e^{-\epsilon \ln p} \approx 1 - \epsilon \ln p$, to linearize the effect of a power law transform:
$$p(x \mid s)^{1/(1+\epsilon)} \approx p(x \mid s)\bigl(1 - \epsilon \ln p(x \mid s)\bigr) \qquad (4)$$
which defines the transformation from the power law above evaluated near $\lambda = 1$.
To ensure that the transformed density remains normalized, we divide Eq. (4) by its associated partition function:
$$p(x \mid (1+\epsilon)s) = \frac{p(x \mid s)\bigl(1 - \epsilon \ln p(x \mid s)\bigr)}{\int p(x' \mid s)\bigl(1 - \epsilon \ln p(x' \mid s)\bigr)\, dx'}$$
Substituting the normalization condition $\int p(x' \mid s)\, dx' = 1$ into the denominator, and using the definition of the mean $\langle \ln p \rangle = \int p(x' \mid s) \ln p(x' \mid s)\, dx'$:
$$p(x \mid (1+\epsilon)s) = \frac{p(x \mid s)\bigl(1 - \epsilon \ln p(x \mid s)\bigr)}{1 - \epsilon \langle \ln p \rangle}$$
Finally, we use the fact that $(1 - \epsilon \langle \ln p \rangle)^{-1} \approx 1 + \epsilon \langle \ln p \rangle$ for small $\epsilon$, and discard terms of order $\epsilon^2$, to linearize the expression above, thereby yielding the power law generator:
$$p(x \mid (1+\epsilon)s) \approx p(x \mid s)\Bigl[\,1 - \epsilon\bigl(\ln p(x \mid s) - \langle \ln p \rangle\bigr)\Bigr]$$
which can equivalently be expressed as the following differential equation:
$$\frac{\partial p(x \mid s)}{\partial s} \propto -\,p(x \mid s)\bigl(\ln p(x \mid s) - \langle \ln p \rangle\bigr)$$
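This first-order equivalence can be checked numerically. The following minimal sketch (using the discretization, exponent convention, and variable names adopted here purely for illustration) applies the power law re-weighting to a discretized Gaussian density and compares it against the linearized generator:

```python
import numpy as np

# Discretized one-dimensional Gaussian density on a regular grid
x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2)
p /= (p * dx).sum()                       # normalize so that sum(p) * dx = 1

eps = 1e-3                                # small change in scale

# Exact power law re-weighting with exponent 1 / (1 + eps)
p_exact = p ** (1.0 / (1.0 + eps))
p_exact /= (p_exact * dx).sum()

# Linearized generator: p * [1 - eps * (ln p - <ln p>)]
mean_logp = (p * np.log(p) * dx).sum()    # <ln p> under p
p_lin = p * (1.0 - eps * (np.log(p) - mean_logp))

# The two agree to first order; the residual scales as eps**2
print(np.abs(p_exact - p_lin).max())
```

The residual shrinks quadratically as $\epsilon$ is reduced, confirming that the generator captures the transformation to first order.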
Power law and Entropic Flow: We now note that the generator derived above includes a term $p \ln p$, which resembles the integrand of the entropy $H[p]$, hinting at a connection between power law transformations and entropy:
$$H[p] = -\int p(x \mid s) \ln p(x \mid s)\, dx$$
We investigate this connection by calculating in which direction, within the space of valid probability distributions $\mathcal{P}$, entropy increases most rapidly.
This direction is given by the (negative) functional gradient of the entropy:
$$\frac{\delta H}{\delta p} = -\bigl(\ln p(x \mid s) + 1\bigr)$$
with a mean given by:
$$\Bigl\langle \frac{\delta H}{\delta p} \Bigr\rangle = \int p(x \mid s)\, \frac{\delta H}{\delta p}\, dx$$
which, using the normalization condition and the definition of entropy above, can be written as:
$$\Bigl\langle \frac{\delta H}{\delta p} \Bigr\rangle = H[p] - 1$$
We define an entropic flow $\phi_H$ as the mean gradient subtracted from the gradient itself. This has the effect of projecting the gradient onto the manifold $\mathcal{P}$ of valid probability densities:
$$\phi_H(x) = \frac{\delta H}{\delta p} - \Bigl\langle \frac{\delta H}{\delta p} \Bigr\rangle$$
which, using the two expressions above, reads:
$$\phi_H(x) = -\bigl(\ln p(x \mid s) - \langle \ln p \rangle\bigr)$$
i.e., we discover exactly the same expression as in the power law generator, meaning that we can write:
$$\frac{\partial p(x \mid s)}{\partial s} \propto p(x \mid s)\, \phi_H(x)$$
This reveals a relationship between entropic flow and power law transformations indexed by the scale parameter $s$.
Generalized Flow: The form of the entropic flow can be generalized to arbitrary functionals $F[p]$, which define continuous trajectories through information space $\mathcal{P}$ via associated flow parameters $\theta$. The flow of $F$ preserves the geometric structure of the entropic flow, in terms of a projected gradient on the log density of $p$, while allowing for arbitrary functionals:
$$\frac{\partial p(x \mid \theta)}{\partial \theta} \propto p(x \mid \theta)\left(\frac{\delta F}{\delta p} - \Bigl\langle \frac{\delta F}{\delta p} \Bigr\rangle\right)$$
Here, the multiplicative term $p(x \mid \theta)$ is not an artefact of the entropic expression above. Rather, it persists in the generalized flow because $\theta$ parameterizes a flow of the form $\partial \ln p / \partial \theta$, which maps to $\tfrac{1}{p}\, \partial p / \partial \theta$. The expression above therefore yields a class of projected gradient flows which depend on the choice of functional $F$.
Basis Flows: Thus far we have established that:
Power law transformations are associated with entropic flow,
The power law/entropy link can be generalized to arbitrary functionals other than entropy.
Given the above two points, our next question is whether we can find a flow $\phi_F$ that is orthogonal to the entropic flow $\phi_H$, as this would allow for a decomposition into independent components. To find such an orthogonal flow, we require that the inner product between $\phi_F$ and $\phi_H$ equals zero:
$$\langle \phi_F, \phi_H \rangle_p = \int p(x \mid s)\, \phi_F(x)\, \phi_H(x)\, dx = 0$$
where we can use the definitions of $\phi_F$ and $\phi_H$ to write this inner product as a covariance:
$$\langle \phi_F, \phi_H \rangle_p = \mathrm{Cov}_p\!\left(\frac{\delta F}{\delta p},\, \frac{\delta H}{\delta p}\right)$$
which is, up to a sign, the covariance between $\frac{\delta F}{\delta p}$ and $\ln p(x \mid s)$ under $p$:
$$\langle \phi_F, \phi_H \rangle_p = -\,\mathrm{Cov}_p\!\left(\frac{\delta F}{\delta p},\, \ln p(x \mid s)\right)$$
The simplest class of functional $F$ is given by the linear expectation:
$$E[p] = \int x\, p(x \mid s)\, dx$$
with a functional derivative given by:
$$\frac{\delta E}{\delta p} = x$$
If we then assume a zero-mean Gaussian form for $p(x \mid s)$, for which $\ln p(x \mid s) = -x^2/(2\sigma^2) + \mathrm{const}$, the covariance above becomes:
$$\mathrm{Cov}_p\!\bigl(x,\, \ln p(x \mid s)\bigr) = -\frac{1}{2\sigma^2}\,\mathbb{E}\bigl[x^3\bigr] = 0$$
which satisfies the orthogonality condition above, which in turn shows that entropy and expectation define orthogonal flows under the zero-mean Gaussian distribution. It is important to note that this orthogonality only holds for symmetric distributions.
We next look for the transformation associated with the expectation functional, using the generalized flow above:
$$\frac{\partial p(x \mid \beta)}{\partial \beta} \propto p(x \mid \beta)\bigl(x - \langle x \rangle\bigr)$$
which has a solution given by:
$$\ln p(x \mid \beta) = \ln p(x \mid 0) + \beta x - \ln Z(\beta)$$
and hence:
$$p(x \mid \beta) = \frac{p(x \mid 0)\, e^{\beta x}}{\int p(x' \mid 0)\, e^{\beta x'}\, dx'}$$
where the partition function in the denominator ensures correct normalization.
Therefore, just as the entropic flow arises from power law transformations, the expectation flow corresponds to an exponential tilt. Intuitively, the entropic and expectation flows capture how the variance and the expectation change with observational scale, respectively. We can summarize the links between these two information-theoretic functionals and their associated geometric transformations as follows:
Entropic flow:
Functional: Entropy: $H[p] = -\int p \ln p\, dx$
Flow: Entropic gradient ascent: $\partial p / \partial s \propto -\,p\,(\ln p - \langle \ln p \rangle)$
Transformation: Power law re-weighting: $p \mapsto p^{\gamma} / Z$
Expectation flow:
Functional: Expectation value: $E[p] = \int x\, p\, dx$
Flow: Expectation gradient ascent: $\partial p / \partial \beta \propto p\,(x - \langle x \rangle)$
Transformation: Exponential tilt: $p \mapsto p\, e^{\beta x} / Z$
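These correspondences can be made concrete on a discretized Gaussian: power law re-weighting changes the variance (and hence the entropy) while leaving the mean of a symmetric density unchanged, whereas an exponential tilt shifts the mean while leaving the Gaussian variance unchanged. The following snippet is a schematic illustration with arbitrarily chosen values of $\gamma$ and $\beta$:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def normalize(q):
    return q / (q * dx).sum()

def moments(q):
    mean = (x * q * dx).sum()
    var = ((x - mean) ** 2 * q * dx).sum()
    entropy = -(q * np.log(q) * dx).sum()
    return mean, var, entropy

p = normalize(np.exp(-0.5 * x**2))            # standard Gaussian density

p_power = normalize(p ** 0.5)                 # power law re-weighting, gamma = 0.5
p_tilt = normalize(p * np.exp(0.8 * x))       # exponential tilt, beta = 0.8

print(moments(p))        # mean ~ 0.0, variance ~ 1.0
print(moments(p_power))  # mean ~ 0.0, variance ~ 2.0, entropy increased
print(moments(p_tilt))   # mean ~ 0.8, variance ~ 1.0, entropy unchanged
```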
Synthetic Data: Having established entropy and expectation as orthogonal basis functionals, we now consider a mixed entropic–expectation flow obtained by combining the two flows above:
$$\frac{\partial p(x)}{\partial \tau} \propto -\,\alpha\, p(x)\bigl(\ln p(x) - \langle \ln p \rangle\bigr) + \beta\, p(x)\bigl(x - \langle x \rangle\bigr)$$
where the parameters $\alpha$ and $\beta$ determine the relative contributions of entropy and expectation, respectively.
We performed two in silico tests to validate parameter recovery. We simulated the mixed flow using pre-specified values of the entropy ($\alpha$) and expectation ($\beta$) coefficients, using samples from: 1) a Gaussian process and 2) a one-dimensional Langevin process with a time-varying oscillatory drift term. In both cases, recovery accuracy for $\alpha$ and $\beta$ was assessed by comparing true vs. recovered parameters and evaluating distribution similarity via Wasserstein, total variation, and L2 metrics.
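A minimal sketch of this forward-simulation and recovery step is shown below: the mixed flow is integrated on a discretized density with known coefficients, and $\alpha$ and $\beta$ are then re-estimated by regressing the realized change in the density onto the two flow fields. The grid, step sizes, and regression-based recovery used here are illustrative simplifications rather than the exact simulation settings:

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

def normalize(q):
    return q / (q * dx).sum()

def flow_fields(q):
    # Entropic and expectation flow fields evaluated at the current density
    mean_logq = (q * np.log(q) * dx).sum()
    mean_x = (x * q * dx).sum()
    return -q * (np.log(q) - mean_logq), q * (x - mean_x)

alpha_true, beta_true = 0.30, 0.15            # ground-truth coefficients

p = normalize(np.exp(-0.5 * x**2))
p0 = p.copy()
dt, n_steps = 1e-3, 100
for _ in range(n_steps):
    f_ent, f_exp = flow_fields(p)
    p = normalize(p + dt * (alpha_true * f_ent + beta_true * f_exp))

# Recover (alpha, beta) by regressing the realized change onto the two flow
# fields evaluated at the initial density (a valid approximation for short flows)
f_ent0, f_exp0 = flow_fields(p0)
A = np.stack([f_ent0, f_exp0], axis=1)
target = (p - p0) / (dt * n_steps)
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
print(coef)                                   # close to [0.30, 0.15]
```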
Two-photon imaging data: We then evaluated the relative contribution of the entropy and expectation flows in an empirical dataset. We used publicly available two-photon imaging data collected in five mice[29]. The dataset includes individual neuronal responses collected from six retinotopically defined visual cortical areas: the primary visual cortex (V1), lateromedial (LM), anterolateral (AL), rostrolateral (RL), anteromedial (AM), and posteromedial (PM) areas (see Figure 1).
Visual stimuli consisted of movies with durations of 30–120 seconds, and resting-state activity was recorded under a constant grey screen for 5 minutes. All recordings were pre-processed to extract ΔF/F calcium traces, and responses from neurons were aligned to stimulus timing and grouped by retinotopically defined visual area (see Figure 2).
We then apply the mixed-flow transformation model to all pairs of regions in the visual cortices of the five mice. For each region, we estimate its marginal probability distribution using a 100-bin histogram of its mean time series. Given a pair of regions $A$ and $B$, we treat the mean time series of region $A$ as the input signal and that of region $B$ as the target output. We then use nonlinear optimization to identify the values of $\alpha$ and $\beta$ that best transform region $A$'s time series into that of region $B$.
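A sketch of this estimation step is given below, assuming histogram-based density estimates and a generic nonlinear optimizer. Here `region_a` and `region_b` stand for the mean ΔF/F time series of the source and target regions, and the specific pointwise warping used to combine the entropic and expectation terms is one plausible instantiation of the mixed flow rather than the exact implementation:

```python
import numpy as np
from scipy.optimize import minimize

def fit_mixed_flow(region_a, region_b, n_bins=100):
    """Fit entropic (alpha) and expectation (beta) coefficients that best warp
    the source signal toward the target signal."""
    # Histogram-based estimate of the source density (100 bins by default)
    counts, edges = np.histogram(region_a, bins=n_bins, density=True)
    counts = np.clip(counts, 1e-12, None)               # avoid log(0)
    log_p = np.log(counts)
    width = edges[1] - edges[0]
    mean_logp = (counts * log_p * width).sum()          # <ln p> under the histogram
    mean_a = region_a.mean()

    # Local log-density of each sample, read off from its histogram bin
    idx = np.clip(np.digitize(region_a, edges) - 1, 0, n_bins - 1)
    logp_a = log_p[idx]

    def warp(params):
        alpha, beta = params
        # entropic term: deviation of the local log-density from its mean
        # expectation term: deviation of the signal from its global mean
        return region_a + alpha * (logp_a - mean_logp) + beta * (region_a - mean_a)

    def loss(params):
        return np.mean((warp(params) - region_b) ** 2)

    res = minimize(loss, x0=[0.0, 0.0], method="Nelder-Mead")
    pred = warp(res.x)
    ss_res = np.sum((region_b - pred) ** 2)
    ss_tot = np.sum((region_b - region_b.mean()) ** 2)
    return res.x, 1.0 - ss_res / ss_tot                 # (alpha, beta) and R^2
```

For example, `fit_mixed_flow(rl_trace, v1_trace)` (with hypothetical arrays holding the two mean time series) would return the fitted $(\alpha, \beta)$ pair and the associated $R^2$ for the RL-to-V1 direction.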
We note that, while the mixed flow above was derived for the transformation of a single distribution under changes in observational scale, the same operator can be used to formalize transformations between the marginal distributions of two regions. Specifically, given regions $A$ and $B$ with empirical distributions $p_A(x)$ and $p_B(x)$, we approximate their relationship as:
$$p_B(x) \approx \mathcal{T}_{\alpha,\beta}\bigl[p_A\bigr](x)$$
where $\mathcal{T}_{\alpha,\beta}$ denotes the mixed entropic–expectation flow operator defined above. In this way, the inter-regional transformation is treated as the best-fitting reweighting and tilting of $p_A$ that recovers $p_B$. This allows the flow coefficients $\alpha$ and $\beta$ to be interpreted not only as within-distribution drives, but also as markers of how distributions from different regions are related under the same geometric framework.
The transformation is composed of two components: a nonlinear entropic flow term driven by the deviation of the local log-density from its mean, and a linear expectation flow term driven by the deviation of the signal from its global mean. These two terms respectively capture how local compressions in probability mass and global shifts in signal level contribute to the transformation. Once the optimal $\alpha$ and $\beta$ parameters are obtained for each pair of regions, the input signal is warped accordingly to produce an estimated output signal. The match between the estimated and actual target signals is quantified using the coefficient of determination $R^2$.
To assess whether each of the transformations is statistically significant, we compare empirical results to null distributions obtained through temporal permutation. To generate these null distributions, we circularly shift the input time series independently within each session and re-estimate the transformation parameters. This process is repeated 1000 times with random shift sizes to generate surrogate distributions. These surrogate datasets allow us to compare our empirical results with the null hypothesis that there is no alignment between source and target signals. Significance values are computed as the proportion of surrogate values greater than or equal to the empirical result. These $p$-values are corrected for multiple comparisons using the false discovery rate (FDR). Only values that survive FDR correction and exceed a conservative threshold of $R^2 > 0.65$ are reported.
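A sketch of this surrogate procedure, reusing the `fit_mixed_flow` helper sketched above, is given below; the loop structure follows the description in the text, while the exact session handling and correction routine may differ from the original analysis:

```python
import numpy as np

def permutation_pvalue(region_a, region_b, r2_emp, n_perm=1000, seed=0):
    """Null distribution of R^2 obtained by circularly shifting the source
    time series and refitting the mixed-flow parameters; fit_mixed_flow is
    the helper sketched above."""
    rng = np.random.default_rng(seed)
    null_r2 = np.empty(n_perm)
    for i in range(n_perm):
        shift = rng.integers(1, len(region_a))          # random circular shift
        surrogate = np.roll(region_a, shift)
        _, null_r2[i] = fit_mixed_flow(surrogate, region_b)
    # p-value: proportion of surrogate R^2 values >= the empirical R^2
    return np.mean(null_r2 >= r2_emp)

# The resulting p-values across all region pairs can then be corrected for
# multiple comparisons with a Benjamini-Hochberg FDR procedure, e.g. via
# statsmodels.stats.multitest.multipletests(p_values, method="fdr_bh").
```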
Discussion
In this study, we formalize the link between the geometric structure of probability distributions and their information-theoretic content. Specifically, we show that transformations between Gaussian probability distributions can be decomposed into orthogonal entropic and expectation-based components. We validated this framework on synthetic data, first with a noise-corrupted diffusing Gaussian distribution, and then with a one-dimensional Langevin process with a time-varying oscillatory drift. We then applied the methodology to empirical time series obtained from two-photon neuroimaging data collected in the murine visual cortex.
Our analysis of the neuroimaging data revealed a robust bi-directional transformation between the neural probability distributions derived from the rostrolateral area (RL) and the primary visual cortex (V1). These transformations exceeded an $R^2$ threshold of 0.65 across all five mice after false discovery rate (FDR) correction. This finding indicates that information-theoretic relationships are preserved between the distributions produced by RL and V1. This reciprocity enables coordination between regions responsible for early sensory processing (V1) and higher-order contextual integration (RL)[30,31].
The RL region in mice integrates visual input from V1 with signals related to movement and task demands[32]. In this regard, RL plays a functional role analogous to the parietal cortex in primates, facilitating visuomotor coordination[33]. The bi-directional transformations observed between RL and V1 indicate a shared encoding of information between the two regions. V1 acts as a primary sensory area providing input to RL, which in turn serves as a higher-order region that modulates V1. This recurrent loop aligns with predictive coding theories, which propose that visual processing relies on a reciprocal interaction between brain regions and across hierarchical levels[34].
The link between neural dynamics and information processing shown here aligns with existing computational theories. For instance, the efficient coding hypothesis proposes that neural systems adapt their responses to match the statistical structure of sensory input[35]. In our framework, entropic and expectation flows achieve this adaptation by adjusting the spread and mean of neural activity distributions to match ongoing conditions. In communication-through-coherence (CTC) models[36], information is most effectively exchanged between two brain regions when the signals from one arrive at moments when the other is most responsive — i.e., when its neurons are closer to firing. In the context of our work, times of unpredictable sensory stimuli are associated with a dominant entropic flow, which in turn increases the range of available signal responses. Conversely, times of predictable activity (e.g., during task focus) are associated with a dominant expectation flow, which shifts responses towards relevant signal averages.
Neuroscience studies often employ methods such as Granger causality[37] and mutual information[38] to quantify statistical dependencies between activity in pairs of brain regions. These approaches indicate whether activity in one region predicts activity of another, but they do not specify the form of the transformation linking the two. Our framework addresses this gap by modelling how one region's probability distribution is geometrically transformed into that of another. The orthogonality of entropic and expectation flows further ensures (under the assumption of Gaussianity) that these transformation components can be interpreted independently.
In summary, we introduce a framework that decomposes information-geometric transformations between neural probability distributions into information-theoretic flow components. We demonstrated this in the murine visual cortex. However, the approach is applicable to any system from which probability distributions can be estimated. For example, it can be used to characterise transformations between different neuroimaging modalities, revealing scale-dependent aspects of neural organisation. Alternatively, our methodology can be used to detect disease progression, for instance in disrupted network coordination in epilepsy[39] or in altered representational scaling in Alzheimer's disease[40]. By linking geometric transformations directly to information-theoretic content, our framework provides a versatile method for testing theories of neural function across species, modalities, and scales.