1. Introduction
Galaxy evolution is commonly modeled as a history-dependent process that becomes effectively
Markovian when conditioned on a sufficiently rich present-day state (stellar mass, halo mass, environment). In this view, past configurations matter only insofar as they shaped the current state, and deviations from population relations are treated as transient fluctuations [
1,
2,
3].
An alternative possibility is regulated evolution, where internal structure encodes predictive information beyond instantaneous macroscopic variables. In complex systems, regulation does not imply stasis but bounded behavior enforced by feedback and memory. For galaxies, this could manifest as: (1) bounded metallicity scatter despite varying accretion histories, (2) characteristic recovery times after perturbations correlated with pre-perturbation structure, or (3) preferred states in phase space that systems tend to return to.
The central methodological challenge is that stellar mass dominates correlations and can induce apparent relationships between quantities sharing mass dependence. A credible search therefore requires explicit controls and time-directional tests rather than reliance on correlations.
This paper presents a falsification-oriented analysis testing whether non-Markovian predictive information survives once a minimal instantaneous state is accounted for, and it identifies the timescale of any such persistence. Two observables are defined from IllustrisTNG: a chemical disorder proxy and a recent relaxation score R. The analysis proceeds through fixed stages: raw association, mass entanglement diagnostics, mass-controlled residual association, and time-directional conditional mutual information across two lags.
2. Methods
2.1. Simulation Data
All analyses use the public IllustrisTNG cosmological simulation suite, specifically TNG100-1 at
(snapshot 99) [
1,
2,
3]. For each subhalo, the following are extracted:
No selection is applied on morphology, environment, or star-forming state beyond data availability and pipeline validity. Sample sizes are reported at each stage.
2.2. Chemical Memory Proxy: Age–Metallicity Disorder
Internal chemical disorder is quantified using the dimensionless statistic
where
is the weighted least-squares slope of
against an age proxy
, and
is the mean within-age-bin metallicity scatter (
Appendix A.1). The normalization suppresses trivial scale effects, making
interpretable as scatter per unit enrichment trend.
2.3. Recent Relaxation Score R
Recent relaxation is measured using deviations from the star-forming main sequence (SFMS) [
5,
6]. For fitted SFMS parameters
,
computed over the recent window
. The 4 Gyr window captures approximately two to three galactic dynamical times for Milky Way-mass systems [
7], sufficient to observe recovery from typical perturbations while remaining computationally tractable. The relaxation score
is a bounded transform of late deviation
and recovery trend
(
Appendix A.2). Larger
R indicates closer alignment with SFMS or recovery toward it.
2.4. Test Sequence and Decision Rules
The analysis follows a fixed test sequence:
T1 (Raw association): Pearson r and Spearman for .
T2 (Mass entanglement): Correlations of with and R.
T3 (Mass-controlled association): Linear residualization against m, then correlation of residuals.
T4 (Decomposition): Associations with components and .
T5 (Time-directional test): Bias-corrected k-nearest neighbor conditional mutual information with conditional permutation nulls.
A practical-zero detectability floor is fixed at
bits. Full falsification logic is in
Appendix B.
2.5. Non-Markovian Probe: kNN Conditional Mutual Information
To test time-directional predictive information across lags, the following quantity is estimated:
using a kNN conditional mutual information estimator (Frenzel-Pompe) under
metric [
8,
9].
Finite-sample bias is addressed via conditional permutation:
where
results from permuting
within quantile bins of
, preserving the
–mass relation while destroying conditional linkage to
. The permutation
p-value is
, with robustness evaluated across
.
3. Results
3.1. Sample Construction
Two datasets are used:
Pilot mass-matched sample (): Pipeline validation.
Full merged sample (): All subhalos with successful cutouts and valid relaxation tracks, merged on subhalo_id.
3.2. Raw Association Is Weak (T1)
In the full sample, raw association between
and
R is weak:
with bootstrap confidence intervals including zero [
10].
3.3. Mass Entanglement Is Significant (T2)
Mass diagnostics show entanglement:
where
.
3.4. Association Emerges After Mass Control (T3)
After linear residualization against
m:
This motivates time-directional testing, as T3 alone is not a memory test.
3.5. Relaxation Component Diagnostics (T4)
Component checks using
do not isolate a single dominant driver. The relaxation score encodes a blend of offset magnitude and recovery trend [
5,
6].
3.6. Temporal Predictive Information and Lag-Level Falsification (T5)
The temporal probe uses , , conditioning on with bias-corrected kNN CMI under conditional permutation nulls.
3.6.0.1. Lag Gyr.
This lag is not falsified under the registered rule. Lag Gyr.
satisfying lag-level falsification under bits.
Interpretation of decay curve.
A decay visualization was produced (Fig.
Figure 6) by fitting
to the lag-wise robust statistic. With only two distinct lag points, the fitted parameters are underdetermined; therefore
is treated as descriptive, not inferential. Formal estimation requires denser lag sampling.
3.6.0.4. Status of the decay timescale estimate.
At the present stage, the decay model
is included for descriptive visualization only. With only two distinct lag points (
and
), the parameter
is not statistically identifiable and should not be interpreted as an inferred physical timescale. Any fitted value merely summarizes the observed short–lag decay trend and serves as a placeholder for future densely sampled lag scans. A meaningful estimation of
requires at least three, and preferably more, independent lag points spanning the transition from detectable to negligible conditional predictive information, as outlined in Appendix C.
4. Figures and Interpretation (Current Best Set)
This section consolidates the most informative figures produced by the current pipeline. All files are referenced by their exact simulation output names. Interpretations are written to match the program logic: descriptive plots are not treated as evidence of non–Markovianity unless they survive explicit controls and time-directional tests.
4.1. Pipeline Sanity and Pilot Context (Mass-Matched)
Interpretation. This pilot is not used for inference. Its role is diagnostic: it verifies that varies non-degenerately at small N and that the cutout-based measurement does not collapse to a constant under mass-matching. Any apparent trend here is treated as unstable by design because N is small. The pilot exists to prevent mistaking a broken metric for a null scientific result.
Interpretation. This figure checks whether the two quantities can be merged and visualized without pathological behaviour (e.g. constant R, constant , obvious data-entry errors). No claim is extracted. At best, it motivates scaling up to the full sample where mass structure is allowed to exist and must be explicitly controlled.
4.2. Full-Sample Structure and the Mass Dominance Problem
Interpretation. The main message is methodological. In raw space, and R do not show a strong monotonic relationship. However, the accompanying mass diagnostics demonstrate that mass is a structural organizer: it correlates strongly with and also correlates (more weakly) with R. A naïve –R correlation can therefore be masked, induced, or sign-flipped by shared dependence on mass. For that reason, full-sample raw correlation plots are treated as descriptive only, motivating mass control and time-directional testing.
4.3. Time-Directional Falsification: The Killer CMI Test
Interpretation. This is the first plot in the paper that is capable, by construction, of supporting a non–Markovian claim under the program definition. It is time-directional (past → future) and conditioned so that the null preserves Markovian structure via mass. At Gyr, the bias-corrected conditional predictive information is detectable under multiple k values with small permutation p-values, indicating that earlier-time chemical disorder retains predictive information about present-day relaxation beyond mass-at-time. At Gyr, the signal collapses to practical zero and is not statistically distinguishable from the conditional null across the tested k values. Under the registered rules, that lag is marked falsified while the short lag is not, so the program continues. The current picture is short-horizon predictive memory rather than long persistence.
4.4. Lag-Aggregated Memory Points (v2 Scan Summary)
Interpretation. This figure summarizes the lag dependence in the most compact form the pipeline can support at the current stage. With only two lags executed (2 and 4 Gyr), the correct interpretation is qualitative: predictive information is visible at 2 Gyr and absent at 4 Gyr under the present conditioning. A meaningful estimate of a decay timescale requires at least three distinct lags; until then, any fitted curve is treated as descriptive, not inferential. The next phase is therefore a denser lag scan (e.g. 1–6 Gyr) combined with an expanded conditioning vector to test whether the apparent horizon is a real decay or a confounding artifact.
4.5. What These Figures Jointly Imply (Without Overclaiming)
Joint reading.Figure 4,
Figure 5 and
Figure 6 jointly support a narrow, auditable interim statement. Raw association is weak because mass entanglement dominates. Once a time-directional test is enforced with a conditional null that preserves the
–mass relationship, conditional predictive information is detected at short lag and not at longer lag. This pattern is consistent with a bounded memory horizon in this observable family, under this simulation and this minimal conditioning set. The program therefore continues, but it is already constrained: “homeostasis” does not appear as long-lived predictive information in
beyond 4 Gyr under mass-at-time conditioning.
5. Discussion
5.0.0.5. Numerical and Subgrid Limitations
All results reported here are subject to the finite resolution and subgrid physics of the TNG100–1 simulation. The baryonic mass resolution () implies that stellar populations and metallicity distributions are sampled discretely, which can introduce artificial smoothing or stochasticity into age–metallicity relations. In addition, feedback and star formation subgrid models operate on timescales significantly shorter than the ∼Gyr memory horizon probed here, potentially imprinting correlated structure that mimics or masks true long–term regulation. These limitations do not invalidate the conditional tests performed, but they do caution against over–interpreting the detected short–lag signal as a direct analogue of physical memory without cross–validation in higher–resolution runs or alternative simulation suites.
5.1. Summary of Findings
The results support three concrete statements:
(1) Mass dominance is nontrivial.
Raw correlations between chemical disorder and relaxation are not interpretable without controls, as stellar mass is entangled with both observables.
(2) A non-mass linkage exists at residual level.
Moderate association after mass residualization indicates that and R encode partially independent variation.
(3) Time-directional test indicates finite predictive horizon.
Conditional predictive information is detectable at Gyr under minimal conditioning but is consistent with zero by Gyr. This suggests finite memory rather than indefinite predictability for this observable pair.
Table 2.
Comparison of detected memory timescale to known galactic processes. The Gyr horizon aligns with gas depletion and orbital mixing timescales.
Table 2.
Comparison of detected memory timescale to known galactic processes. The Gyr horizon aligns with gas depletion and orbital mixing timescales.
| Physical Process |
Typical Timescale (Gyr) |
Relation to Gyr memory |
| Dynamical time |
0.1–1 |
Memory –
|
| Gas depletion time |
1–3 |
Memory
|
| Orbital mixing time |
1–10 |
Memory within this range |
| Feedback cycle time |
0.01–1 |
Memory
|
5.2. Connection to Homeostasis Concept
The detected finite predictive horizon ( Gyr) suggests finite-memory regulation rather than perfect homeostasis. In complex systems, regulation often involves feedback loops with characteristic timescales; the observed memory decay may reflect the timescale of internal processes (e.g., gas cycling, feedback cycles) that gradually erase historical information. This represents a third regime between Markovian drift and perfect memory.
5.3. Statistical Power and Detectability Limits
The present analysis operates close to the minimum detectability threshold imposed by finite sample size. With galaxies, the bias–corrected conditional mutual information detected at ( bits) lies only modestly above the practical resolution floor of the estimator under conditional permutation ( bits). This proximity implies that the detected signal should be interpreted as marginal but non–negligible evidence rather than a high–signal regime detection. Increased sample size, denser lag sampling, or expanded conditioning vectors are expected to improve power and sharpen discrimination between true short–memory behaviour and residual finite–sample fluctuations.
5.4. Limitations and Scope
These conclusions are deliberately narrow. The CMI configuration conditions only on . A richer instantaneous state vector may absorb part of the short-lag signal, or it may clarify whether signal persists with additional confounders controlled. Alternative memory proxies may yield different horizons.
Numerical considerations: Subgrid feedback models in TNG100 inject energy on characteristic timescales (typically Gyr). While the detected Gyr memory exceeds typical feedback cycles, comparison with higher-resolution simulations (e.g., TNG50) could help separate physical memory from potential numerical artifacts. The finite particle resolution ( M⊙) sets a lower bound on detectable structural information.
5.5. Interpreting Outcomes in a Falsification Framework
In this framework, both outcomes are informative:
The intermediate result—finite memory decaying by 4 Gyr—is consistent with regulated evolution with a finite memory horizon, as expected in dissipative but history-aware systems.
6. Conclusion
This study implements a falsification-oriented empirical test for non-Markovian regulatory signatures in galaxy evolution using IllustrisTNG TNG100-1. A chemical disorder proxy and relaxation score R show weak raw association, strong mass entanglement, and moderate residual association after mass control.
The time-directional conditional mutual information probe detects predictive information at Gyr but yields practical zero by Gyr under current conditioning. This finite predictive horizon suggests regulated evolution with a characteristic memory timescale rather than perfect homeostasis or Markovian drift.
Complete computational definitions and decision rules are documented for replication. The foundation is laid for systematic extension: denser lag sampling, expanded conditioning, alternative memory proxies, and larger samples.
Data Availability Statement
Acknowledgments
The IllustrisTNG team is acknowledged for making the simulations publicly available. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
Appendix A. Mathematical Definitions
Appendix A.1. Age–Metallicity Disorder: Ξ AMR
Let each stellar particle have formation scale factor , metallicity , and weight . Define:
Global enrichment slope. Fit weighted linear model via weighted least squares.
Age–binned scatter.
Partition
into
B quantile bins. Within each bin
b:
Mean within-bin scatter: .
Dimensionless disorder metric.
Appendix A.2. Recent Relaxation Score
Let
and
be along the main progenitor branch. For SFMS parameters
:
Over
:
is a bounded transform of fixed by code.
Appendix B. Appendix B: Registered Tests and Falsification Logic
Appendix B.1. Executed Test Register
| Test |
Sample |
Outcome |
Decision |
| T1–T3 |
|
Weak raw; mass entangled; residual emerges |
Proceed to T5 |
| T5 @ 2 Gyr |
|
bits,
|
Not falsified |
| T5 @ 4 Gyr |
|
bits,
|
Lag falsified |
| Decay fit |
|
Illustrative only; not identifiable |
Descriptive only |
Appendix B.2. Operational Non-Markovian Condition
Let
,
,
. The operational condition is:
distinguishable from conditional permutation null and above threshold
.
Appendix B.3. Bias Correction and Null Construction
For each lag and k:
Compute (kNN CMI).
Generate by permuting X within quantile bins of .
, .
.
Appendix B.4. Lag-Level Falsification Rule
Fix
bits,
. Lag
is falsified if for all
:
Appendix B.5. Program-Level Termination
For observable pair
and conditioning
in a phase, test lag set
. Phase terminates as null if:
Here Gyr: Gyr falsified, Gyr not, so no termination.
Appendix C. Appendix C: Specific Upgrades for Follow-Up Study
Lag density: Test Gyr for proper decay fitting.
Conditioning expansion: Test .
Sample expansion: Use TNG300 () for better CMI estimation.
Alternative X proxies: Test stellar age gradients, kinematic memory, merger tree asymmetry.
Observational test: Apply framework to SDSS/MaNGA with careful uncertainty propagation.
References
- Nelson, D.; Pillepich, A.; Springel, V.; et al. The IllustrisTNG Simulations: Public Data Release. Computational Astrophysics and Cosmology 2019, 6. [Google Scholar] [CrossRef]
- Springel, V.; Pillepich, A.; Weinberger, R.; et al. First results from the IllustrisTNG simulations: matter and galaxy distributions. Monthly Notices of the Royal Astronomical Society 2018, 475, 676–698. [Google Scholar] [CrossRef]
- Pillepich, A.; Springel, V.; Nelson, D.; et al. First results from the IllustrisTNG simulations: the galaxy color bimodality. Monthly Notices of the Royal Astronomical Society 2018, 475, 648–675. [Google Scholar] [CrossRef]
- Rodríguez-Gómez, V.; Genel, S.; Vogelsberger, M.; et al. The merger rate of galaxies in the Illustris simulation: a comparison with observations and semi-empirical models. Monthly Notices of the Royal Astronomical Society 2015, 449, 49–64. [Google Scholar] [CrossRef]
- Noeske, K.G.; Weiner, B.J.; Faber, S.M.; et al. Star formation in AEGIS field galaxies since z=1.1: the dominance of gradually declining star formation, and the main sequence of star-forming galaxies. The Astrophysical Journal Letters 2007, 660, L43–L46. [Google Scholar] [CrossRef]
- Whitaker, K.E.; van Dokkum, P.G.; Brammer, G.; Franx, M. The Star Formation Mass Sequence Out to z=2.5. The Astrophysical Journal Letters 2012, 754, L29. [Google Scholar] [CrossRef]
- Leaman, R.; VandenBerg, D.A.; Mendes de Oliveira, C. Galactic Dynamics Timescales and Chemical Evolution. The Astrophysical Journal 2012, 750, 114. [Google Scholar] [CrossRef]
- Frenzel, S.; Pompe, B. Partial Mutual Information for Coupling Analysis of Multivariate Time Series. Physical Review Letters 2007, 99, 204101. [Google Scholar] [CrossRef] [PubMed]
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Physical Review E 2004, 69, 066138. [Google Scholar] [CrossRef] [PubMed]
- Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Chapman and Hall/CRC, 1994. [Google Scholar]
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).