Submitted: 13 April 2026
Posted: 15 April 2026
Abstract
Keywords:
1. Introduction
- Note: This paper is Paper 2 of the Financial Metabolomics Series. Paper 1 [1] (under review) establishes the Gaussian-Weighted Swin Spatio-Temporal Network (GWS-STNet) and the topological contraction conjecture. Paper 3 will establish the fractal conservation law.
1.1. Positioning Within the Literature
1.2. Contributions
- (i) Entropy-Saliency Equivalence. We conjecture and empirically verify that is an asymptotically unbiased estimator of , with bias decaying at the parametric rate under Gaussian regularity conditions on the PMNet output distribution.
- (ii) KSG bias-variance decomposition. We characterise the finite-sample properties of the KSG estimator of transfer entropy as used in , establishing a minimax-optimal convergence rate and providing explicit bootstrap confidence intervals.
- (iii) Spatio-Temporal Information Flux (STIF). We propose STIF as a novel evaluation metric quantifying sector-level directed information flow in bits per trading day, and establish its consistency under the same regularity conditions as the equivalence theorem.
- (iv) Causal stress audit protocol. We demonstrate a three-step auditing procedure (Jacobian check, TE check, Eskom record verification) that satisfies the FSRA and MiFID II regulatory requirements for model explainability.
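Contributions (ii) and (iii) both rest on transfer entropy measured in bits per trading day. As a rough illustration of that quantity, the sketch below computes a binned plug-in transfer entropy in the sense of Schreiber [3]. It is not the KSG estimator analysed in this paper: the quantile binning, the history length of one day, the function names, and the synthetic series are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def discretise(series, n_bins=3):
    """Map each value to a quantile-bin index in 0..n_bins-1."""
    ranked = sorted(series)
    edges = [ranked[len(ranked) * k // n_bins] for k in range(1, n_bins)]
    return [sum(v >= e for e in edges) for v in series]

def transfer_entropy_bits(source, target, n_bins=3):
    """Plug-in estimate of TE(source -> target), history length 1, in bits."""
    x, y = discretise(source, n_bins), discretise(target, n_bins)
    y_next, y_now, x_now = y[1:], y[:-1], x[:-1]
    n = len(y_next)
    c_full = Counter(zip(y_next, y_now, x_now))
    c_yx = Counter(zip(y_now, x_now))
    c_yy = Counter(zip(y_next, y_now))
    c_y = Counter(y_now)
    te = 0.0
    for (y1, y0, x0), c in c_full.items():
        p_given_both = c / c_yx[(y0, x0)]        # p(y_next | y_now, x_now)
        p_given_own = c_yy[(y1, y0)] / c_y[y0]   # p(y_next | y_now)
        te += (c / n) * math.log2(p_given_both / p_given_own)
    return te

# Synthetic demo (hypothetical data): "follower" reacts to "driver" one day later.
rng = random.Random(0)
driver = [rng.gauss(0.0, 1.0) for _ in range(2000)]
follower = [0.0] + [0.8 * driver[t] + 0.6 * rng.gauss(0.0, 1.0) for t in range(1999)]

te_forward = transfer_entropy_bits(driver, follower)
te_reverse = transfer_entropy_bits(follower, driver)
print(f"TE driver->follower: {te_forward:.3f} bits/day")
print(f"TE follower->driver: {te_reverse:.3f} bits/day")
```

On this synthetic pair the forward estimate should clearly exceed the reverse one, which reflects only finite-sample bias. A binned estimator is sensitive to the number of bins, which is one reason a nearest-neighbour estimator such as KSG [11] is usually preferred for continuous returns.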
1.3. Paper Organisation
2. Related Work and Literature Positioning
2.1. Stream 1—Information Geometry and Neural Networks
2.2. Stream 2—Transfer Entropy in Financial Networks
2.3. Stream 3—Attribution Methods for Regulatory Compliance
2.4. Literature Gap Map
3. Information-Geometric Preliminaries
3.1. Statistical Manifold and Fisher Metric
- The resting distribution : the empirical return distribution of sector i over the training baseline period (January 2015 to December 2018, days), estimated by kernel density estimation [22] with Gaussian kernel and bandwidth .
- The stressed distribution : the empirical return distribution of sector i over a rolling window of trading days ending at day t, estimated by the same kernel density estimator with bandwidth .
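The two density estimates above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the paper's estimation code. The sample sizes (500 baseline days, 60 window days), the synthetic returns, the normal-reference bandwidth rule from Silverman [22], and the direction of the KL divergence (stressed relative to resting) are assumptions, since the source values are not recoverable here.

```python
import math
import random
import statistics

def reference_bandwidth(xs):
    """Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5) (assumed choice)."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-1 / 5)

def gaussian_kde(xs, h):
    """Return a function evaluating the Gaussian-kernel density estimate."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)
    return pdf

def kl_divergence_bits(p, q, lo, hi, n_grid=300):
    """Midpoint-rule approximation of the integral of p * log2(p / q) on [lo, hi]."""
    dx = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        x = lo + (k + 0.5) * dx
        px, qx = p(x), q(x)
        if px > 1e-300 and qx > 1e-300:  # skip points where either density underflows
            total += px * math.log2(px / qx) * dx
    return total

# Hypothetical daily returns for one sector.
rng = random.Random(1)
baseline = [rng.gauss(0.0005, 0.01) for _ in range(500)]  # "resting" period
window = [rng.gauss(-0.01, 0.02) for _ in range(60)]      # rolling "stressed" window

p_rest = gaussian_kde(baseline, reference_bandwidth(baseline))
p_stress = gaussian_kde(window, reference_bandwidth(window))

lo = min(baseline + window) - 0.01
hi = max(baseline + window) + 0.01
kl_stress = kl_divergence_bits(p_stress, p_rest, lo, hi)
kl_self = kl_divergence_bits(p_rest, p_rest, lo, hi)
print(f"KL(stressed || resting) = {kl_stress:.3f} bits, self-check = {kl_self:.3f}")
```

Numerical integration on a grid is adequate for a univariate return density. The self-check confirms that the divergence of a density from itself evaluates to zero.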
3.2. Fisher Score Representation of Neural Network Gradients
4. The Entropy-Saliency Equivalence Theorem
4.1. Main Theorem
- (i) The PMNet residual distribution belongs to a regular exponential family with sufficient statistic and natural parameter [20].
- (ii) The Fisher information matrix is positive definite for all , , with smallest eigenvalue uniformly bounded away from zero.
- (iii)
- (iv) The temperature parameter in the transfer-entropy weighting satisfies to ensure the exponential weights are summable and dominated by the maximum TE.
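Assumption (iv) concerns exponential weights over transfer-entropy values. The sketch below shows one common form of such a weighting, a temperature-scaled softmax, to illustrate how a small temperature makes the largest TE dominate. The functional form `exp(TE / tau)`, the name `te_weights`, and the example values are assumptions. The weighting actually used is defined in Paper 1 [1], and the bound on the temperature stated in (iv) is not reproduced here; the sketch only requires a positive temperature.

```python
import math

def te_weights(te_values, tau):
    """Normalised exponential weights w_j proportional to exp(TE_j / tau) (assumed form)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(te_values)
    # Subtracting the maximum keeps every exponent <= 0, so nothing overflows.
    exps = [math.exp((v - m) / tau) for v in te_values]
    z = sum(exps)
    return [e / z for e in exps]

te_bits = [0.02, 0.10, 0.35, 0.07]          # hypothetical TE values in bits
w_sharp = te_weights(te_bits, tau=0.05)     # low temperature: near winner-take-all
w_flat = te_weights(te_bits, tau=1.0)       # high temperature: near uniform
print([round(w, 3) for w in w_sharp])
print([round(w, 3) for w in w_flat])
```

With a finite set of finite TE values and a positive temperature the weights always sum to one. Lowering the temperature shifts mass toward the source with the largest TE.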
- Heuristic derivation.
- Step 1: Score decomposition of . By Definition 5 [1] (under review), Applying Lemma 2: where the remainder .
- Step 2: Expectation of the score term. Taking expectations under : For a distribution in a regular exponential family, the score function satisfies in the mean-parameter correspondence [20]. Therefore:
- Step 3: Connection to KL divergence via Fisher metric. By Lemma 1 (KL-Fisher approximation), to second order in : The gradient of with respect to at is: Since to first order in (continuity of the Fisher matrix), we obtain: To first order in : , since the KL gradient equals scaled by the inverse Fisher metric.
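Step 3 relies on the second-order link between KL divergence and the Fisher metric. In its standard textbook form, the KL divergence between nearby members of a parametric family is approximately half the Fisher quadratic form of the parameter shift. The sketch below checks that standard statement numerically for a univariate Gaussian in (mean, standard deviation) coordinates. The parameter values are hypothetical, and the check is an illustration of the generic approximation, not a verification of the paper's Lemma 1 as stated.

```python
import math

def kl_gaussian(mu0, sigma0, mu1, sigma1):
    """Exact KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) in nats."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2 * sigma1 ** 2)
            - 0.5)

def fisher_quadratic(sigma, d_mu, d_sigma):
    """0.5 * delta^T F delta with F = diag(1/sigma^2, 2/sigma^2) in (mu, sigma) coordinates."""
    return 0.5 * (d_mu ** 2 / sigma ** 2 + 2 * d_sigma ** 2 / sigma ** 2)

def relative_error(mu, sigma, d_mu, d_sigma):
    """Relative gap between the exact KL and its quadratic approximation."""
    exact = kl_gaussian(mu, sigma, mu + d_mu, sigma + d_sigma)
    return abs(exact - fisher_quadratic(sigma, d_mu, d_sigma)) / exact

# Hypothetical daily-return scale: sigma = 2%, with a small shift and half that shift.
err_full = relative_error(0.0, 0.02, 1e-3, 5e-4)
err_half = relative_error(0.0, 0.02, 5e-4, 2.5e-4)
print(f"relative error, full shift: {err_full:.4f}")
print(f"relative error, half shift: {err_half:.4f}")
```

Halving the shift should roughly halve the relative error, which is the behaviour expected when the neglected terms are of third order in the shift.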
4.2. Asymptotic Distribution and Confidence Intervals
5. Bias-Variance Analysis of the KSG Transfer Entropy Estimator
5.1. The KSG Estimator
6. Spatio-Temporal Information Flux (STIF)
6.1. Definition and Motivation
7. Empirical Design
7.1. Dataset
7.2. Data Diagnostics
| Statistic | Mean | P25 | P75 | Implication |
|---|---|---|---|---|
| Residual excess kurtosis | | | | Approx. Gaussian residuals |
| Gaussian AIC weight | | | | Exp. family assumption valid |
| KS continuity p-value | | | | density smoothness |
| Ljung-Box insignificance lag | 11 | 8 | 15 | Mixing: block justified |
| KSG bias (LOO, bits) | | | | of TE point estimates |
| Bootstrap CI width (bits) | | | | Acceptable precision |
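The diagnostics table motivates a block bootstrap for serially dependent data. A minimal sketch of a moving block bootstrap percentile interval in the spirit of Künsch [28] follows. The block length of 11 simply echoes the mean Ljung-Box lag in the table and is an assumption, as are the number of replicates, the AR(1) test series, and the use of the sample mean as the statistic. In the paper's setting the statistic would be a TE estimate.

```python
import random
import statistics

def moving_block_bootstrap_ci(series, stat, block_len=11, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for stat(series) via overlapping-block resampling."""
    rng = random.Random(seed)
    n = len(series)
    n_blocks = -(-n // block_len)   # ceiling division
    max_start = n - block_len       # last valid block start (requires n >= block_len)
    reps = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            s = rng.randint(0, max_start)
            sample.extend(series[s:s + block_len])
        reps.append(stat(sample[:n]))  # trim to the original length
    reps.sort()
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical autocorrelated series: AR(1) with coefficient 0.5.
rng = random.Random(42)
series = [0.0]
for _ in range(499):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))

ci_lo, ci_hi = moving_block_bootstrap_ci(series, statistics.fmean)
point = statistics.fmean(series)
print(f"mean = {point:.3f}, 95% CI = [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Resampling whole blocks preserves short-range dependence within each block, which an ordinary bootstrap that resamples individual days would destroy.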
7.3. Estimation Protocol
7.4. Computational Cost of KL-Saliency Estimation
7.5. Validation Strategy
8. Results
8.1. Equivalence Validation
8.2. STIF Network Analysis
8.3. Phase Portrait of Latent-State Convergence
8.4. Glass-Box KL Divergence Tracking
8.5. Glass-Box Attribution Network
8.6. KSG Bias-Variance Diagnostics
8.7. Asymptotic Normality Test
9. Discussion
9.1. Implications for Regulatory Audit
9.2. Limitations
10. Conclusion
Future Directions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Institutional Review Board Statement
Informed Consent Statement
Use of Artificial Intelligence
Acknowledgments
References
1. Moroke, N.D. Gaussian-Weighted Swin Spatio-Temporal Networks as Contraction Operators on Financial Manifolds: A Glass-Box Framework for Systemic Stress Detection in the Johannesburg Stock Exchange. Mathematics 2026, xx, x. (Financial Metabolomics Series, Paper 1; under review).
2. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
3. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461–464.
4. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
5. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 1135–1144.
6. Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society; ACM: New York, NY, USA, 2020; pp. 180–186.
7. Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity checks for saliency maps. Adv. Neural Inf. Process. Syst. 2018, 31, 9505–9515.
8. Republic of South Africa. Financial Sector Regulation Act No. 9 of 2017. Government Gazette 2017, No. 41062.
9. European Parliament. Directive 2014/65/EU on Markets in Financial Instruments (MiFID II). Official Journal of the European Union 2014, L 173, 349–496.
10. Kakade, S.; Foster, D. Dopamine modulation in a basal ganglio-cortical network implements saliency-based gating of working memory. J. Mach. Learn. Res. 2002, 3, 1409–1445.
11. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138.
12. Barnett, L.; Barrett, A.B.; Seth, A.K. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. 2009, 103, 238701.
13. Kwon, O.; Yang, J.-S. Information flow between stock indices. Europhys. Lett. 2008, 82, 68003.
14. Dimpfl, T.; Peter, F.J. Using transfer entropy to measure information flows between financial markets. Stud. Nonlinear Dyn. Econom. 2013, 17, 85–102.
15. Sandoval, L. Structure of a global network of financial companies based on transfer entropy. Entropy 2014, 16, 4443–4482.
16. Bianchi, D.; Buchner, M.; Tamoni, A. Bond risk premiums with machine learning. Rev. Financ. Stud. 2021, 34, 1046–1089.
17. Gu, S.; Kelly, B.; Xiu, D. Empirical asset pricing via machine learning. Rev. Financ. Stud. 2020, 33, 2223–2273.
18. Martens, J. New insights and perspectives on the natural gradient method. J. Mach. Learn. Res. 2020, 21, 1–76.
19. Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946.
20. Brown, L.D. Fundamentals of Statistical Exponential Families; Institute of Mathematical Statistics: Hayward, CA, USA, 1986.
21. Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91.
22. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: London, UK, 1986.
23. Duchi, J. Derivations for Linear Algebra and Optimization; Technical Report; Stanford University: Stanford, CA, USA, 2007.
24. Biau, G.; Devroye, L. Lectures on the Nearest Neighbor Method; Springer: Cham, Switzerland, 2015.
25. Devroye, L.P.; Wagner, T.J. The strong uniform consistency of nearest neighbor density estimates. Ann. Stat. 1977, 5, 536–540.
26. Cont, R. Empirical properties of asset returns: Stylised facts and statistical issues. Quant. Finance 2001, 1, 223–236.
27. Kozachenko, L.F.; Leonenko, N.N. Sample estimate of the entropy of a random vector. Probl. Inf. Transm. 1987, 23, 95–101.
28. Künsch, H.R. The jackknife and the bootstrap for general stationary observations. Ann. Stat. 1989, 17, 1217–1241.
29. van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 2000.
30. Janzing, D.; Minorics, L.; Blöbaum, P. Feature relevance quantification in explainable AI: A causal problem. In Proceedings of the 23rd AISTATS, PMLR. Palermo, Italy, 26–28 August 2020; pp. 2907–2916.
31. Greene, W.H. Econometric Analysis, 8th ed.; Pearson: New York, NY, USA, 2018.
32. Cameron, A.C.; Gelbach, J.B.; Miller, D.L. Robust inference with multiway clustering. J. Bus. Econ. Stat. 2011, 29, 238–249.
33. Jarque, C.M.; Bera, A.K. A test for normality of observations and regression residuals. Int. Stat. Rev. 1987, 55, 163–172.
34. Wang, Q.; Kulkarni, S.R.; Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbour distances. IEEE Trans. Inf. Theory 2009, 55, 2392–2405.
35. Leontief, W.W. The Structure of the American Economy, 1919–1929; Harvard University Press: Cambridge, MA, USA, 1941.
36. Percival, D.B.; Walden, A.T. Wavelet Methods for Time Series Analysis; Cambridge University Press: Cambridge, UK, 2000.
37. Massey, F.J. The Kolmogorov-Smirnov test for goodness of fit. J. Am. Stat. Assoc. 1951, 46, 68–78.
38. Ljung, G.M.; Box, G.E.P. On a measure of lack of fit in time series models. Biometrika 1978, 65, 297–303.
39. International Monetary Fund. Artificial Intelligence in Finance: Implications for Regulation and Supervision. IMF Fintech Note 2023, No. 2023/001.
40. Cunningham, J.P.; Yu, B.M. Dimensionality reduction for large-scale neural recordings. Nat. Neurosci. 2014, 17, 1500–1509.
41. Eberhard, A.; Godinho, C. Eskom and the practice of load shedding. J. South. Afr. Stud. 2017, 43, 1291–1307.
42. Allen, F.; Gale, D. Financial contagion. J. Polit. Econ. 2000, 108, 1–33.
43. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.


| Paper | Journal | Stream | F1 | F2 | F3 | F4 | F5 | F6 | F7 |
|---|---|---|---|---|---|---|---|---|---|
| Amari [2] | Springer | 1 | • | • | — | — | — | — | — |
| Martens [18] | JMLR | 1 | ∘ | ∘ | — | — | — | — | — |
| Kakade & Foster [10] | JMLR | 1 | ∘ | • | — | — | — | — | — |
| Cramér [19] | Princeton UP | 1 | • | — | — | — | — | — | — |
| Rao [21] | Bull.Cal.Math | 1 | • | — | — | — | — | — | — |
| Brown [20] | IMS | 1 | ∘ | ∘ | — | — | — | — | — |
| Schreiber [3] | PRL | 2 | — | — | — | • | — | — | — |
| Barnett et al. [12] | PRL | 2 | ∘ | — | — | • | — | — | — |
| Kraskov et al. [11] | Phys.Rev.E | 2 | — | — | ∘ | — | — | — | — |
| Biau & Devroye [24] | Springer | 2 | ∘ | — | • | — | — | — | — |
| Kwon & Yang [13] | EPL | 2 | — | — | — | ∘ | — | — | — |
| Dimpfl & Peter [14] | SNDE | 2 | — | — | — | ∘ | — | — | — |
| Sandoval [15] | Entropy | 2 | — | — | — | ∘ | — | — | — |
| Lundberg & Lee [4] | NeurIPS | 3 | — | — | — | — | — | ∘ | — |
| Ribeiro et al. [5] | KDD | 3 | — | — | — | — | — | ∘ | — |
| Adebayo et al. [7] | NeurIPS | 3 | — | — | — | — | — | — | — |
| Slack et al. [6] | AIES | 3 | — | — | — | — | — | — | — |
| Bianchi et al. [16] | RFS | 3 | ∘ | ∘ | — | — | — | ∘ | — |
| Gu et al. [17] | RFS | 3 | — | — | — | — | — | ∘ | — |
| MiFID II [9]/FSRA [8] | Legislation | 3 | — | — | — | — | — | ∘ | — |
| This paper | Entropy | 1+2+3 | • | • | • | • | • | • | • |

| Component | Wall-clock (hrs) | Memory (GB) | Parallelisable? |
|---|---|---|---|
| KSG TE (full matrix) | | | Yes (pairwise) |
| KSG TE (top-20 sparse) | | | Yes |
| KL divergence (Gaussian closed form) | | | Yes |
| Block bootstrap () | | | Yes |
| Saliency backprop (hold-out) | | | Yes (by sector) |
| Total (full matrix) | | | |
| Total (sparse approx.) | | | |

| Variable | Coef. | SE | t-stat | p-value |
|---|---|---|---|---|
| (slope) | 0.974 *** | | | |
| Intercept | | | | |
| Adj. | | | | |
| F-test () | | | | |
| Sector FE | Yes | | | |
| Day FE | Yes | | | |

| Pair | Bias (LOO) | Std.Dev. | 95% CI width | Theor. MSE | |
|---|---|---|---|---|---|
| Energy→Financials | | | | | |
| Materials→Industrials | | | | | |
| IT→ConsDisc | | | | | |
| Utilities→Energy | | | | | |
| Financials→Materials | | | | | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).