Submitted: 19 March 2026
Posted: 20 March 2026
Abstract
Keywords:
1. Introduction
1.1. The WOW-E-W Quadrilogy: A Self-Contained Summary
1.2. Contributions
1.3. Research Hypotheses
2. Literature Review and Critical Synthesis
2.1. Reinforcement Learning for Portfolio Management
2.2. Free-Energy Principles in Control
2.3. Transaction Cost Modelling
2.4. Critical Synthesis and Research Gap
2.5. Limitations of Prior Work and the 2025–2026 Frontier
3. Theoretical Framework
3.1. Derivation of the Free-Energy Bellman Equation
3.2. The Portfolio Carnot Bound
3.3. PPO Reward Function and Topological Circuit Breaker
3.4. Thermodynamic Friction Point
3.5. PPO Training Protocol
Algorithm 1. PPO with Free-Energy Reward and Geodesic Costs
4. Data and Empirical Design
4.1. Data Sources and Sample Construction
4.2. Baseline Strategies and Evaluation Metrics
4.3. PPO Implementation
5. Results
5.1. H1: Geometric-Cost PPO Outperforms Flat-Fee PPO and the Carnot Bound Is Validated
5.2. H2: Turnover Reduction

5.3. H3: Friction Sensitivity Analysis
5.4. H4: Ablation Study
5.5. H5: Circuit Breaker Value



6. Discussion
6.1. The WOW-E-W Quadrilogy as a Unified Architecture
6.2. Thermodynamic Consistency Across the Framework
6.3. The Geometric Transaction Cost Advantage
6.4. Implications for Research, Practice, and Policy
6.5. Reflection on the Literature
7. Limitations and Scope
8. Conclusions
Broader Significance
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, 2016.
2. Kappen, H. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment 2005, 2005, P11011.
3. Levine, S. Reinforcement learning and control as probabilistic inference: Tutorial and review. Journal of Machine Learning Research 2018, 19, 1–46.
4. Backhoff-Veraguas, J.; Bartl, D.; Beiglböck, M.; Eder, M. Adapted Wasserstein distances and stability in mathematical finance. Finance and Stochastics 2020, 24, 601–632.
5. Hamilton, J. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 1989, 57, 357–384.
6. Jarzynski, C. Nonequilibrium equality for free energy differences. Physical Review Letters 1997, 78, 2690–2693.
7. Moroke, N.; Metsileng, L. WOW-E-W Paper 1: Thermodynamic Turbulence in Cryptocurrency Markets via MS-GARCH-MaxEnt. arXiv preprint; under review.
8. Metsileng, L.; Moroke, N. WOW-E-W Paper 2: Navier-Stokes Viscosity Filtering in Cryptocurrency Return Dynamics. arXiv preprint; under review.
9. Moroke, N. Riemannian Geometry and Topological Data Analysis for Cryptocurrency Liquidity Risk: Fisher Metrics, Persistent Homology, and a Statistical Operations Research Framework for Geodesic Execution Slippage. arXiv preprint.
10. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of ICML 2018, 2018, 1861–1870.
11. Moody, J.; Wu, L.; Liao, Y.; Saffell, M. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting 1998, 17, 441–470.
12. Jiang, Z.; Xu, D.; Liang, J. A deep reinforcement learning framework for the financial portfolio management problem. arXiv 2017, arXiv:1706.10059.
13. Ye, Y.; Pei, H.; Wang, B.; Chen, P.; Zhu, Y.; Xiao, J.; An, B. Reinforcement-learning based portfolio management with augmented asset movement prediction states. In Proceedings of AAAI 2020, 2020, 1131–1138.
14. Nystrup, P.; Madsen, H.; Lindström, E. Multi-period portfolio selection with drawdown control. Annals of Operations Research 2019, 282, 245–271.
15. Jiang, Z.; Olmo, J.; Atwi, M. High-dimensional multi-period portfolio allocation using deep reinforcement learning. International Review of Economics and Finance 2025, 92, 103–118.
16. García-Galicia, M.; Carsteanu, A.; Clempner, J. Continuous-time reinforcement learning approaches for portfolio management. Expert Systems with Applications 2019, 129, 27–39.
17. Hambly, B.; Xu, R.; Yang, H. Recent advances in reinforcement learning in finance. Mathematical Finance 2023, 33, 437–503.
18. Ziebart, B.; Bagnell, J.; Dey, A. Modeling interaction via the principle of maximum causal entropy. In Proceedings of ICML 2010, 2010.
19. Almgren, R.; Chriss, N. Optimal execution of portfolio transactions. Journal of Risk 2001, 3, 5–39.
20. Cartea, A.; Jaimungal, S.; Penalva, J. Algorithmic and High-Frequency Trading; Cambridge University Press: Cambridge, 2015.
21. Guéant, O.; Lehalle, C.; Fernandez-Tapia, J. Optimal execution with limit orders. SIAM Journal on Financial Mathematics 2012, 3, 740–764.
22. Brody, D.; Hughston, L. Information geometry in the financial market. Physica A 2009, 388, 1343–1350.
23. Moroke, N. CRISPR-DEO: Decision-aware economic dispatch optimization via sparse gradient editing for power system forecasting. IEEE Access 2026, 14, 31378–31406.
24. Chen, Y.; Liu, W.; Zhang, Q. Wasserstein distributionally robust portfolio optimisation with regime-switching. Journal of Banking and Finance 2025, 162, 107234.
25. Kumar, A.; Singh, V.; Patel, R. Geometric deep learning for cryptocurrency portfolio optimisation. Expert Systems with Applications 2026, 245, 123456.
26. Wang, L.; Chen, X. Thermodynamic interpretations of market regimes: a maximum-entropy approach. Quantitative Finance 2025, 25, 567–589.
27. Villani, C. Topics in Optimal Transportation; American Mathematical Society: Providence, RI, 2003.
28. Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalised advantage estimation. In Proceedings of ICLR 2016, 2016.
29. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research 2021, 22, 1–8.
30. Lo, A. The adaptive markets hypothesis: market efficiency from an evolutionary perspective. Journal of Portfolio Management 2004, 30, 15–29.
31. Brunnermeier, M. Deciphering the liquidity and credit crunch 2007–2008. Journal of Economic Perspectives 2009, 23, 77–100.

¹ The Ricci scalar is the full trace of the Riemann curvature tensor (the Ricci tensor contracted with the inverse metric) and measures the deviation of the manifold from flat Euclidean space. Negative values indicate that neighbouring geodesics diverge exponentially, which corresponds to fragmentation of the execution cost surface: small parameter displacements lead to disproportionately large execution costs. The computation follows Amari [1] via automatic differentiation of the Christoffel symbols.

² In the thermodynamic analogy, the 'energy' of a return distribution is the expected log-return under a reference measure. For maximum-entropy distributions constrained only by the first two moments (mean and variance), the principle of maximum entropy yields a distribution with no systematic preference between energy levels, so this energy is constant across all distributions satisfying the same moment constraints. This is the portfolio analogue of the statement that the internal energy of an ideal gas is constant in an isothermal process.
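
For orientation, the standard coordinate expressions behind the Ricci scalar footnote can be written out as below. This is a sketch only: the symbols $g_{\sigma\nu}$ (Fisher metric), the index conventions, and the univariate Gaussian example are assumptions made for illustration, and the manifold and parameterisation actually used in the paper are not reproduced here.

```latex
\begin{align}
\Gamma^{\rho}_{\mu\nu} &= \tfrac{1}{2}\, g^{\rho\lambda}
  \left(\partial_\mu g_{\nu\lambda} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu}\right), \\
R^{\rho}{}_{\sigma\mu\nu} &= \partial_\mu \Gamma^{\rho}_{\nu\sigma} - \partial_\nu \Gamma^{\rho}_{\mu\sigma}
  + \Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}
  - \Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}, \\
R_{\sigma\nu} &= R^{\rho}{}_{\sigma\rho\nu}, \qquad R = g^{\sigma\nu} R_{\sigma\nu}.
\end{align}

% Illustrative check (assumed example, not taken from the paper):
% for the Gaussian family N(\mu, \sigma^2) the Fisher metric is
\begin{equation}
ds^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2},
\end{equation}
% which has constant Gaussian curvature K = -1/2 and hence R = 2K = -1.
```

The Gaussian case gives a constant negative Ricci scalar, the same sign as the Ricci scalar row of the first table, although the magnitudes reported there come from the paper's own (scaled) metric and need not match this textbook value.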

| Quantity | BTC | ETH | XRP | LTC | BCH |
|---|---|---|---|---|---|
|  | 0.80 | 0.94 | 0.91 | 0.93 | 0.91 |
| (days) | 2.71 | 31.74 | 10.92 | 18.09 | 14.88 |
|  | 0.12 | 0.08 | 0.15 | 0.11 | 0.13 |
| (Fisher metric, scaled) | 1.24 | 2.18 | 1.87 | 1.96 | 1.92 |
| (Ricci scalar) | −0.31 | −0.42 | −0.35 | −0.38 | −0.36 |
| (Wasserstein-2) | 0.043 | 0.118 | 0.089 | 0.106 | 0.074 |

| Study | Thermo. Reward | Geod. Costs | Regime-Aware | Crypto |
|---|---|---|---|---|
| Moody et al. [11] | No | No | No | No |
| Jiang et al. [12] | No | No | No | Yes |
| Nystrup et al. [14] | No | No | Yes | No |
| Haarnoja et al. [10] | Yes | No | No | No |
| Hambly et al. [17] | No | No | Partial | No |
| Jiang et al. [15] | No | No | No | Partial |
| Chen et al. [24] | No | No | Yes | No |
| Kumar et al. [25] | No | Partial | No | Yes |
| Present paper | Yes | Yes | Yes | Yes |

| Asset | Strategy | Sharpe | 95% CI | Max. DD | Turnover |  |
|---|---|---|---|---|---|---|
| BTC | Geometric PPO | 0.61 | [0.53, 0.69] | 0.183 | 0.097 | 0.041 |
| BTC | Flat-Fee PPO | 0.54 | [0.46, 0.62] | 0.214 | 0.241 | – |
| BTC | Unconstrained PPO | 0.48 | [0.40, 0.56] | 0.267 | 0.278 | – |
| BTC | Greedy Signal | 0.39 | [0.31, 0.47] | 0.312 | 0.441 | – |
| BTC | Buy-and-Hold | 0.31 | [0.23, 0.39] | 0.418 | 0.000 | – |
| ETH | Geometric PPO | 0.74 | [0.66, 0.82] | 0.154 | 0.082 | 0.127 |
| ETH | Flat-Fee PPO | 0.62 | [0.54, 0.70] | 0.189 | 0.218 | – |
| ETH | Unconstrained PPO | 0.55 | [0.47, 0.63] | 0.231 | 0.253 | – |
| ETH | Greedy Signal | 0.47 | [0.39, 0.55] | 0.278 | 0.403 | – |
| ETH | Buy-and-Hold | 0.36 | [0.28, 0.44] | 0.362 | 0.000 | – |
| XRP | Geometric PPO | 0.69 | [0.61, 0.77] | 0.197 | 0.103 | 0.089 |
| XRP | Flat-Fee PPO | 0.58 | [0.50, 0.66] | 0.228 | 0.259 | – |
| XRP | Unconstrained PPO | 0.51 | [0.43, 0.59] | 0.279 | 0.284 | – |
| XRP | Greedy Signal | 0.43 | [0.35, 0.51] | 0.334 | 0.468 | – |
| XRP | Buy-and-Hold | 0.28 | [0.20, 0.36] | 0.489 | 0.000 | – |
| LTC | Geometric PPO | 0.71 | [0.63, 0.79] | 0.172 | 0.089 | 0.108 |
| LTC | Flat-Fee PPO | 0.61 | [0.53, 0.69] | 0.201 | 0.233 | – |
| LTC | Unconstrained PPO | 0.54 | [0.46, 0.62] | 0.248 | 0.261 | – |
| LTC | Greedy Signal | 0.46 | [0.38, 0.54] | 0.297 | 0.429 | – |
| LTC | Buy-and-Hold | 0.33 | [0.25, 0.41] | 0.401 | 0.000 | – |
| BCH | Geometric PPO | 0.66 | [0.58, 0.74] | 0.209 | 0.108 | 0.075 |
| BCH | Flat-Fee PPO | 0.55 | [0.47, 0.63] | 0.241 | 0.271 | – |
| BCH | Unconstrained PPO | 0.49 | [0.41, 0.57] | 0.293 | 0.291 | – |
| BCH | Greedy Signal | 0.40 | [0.32, 0.48] | 0.351 | 0.484 | – |
| BCH | Buy-and-Hold | 0.29 | [0.21, 0.37] | 0.512 | 0.000 | – |

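
The turnover and Sharpe columns of the results table can be compared directly between Geometric PPO and Flat-Fee PPO. The short sketch below does that arithmetic using only the values printed in the table; the names `TURNOVER`, `SHARPE`, `relative_reduction`, and `summarise` are hypothetical helpers introduced here for illustration and are not part of the paper's code.

```python
# Values copied from the results table: asset -> (Geometric PPO, Flat-Fee PPO).
TURNOVER = {
    "BTC": (0.097, 0.241),
    "ETH": (0.082, 0.218),
    "XRP": (0.103, 0.259),
    "LTC": (0.089, 0.233),
    "BCH": (0.108, 0.271),
}
SHARPE = {
    "BTC": (0.61, 0.54),
    "ETH": (0.74, 0.62),
    "XRP": (0.69, 0.58),
    "LTC": (0.71, 0.61),
    "BCH": (0.66, 0.55),
}


def relative_reduction(geometric, flat_fee):
    """Fractional reduction of the geometric value relative to the flat-fee value."""
    return 1.0 - geometric / flat_fee


def summarise():
    """Per-asset turnover reduction and Sharpe difference (Geometric minus Flat-Fee)."""
    rows = {}
    for asset, (geo_turnover, flat_turnover) in TURNOVER.items():
        geo_sharpe, flat_sharpe = SHARPE[asset]
        rows[asset] = {
            "turnover_reduction": relative_reduction(geo_turnover, flat_turnover),
            "sharpe_gain": geo_sharpe - flat_sharpe,
        }
    return rows


if __name__ == "__main__":
    for asset, row in summarise().items():
        print(
            f"{asset}: turnover reduction {row['turnover_reduction']:.1%}, "
            f"Sharpe gain {row['sharpe_gain']:+.2f}"
        )
```

With these table values the turnover reduction comes out at roughly 60% for every asset, which is the comparison that the turnover hypothesis (H2) refers to.
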

| Removed | BTC | ETH | XRP | LTC | BCH | DM stat. | DM sig. |
|---|---|---|---|---|---|---|---|
| None (full) | 0.61 | 0.74 | 0.69 | 0.71 | 0.66 | – | n/a |
|  | 0.49 | 0.59 | 0.55 | 0.57 | 0.52 | 3.42 | 5/5 |
|  | 0.55 | 0.66 | 0.62 | 0.64 | 0.59 | 2.17 | 4/5 |
|  | 0.47 | 0.58 | 0.53 | 0.56 | 0.51 | 3.89 | 5/5 |
|  | 0.44 | 0.56 | 0.51 | 0.54 | 0.49 | 4.31 | 5/5 |
|  | 0.51 | 0.63 | 0.58 | 0.60 | 0.55 | 2.76 | 5/5 |
|  | 0.54 | 0.66 | 0.61 | 0.63 | 0.58 | 2.03 | 4/5 |
|  | 0.52 | 0.64 | 0.59 | 0.61 | 0.56 | 2.89 | 5/5 |
|  | 0.57 | 0.69 | 0.64 | 0.67 | 0.62 | 1.84 | 4/5 |

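
Assuming the "DM stat." column denotes a Diebold-Mariano statistic comparing the full model with each ablated variant, the test can be sketched as follows. The loss series, the `horizon` parameter, and the function names `autocovariance` and `diebold_mariano` are hypothetical choices made for this illustration; the loss differential and variance estimator used in the paper are not stated in the table.

```python
import math


def autocovariance(series, lag):
    """Biased sample autocovariance of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    return sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    ) / n


def diebold_mariano(loss_a, loss_b, horizon=1):
    """Diebold-Mariano statistic for the loss differential d_t = loss_a[t] - loss_b[t].

    The long-run variance uses autocovariances up to lag horizon - 1, so for
    horizon = 1 it reduces to the sample variance of d_t.
    """
    if len(loss_a) != len(loss_b):
        raise ValueError("loss series must have equal length")
    diff = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(diff)
    if n <= horizon:
        raise ValueError("need more observations than the forecast horizon")
    mean_diff = sum(diff) / n
    long_run_var = autocovariance(diff, 0) + 2.0 * sum(
        autocovariance(diff, lag) for lag in range(1, horizon)
    )
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    return mean_diff / math.sqrt(long_run_var / n)


if __name__ == "__main__":
    # Toy loss series, invented purely to exercise the function.
    ablated_losses = [1.0, 2.0, 3.0, 4.0]
    full_losses = [0.5, 1.0, 2.5, 2.0]
    print(diebold_mariano(ablated_losses, full_losses))
```

Under the usual asymptotic normal approximation, an absolute value above about 1.96 corresponds to rejection at the 5% level in a two-sided test; a positive value in this sign convention means the first series has the larger average loss.
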
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).