Submitted: 18 August 2025
Posted: 01 September 2025
Abstract
Keywords:
1. Introduction
2. Mathematical Preliminaries and Foundations
2.1. Setting and Basic Assumptions
2.2. α-Averaged Operators and the Hierarchy
- (1) Non-learnable if no α-averaged operator exists with a convergent fixed-point iteration toward a solution.
- (2) Learnable if there exists an α-averaged operator such that the iteration converges to a fixed point (and to a minimizer when R is convex) for every initial point.
- (3) Streamable at rank K if it is learnable and the residual mapping satisfies the uniform low-rank approximation property.
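The middle tier of the hierarchy can be made concrete with a short fixed-point iteration. The operator below is an illustrative stand-in, not one taken from the paper: a gradient step on f(x) = (x − 3)², which is a (tL/2)-averaged operator for step size t in (0, 2/L), so the iteration converges to the minimizer from any starting point.

```python
# Fixed-point iteration with an averaged operator (illustrative toy example).
# T(x) = x - t * f'(x) for f(x) = (x - 3)^2, with L = 2 and t = 0.25,
# is (t*L/2)-averaged, i.e. alpha = 0.25; iterating T converges to the
# minimizer x = 3 from every initial point.

def T(x, t=0.25):
    return x - t * 2.0 * (x - 3.0)  # gradient step on f(x) = (x - 3)^2

def fixed_point_iterate(x0, tol=1e-10, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_star = fixed_point_iterate(10.0)
print(x_star)  # converges to the minimizer x = 3
```

In this sense the toy problem is "learnable": an α-averaged operator with a convergent iteration exists and its fixed point is the minimizer.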
3. The ε-Streamably-Learnable (ESL) Class
3.1. Formal Definition and Characterization
3.2. Fundamental Theorems
4. ESL-Convert Algorithm: Theory and Implementation
4.1. Algorithm Description and Correctness
4.1.0.1. Global averagedness.
4.1.0.2. Gradient cross-term.
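Global averagedness of a candidate operator can be spot-checked numerically using the standard characterization: T is α-averaged if and only if ‖Tx − Ty‖² + ((1 − α)/α)‖(I − T)x − (I − T)y‖² ≤ ‖x − y‖² for all x, y (Bauschke and Combettes). The affine map below is a hypothetical stand-in, not the operator produced by ESL-Convert:

```python
# Numerical spot-check of global alpha-averagedness via the identity
#   ||Tx - Ty||^2 + ((1 - a)/a) ||(I-T)x - (I-T)y||^2 <= ||x - y||^2,
# which holds for all x, y iff T is a-averaged. The map below is a toy
# stand-in for the converted operator, not the paper's construction.
import random

def T(x):
    return 0.5 * x + 1.5  # (1/2)-averaged: T = 0.5*I + 0.5*N with N(x) = x + 3

def averagedness_holds(T, alpha, trials=1000):
    for _ in range(trials):
        x, y = random.uniform(-50, 50), random.uniform(-50, 50)
        lhs = (T(x) - T(y)) ** 2 \
            + (1 - alpha) / alpha * ((x - T(x)) - (y - T(y))) ** 2
        if lhs > (x - y) ** 2 + 1e-12:
            return False
    return True

print(averagedness_holds(T, 0.5))  # True: the inequality holds everywhere
```

A random check of this kind can only falsify averagedness, not prove it; the paper's global claim requires the analytical argument of this section.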
4.2. Complexity Analysis
5. Control Theory Applications with Rigorous Stability Analysis
5.1. Stability Assumptions and Bounds
5.2. Model Predictive Control Application
5.2.0.4. Norm alignment.
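As a sketch of how an MPC subproblem fits the fixed-point framework, consider a one-step input-constrained problem solved by projected gradient descent: the composition of a gradient step (an averaged operator for a suitable step size) with the projection onto the feasible set (firmly nonexpansive) is itself averaged. The cost, bounds, and function names below are hypothetical illustrations, not the paper's MPC formulation:

```python
# Illustrative one-step MPC subproblem solved by projected gradient,
# i.e. a composition of averaged operators:
#   minimize (x + u)^2 + r*u^2  subject to |u| <= u_max.
# All constants here are hypothetical, not taken from the paper.

def solve_mpc_input(x, r=1.0, u_max=2.0, t=0.2, iters=500):
    u = 0.0
    for _ in range(iters):
        grad = 2.0 * (x + u) + 2.0 * r * u          # gradient of the cost in u
        u = min(max(u - t * grad, -u_max), u_max)   # project onto [-u_max, u_max]
    return u

u = solve_mpc_input(x=5.0)
print(u)  # unconstrained minimizer -x/(1+r) = -2.5 is clipped to u_max bound -2.0
```

The step size t = 0.2 is below 2/L for the Lipschitz constant L = 2 + 2r = 4 of the gradient, which is what makes the composed iteration averaged and hence convergent.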
6. Comprehensive Case Studies with Reproducibility Details
6.1. Case Study 1: ResNet-50 Training on ImageNet
6.1.1. Experimental Setup
| Metric | Standard Training | ESL Training |
|---|---|---|
| Memory Usage (GB) | 32.4 | 4.2 |
| Training Time (hours) | 18.5 | 2.3 |
| Final Accuracy (%) | 76.2 | 75.8 |
| Convergence Rate | 1.0× | 0.95× |
| Communication (GB/epoch) | 2.1 | 0.21 |
6.2. Case Study 2: Federated Learning with GPT-2
6.2.1. Experimental Configuration
| Metric | Standard FedAvg | ESL-Fed |
|---|---|---|
| Communication per client (MB) | 496 | 5.2 |
| Total communication (GB) | 49.6 | 0.52 |
| Convergence rounds | 85 | 92 |
| Final perplexity | 24.3 | 24.8 |
| Compression ratio | 1× | 95× |
6.3. Implementation Templates Performance
| Template | Memory Reduction | Speed Improvement | Accuracy Loss |
|---|---|---|---|
| ReLU Networks | 50–85× | 12–25× | |
| Transformers | 20–40× | 8–15× | |
| Federated Learning | 80–120× | 5–10× | |
7. Limitations and Future Directions
- (1) Rank Growth: The required rank K may grow rapidly for some problem families, particularly those with inherently high-dimensional structure.
- (2) ALS Heuristics: The CP decomposition step uses nonconvex ALS, so convergence guarantees are heuristic rather than provable.
- (3) Conversion Overhead: The one-time ESL conversion cost can be significant for problems solved only once.
- (4) Approximation Quality: For problems requiring very high precision, the required rank may become prohibitive.
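Limitation (2) concerns alternating least squares for the CP step: each factor is optimal given the others, but the joint problem is nonconvex. A minimal rank-1 ALS sweep, written generically and not tied to ESL-Convert's internals, looks like:

```python
# Minimal rank-1 CP decomposition of a 3-way tensor by alternating least
# squares (ALS): fix two factor vectors, solve for the third in closed form.
# Generic sketch of the nonconvex ALS step discussed above, not the
# paper's ESL-Convert implementation.

def als_rank1(T, sweeps=5):
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(sweeps):
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
             / (sum(v * v for v in b) * sum(v * v for v in c)) for i in range(I)]
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
             / (sum(v * v for v in a) * sum(v * v for v in c)) for j in range(J)]
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
             / (sum(v * v for v in a) * sum(v * v for v in b)) for k in range(K)]
    return a, b, c

# Exactly rank-1 test tensor T[i][j][k] = ai * bj * ck.
ai, bj, ck = [1.0, 2.0], [1.0, 3.0], [2.0, 1.0]
T = [[[ai[i] * bj[j] * ck[k] for k in range(2)] for j in range(2)] for i in range(2)]
a, b, c = als_rank1(T)
err = max(abs(T[i][j][k] - a[i] * b[j] * c[k])
          for i in range(2) for j in range(2) for k in range(2))
print(err)  # ALS recovers the rank-1 tensor; reconstruction error near 0
```

On an exactly rank-1 tensor ALS recovers the factors (up to scaling), but for general tensors it can stall at local minima, which is why the convergence guarantees above are heuristic.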
8. Conclusion
Data Availability Statement
Conflicts of Interest
Use of Artificial Intelligence
Appendix A. Complete Proof of Universal ESL Membership
References
- M. Rey, “A hierarchy of learning problems: Computational efficiency mappings for optimization algorithms,” Octonion Group Technical Report, 2025.
- M. Rey, “Dense Approximation of Learnable Problems with Streamable Problems,” Octonion Group Technical Report, 2025.
- H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York: Springer, 2011.
- A. Beck, First-Order Methods in Optimization. Philadelphia: SIAM, 2017.
- T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
- L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning,” SIAM Review, vol. 60, no. 2, pp. 223–311, 2018.
¹ CIFAR-10 for ReLU (3-layer, 50K params), WikiText-2 for Transformers (6-layer, 25M params), synthetic non-IID for Federated (100 clients). All results averaged over 5 seeds with SGD, learning rates 0.01–0.1, 100–200 epochs.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
