Submitted: 31 August 2025
Posted: 01 September 2025
Abstract
Keywords:
1. Introduction
- A formal model for coherence-aware I/O complexity analysis
- Tight lower bounds for fundamental dense linear algebra operations
- Optimal algorithms that achieve these bounds
- Applications to quantum-inspired classical computing
2. Related Work
2.1. Memory Coherence in Computing Systems
2.2. Quantum-Inspired Computing
2.3. Pebble Games and Computational Complexity
3. The Red-Blue-Green Pebble Game
3.1. Game Rules
- Red pebbles: Represent data in fast memory (limited to S pebbles)
- Blue pebbles: Represent data in slow memory (unlimited)
- Green pebbles: Represent coherent state (limited to C pebbles)
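The capacity limits above can be illustrated with a minimal Python sketch. The load, store, compute, and evict moves follow the classic red-blue pebble game of Hong and Kung. The green-pebble moves (`mark_coherent`, `release_coherent`) and all class and method names are assumptions introduced here for illustration only; the sketch enforces just the budget C on green pebbles and makes no claim about the game's full green-pebble rules.

```python
from dataclasses import dataclass, field


@dataclass
class PebbleState:
    """Pebble placement on a DAG under the three capacity limits."""
    preds: dict                 # vertex -> list of predecessor vertices
    S: int                      # red-pebble budget (fast memory)
    C: int                      # green-pebble budget (coherent state)
    red: set = field(default_factory=set)
    blue: set = field(default_factory=set)    # slow memory, unlimited
    green: set = field(default_factory=set)
    io_count: int = 0           # number of load/store moves so far

    def _place_red(self, v):
        if v not in self.red and len(self.red) >= self.S:
            raise RuntimeError("red-pebble budget S exceeded")
        self.red.add(v)

    def load(self, v):
        """Red-blue rule: copy v from slow memory into fast memory."""
        if v not in self.blue:
            raise ValueError(f"{v!r} is not in slow memory")
        self._place_red(v)
        self.io_count += 1

    def store(self, v):
        """Red-blue rule: copy v from fast memory into slow memory."""
        if v not in self.red:
            raise ValueError(f"{v!r} is not in fast memory")
        self.blue.add(v)
        self.io_count += 1

    def compute(self, v):
        """Red-blue rule: v is computable once all predecessors are red."""
        if any(p not in self.red for p in self.preds.get(v, [])):
            raise ValueError(f"predecessors of {v!r} are not all in fast memory")
        self._place_red(v)

    def evict(self, v):
        """Free a fast-memory slot."""
        self.red.discard(v)

    def mark_coherent(self, v):
        """ASSUMPTION: a green pebble may be placed on any red vertex."""
        if v not in self.red:
            raise ValueError(f"{v!r} is not in fast memory")
        if v not in self.green and len(self.green) >= self.C:
            raise RuntimeError("green-pebble budget C exceeded")
        self.green.add(v)

    def release_coherent(self, v):
        """ASSUMPTION: green pebbles can be removed at any time."""
        self.green.discard(v)


# Usage: compute c = a + b with S = 3 red pebbles and C = 1 green pebble.
game = PebbleState(preds={"c": ["a", "b"]}, S=3, C=1, blue={"a", "b"})
game.load("a")
game.load("b")
game.compute("c")
game.mark_coherent("c")
game.store("c")
```

In this toy run the schedule performs three I/O moves (two loads and one store), and a second call to `mark_coherent` on a different vertex would fail because C = 1.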
3.2. Coherence-Critical Operations
- Y separates inputs from outputs (traditional cut property)
- The vertices in Y represent “coherence-critical” operations that must maintain live state
- In matrix factorizations, coherence-critical operations might be those that maintain partial factorizations
- In Krylov methods, they might be operations that maintain basis vectors
- In streaming algorithms, they might be operations that maintain sketches or summaries
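The cut property in the first item can be checked mechanically. The sketch below assumes the standard reading used in pebble-game arguments: Y separates inputs from outputs if every directed path from an input vertex to an output vertex passes through at least one vertex of Y. The function name `separates` and the adjacency-list representation are illustrative assumptions, not notation from this paper.

```python
from collections import deque


def separates(succs, inputs, outputs, Y):
    """Return True if every directed input-to-output path meets Y."""
    Y = set(Y)
    outputs = set(outputs)
    # Breadth-first search from the inputs that never enters a vertex of Y.
    frontier = deque(v for v in inputs if v not in Y)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in outputs:
            return False          # reached an output without touching Y
        for w in succs.get(v, []):
            if w not in Y and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True


# Toy DAG: m depends on a and b; s depends on m and a.
succs = {"a": ["m", "s"], "b": ["m"], "m": ["s"], "s": []}
only_m = separates(succs, ["a", "b"], ["s"], {"m"})
m_and_a = separates(succs, ["a", "b"], ["s"], {"m", "a"})
```

Here `{"m"}` alone is not a valid cut because the edge from `a` to `s` bypasses it, whereas `{"m", "a"}` is. Whether a given separator also qualifies as coherence-critical depends on the live-state requirement in the second item, which this check does not model.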
4. Main Results
5. Applications to Dense Linear Algebra
5.1. Matrix Multiplication (GEMM)
5.2. QR Factorization
5.3. Cholesky Decomposition
6. Optimal Algorithms
6.1. Streaming Matrix Multiplication
6.2. Streaming QR Factorization
7. Discussion and Future Work
- Quantum-inspired classical algorithms
- Distributed computing systems
- Streaming data processing
- Machine learning workloads
8. Conclusions
Data Availability Statement
Use of Artificial Intelligence
Acknowledgments
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
