Preprint
Article

This version is not peer-reviewed.

Sparse Projection Attention: A Computationally Efficient Framework for Long Sequence Modeling

Submitted: 03 April 2026

Posted: 06 April 2026


Abstract
The self-attention mechanism has revolutionized sequence modeling but suffers from quadratic computational complexity with respect to sequence length, limiting its applicability to long sequences. We propose Sparse Projection Attention (SPA), a novel attention variant that leverages learnable sparse projections to reduce the effective dimensionality of queries and keys while maintaining expressive power. Our method is grounded in the Johnson-Lindenstrauss lemma and provides theoretical guarantees on distance preservation. We introduce a comprehensive mathematical framework including error bounds, convergence analysis, and gradient dynamics. Experimental results on language modeling, machine translation, and long-range sequence classification demonstrate that SPA achieves up to an 8× computational speedup while maintaining competitive performance compared to standard attention and other sparse variants. The proposed approach offers an effective trade-off between computational efficiency and model expressivity for long-sequence tasks, making transformers more accessible for resource-constrained environments and real-time applications.
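To make the core idea concrete, the following is a minimal PyTorch sketch of attention with learnable low-dimensional projections of queries and keys, as described at a high level in the abstract. It is illustrative only: the choice to project the per-head feature dimension, the single-head layout, and the absence of an explicit sparsity constraint are assumptions for this sketch, not the paper's actual formulation.

```python
# Minimal sketch: attention with learnable low-dimensional query/key projections.
# Assumptions (not from the paper): projections act on the feature dimension,
# a single head is shown, and sparsity of the projection is not enforced.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectedAttention(nn.Module):
    def __init__(self, d_model: int, d_proj: int):
        super().__init__()
        # Standard query/key/value maps.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Learnable projections into a lower-dimensional space (d_proj << d_model),
        # motivated by Johnson-Lindenstrauss-style distance preservation.
        self.p_q = nn.Linear(d_model, d_proj, bias=False)
        self.p_k = nn.Linear(d_model, d_proj, bias=False)
        self.scale = d_proj ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.p_q(self.w_q(x))            # (batch, seq_len, d_proj)
        k = self.p_k(self.w_k(x))            # (batch, seq_len, d_proj)
        v = self.w_v(x)                       # (batch, seq_len, d_model)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)

# Example usage with hypothetical sizes:
# layer = ProjectedAttention(d_model=512, d_proj=64)
# y = layer(torch.randn(2, 1024, 512))       # (2, 1024, 512)
```

The cost of each score computation drops from d_model to d_proj multiply-adds per query-key pair; the exact speedup and any reduction along the sequence dimension depend on details of SPA not shown here.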
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
