Submitted: 07 November 2025
Posted: 07 November 2025
Abstract
Keywords:
1. Introduction
2. Scaled-Dot-Product Attention in Transformer
- The scaling factor applied to each inner product is positive.
- t is the transpose operator, which transforms a column vector into a row vector.
- q^t k_i, also written as the dot product <q, k_i>, is the inner product between the query vector q and the i-th key vector k_i.
- The denominator is the sum of the exponentials of all the inner products between the query q and the set of key vectors {k_j}.
- The resulting ratio is the softmax weight corresponding to the i-th key vector k_i.
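The quantities listed above can be illustrated with a minimal sketch for a single query. It assumes that the scaling factor multiplies each inner product before exponentiation and uses the common choice 1/sqrt(d) from Vaswani et al.; the function name `softmax_attention_weights` and the parameter name `scale` are hypothetical and do not come from the paper.

```python
import math


def softmax_attention_weights(q, keys, scale):
    """Return the softmax weight of each key vector for one query vector.

    Assumption: `scale` is the positive scaling factor and multiplies
    the inner product <q, k_i> before the exponential is taken.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Scaled inner products between the query and every key.
    scores = [scale * sum(qj * kj for qj, kj in zip(q, k)) for k in keys]
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    # Denominator: sum of the exponentials over all keys.
    denom = sum(exps)
    return [e / denom for e in exps]


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = softmax_attention_weights(q, keys, scale=1.0 / math.sqrt(len(q)))
```

Here the first key is aligned with the query and therefore receives the largest weight, while the key pointing in the opposite direction receives the smallest.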
3. Tensor Representation of Distributed Memory
3.1. Pairwise Associations
3.2. Extending Cooper’s Distributed Memory for Multi-Way Associations
3.3. The Key-Value Representation of the Memory Operator
3.4. Associative Retrieval via Query
4. The Linear Kernel: A Connection to RNNs
5. The Recursive Nature of Linear Memory
6. Kernel Function Selection
| Name | Expression | Type |
| --- | --- | --- |
| Exponential | ψ(z) = exp(z) | Universal |
| Linear | ψ(z) = z | Valid |
| Geometric Series | ψ(z) = 1/(1-z) | Conditionally universal: the input z must lie in the range (-1, 1). This condition is met when z is computed from the cosine kernel |
| Cosine | k(q, k_i) = <q, k_i> / (‖q‖ ‖k_i‖) | Valid when both vectors have nonzero norm |
| Shift Cosine | k(q, k_i) = (1+z) | Valid when z is computed from the cosine kernel |
| RBF | k(q, k_i) = exp(-γ ‖q - k_i‖²) | Universal when γ > 0 |
| Clipped Geometric Series | k(q, k_i) = 1/(1 + eps - c(<q, k_i>)/M) | Conditionally universal; c(<q, k_i>) = M when <q, k_i> > M, where M > 0, such as M = 300, and eps is a small positive number such as eps = 1e-4 |
| Relu | k(, =+ eps | Valid, where , and eps is small positive number such as eps=1.e-4 |
| Relu2 | k(, =+ eps | Valid, where , and eps is small positive number such as eps=1.e-4 |
| Elu | k(, = | Valid, where |
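A few of the kernels in the table can be sketched in plain Python to show how they might replace the exponential in the attention weights. This is an illustrative sketch only: the helper names (`dot`, `cosine`, `kernel_weights`, and so on) are hypothetical, the clipping function c is assumed to leave values at or below M unchanged, and normalizing kernel values by their sum is an assumption modeled on the softmax form rather than a statement of the paper's method.

```python
import math


def dot(x, y):
    """Inner product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine kernel; both vectors must have nonzero norm."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def exponential(z):
    """Exponential kernel function, psi(z) = exp(z)."""
    return math.exp(z)


def geometric_series(z):
    """Geometric series kernel function, psi(z) = 1/(1-z) for z in (-1, 1)."""
    if not -1.0 < z < 1.0:
        raise ValueError("z must lie strictly inside (-1, 1)")
    return 1.0 / (1.0 - z)


def clipped_geometric_series(z, M=300.0, eps=1e-4):
    """Clipped geometric series: 1/(1 + eps - c(z)/M), with c(z) = M when z > M.

    Assumption: c(z) = z when z <= M, so c(z) = min(z, M).
    """
    c = min(z, M)
    return 1.0 / (1.0 + eps - c / M)


def kernel_weights(q, keys, kernel):
    """Normalize kernel values k(q, k_i) so that they sum to one (assumed form)."""
    values = [kernel(q, k) for k in keys]
    total = sum(values)
    return [v / total for v in values]


q = [1.0, 0.0]
keys = [[1.0, 1.0], [0.0, 1.0]]
# Geometric series kernel applied to the cosine similarity.
weights = kernel_weights(q, keys, lambda x, y: geometric_series(cosine(x, y)))
```

Note that the cosine similarity of two parallel vectors equals 1, which lies on the boundary of (-1, 1); this sketch raises an error in that case rather than dividing by zero.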
7. Experiments
8. Conclusion
References
- James A. Anderson. A simple neural network generating an interactive memory. Mathematical Biosciences, 14(3–4):197–220, 1972.
- L. N. Cooper. A possible organization of animal memory and learning. Proceedings of the Nobel Symposium on Collective Properties of Physical Systems, B. Lundqvist and S. Lundqvist (Eds.), New York: Academic Press, pp. 252–264, 1973.
- Leon N Cooper. Distributed memory, 1985. Available online: https://apps.dtic.mil/sti/tr/pdf/ADA153364.pdf.
- J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 79(8):2554–2558, 1982.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017.
- Jiyong Ma. Deriving the Scaled-Dot-Function via Maximum Likelihood Estimation and Maximum Entropy Approach, 2025, arXiv:2509.12285.
- Zhen Qin, XiaoDong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong. The Devil in Linear Transformer, 2022, arXiv:2210.10340.
- Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Proceedings of the 37th International Conference on Machine Learning, PMLR 119:5156-5165, 2020.
- Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc Le. Transformer Quality in Linear Time. Proceedings of the 39th International Conference on Machine Learning, PMLR 162:9099-9117, 2022.
- Ingo Steinwart. On the Influence of the Kernel on the Consistency of Support Vector Machines. Journal of Machine Learning Research, 2001.
- I. J. Schoenberg. Positive definite functions on spheres. Duke Mathematical Journal, 9(1):96–108, 1942.
- Andrej Karpathy. NanoGPT. Available online: https://github.com/karpathy/nanoGPT.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).