
An Auto-Associative Unit Memory Network

Submitted: 13 December 2024
Posted: 16 December 2024


Abstract

This paper describes a new auto-associative network called a Unit Memory network. It is so-called because unit nodes in the network store binary input as integer values, representing binary words. The word size is critical and specific to the dataset, and it also provides a first level of cohesion over the word values. A second cohesion network then links the unit network nodes, through layers of decreasing dimension, until the top layer contains only 1 node for any pattern. Thus, a pattern can be found using a search and compare technique through the two networks. The unit memory is compared to a Hopfield network and a Sparse Distributed Memory (SDM). It is shown that the memory requirements are not unreasonable and that it has a much larger capacity than a discrete Hopfield network, for example. It can also store dense or sparse data and can cope with perhaps 30% noise. This is demonstrated with test results for 3 benchmark datasets. Apart from the unit size, the rest of the configuration is automatic, and its simple design could make it attractive for some applications.


1. Introduction

This paper describes a new design that can be used as an auto-associative memory. In this respect it is compared with a Discrete Hopfield Network [5], which was the original inspiration. Content-addressable memories can retrieve a full memory from any subpart of sufficient size. They are also able to retrieve the original memory when some of it has been corrupted by noise, which is of interest to this paper. As Hopfield also explains [5]: ‘Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons).’ This attributes computational properties to neurons, which suggests some type of functional behaviour. Recent biological research suggests that it could be possible to have memory structures that are less functional as well, such as with glial cells. The main problem with discrete Hopfield networks is their memory capacity, which is only about 0.15 times the number of nodes. For example, a network with 100 nodes would only be able to store about 15 different patterns before they started to corrupt each other. This has now largely been solved with the new Modern Hopfield Network [8], which also does not need to be discrete. The memory structure tends to be dense, however, and so it may not deal so well with sparse data. The paper [6] presents one model designed to cope with sparse data, but it also notes that, apart from some very nice advantages, the modern Hopfield models have been shown to be computationally heavy and vulnerable to noisy queries. In particular, the dense output alignments of the retrieval dynamics can be computationally inefficient, making the models less interpretable and noise-sensitive.
This paper proposes a new model called a Unit Memory network. It is so-called because unit nodes in a network store binary input as integer values, representing binary words. Thus, the unit size is the first design feature, and it is critical and specific to the dataset. It also provides a first level of cohesion over the word values, because specific word ‘patterns’ must be present for the related integer value to be flagged. A second cohesion network then links the unit network nodes, through layers of decreasing dimension, until the top layer contains only 1 node for any pattern. Thus, a pattern can be found using a search and compare technique through the two networks. The structure looks different, but one might compare it with a standard auto-encoder, with similar input and output layers and then a hidden layer that contains the common features. In this respect, that type of functionality is present in the unit memory as well.
The rest of the paper is organised as follows: Section 2 gives some related work. Section 3 describes the new Unit Memory network, including algorithms. Section 4 gives some test results, while Section 5 gives some conclusions on the work.

2. Related Work

The paper [4] proposes to build a new Modern Hopfield Network that includes setwise links, creating a simplicial complex. This is interesting because the simplicial sets group base nodes, but into higher levels and degrees. Basically, Corollary 2.2 on page 5 of that paper states that the network capacity is proportional to the number of connections: to store more patterns, you need to add more connections. It cites earlier papers that also consider setwise links, but for a binary pairwise Hopfield network. The idea is to link non-linearly, in concentrations, where the nodes are associated. The linking process is entirely different to the cohesion network, however, and continually changes the network topology. The appendices have biological references, including neurons and synapses. A new design of Hopfield Network, described in [1], might in fact be closer. Their system still reduces an energy function, but it constructs a graph representation (discrete-graphical model) as the internal model, which is why it may be more similar. They argue for local learning rules, both in the human brain and in their system. Because Deep Learning is often implemented with backpropagation, which is non-local learning, it is generally considered biologically implausible. They use a single forward-pass and backward-pass for retrieval, which looks to be similar to the one-shot integer value retrieval used in this paper.
The paper [7] is the original work on sparse models. It uses a Hamming distance with the precept that: ‘The pursuit of a simple idea led to the discovery of the model, namely, that the distances between concepts in our minds correspond to the distances between points of a high-dimensional space. Strictly speaking, a mathematical space need not be a high-dimensional vector space to have the desired properties; it needs to be a huge space, with an appropriate similarity measure for pairs of points, but the measure need not define a metric on the space.’ Long (high-dimensional) binary vectors, or words, were used as the memory model and could handle 20% noise, but examples of noisy patterns were included in the memory, with a link to the correct one. Each memory address corresponded to a whole large word, not the smaller units that are used in this paper. Also, with sparse data, the number of hard locations or addresses actually used was much less than the possible number from the size of the data, so a full list would contain a lot of redundancy, close to the Hopfield network capacity. Another problem with a flat list is that if it has to be expanded at a particular point, then elements need to be moved. The tree structure in this paper can add new nodes more easily, but the problem still exists.
The paper [11] converted the binary numbers to decimal, which is also done in this paper, but again for whole words, not smaller units. Instead of cohesion, a Euclidean range around the selected memory was used to retrieve matching possibilities, which were then summed, and a majority vote produced the final result. The paper [10] tested different methods of encoding the data stored into the SDM, including: Natural Binary Code (NBC), NBC with a different sorting of the numbers, integer values and a sum-code. But again, the stored patterns were whole images, not parts of images. They then also used a radius around the selected location, to compare with similar ones. The paper states that ‘Another big weakness of the original SDM model is that of using bit counters. This results in a low storage rate, which is about 0.1 bits per bit of traditional computer memory, huge consumption of processing power and a big complexity of implementation.’ The paper [11] notes problems with using Hamming distances when converting binary to decimal: the integer representation has several advantages, such as a simpler encoding, it diminishes the effect of normalization when several vectors are combined, and it avoids other undesirable effects. But decimal differences and Hamming distances are not the same. For example, 0111 and 1000 would give decimal values of 7 and 8, with a Hamming distance of 4 but a decimal difference of only 1. The method of this paper still prefers the Hamming distance. In their case, the decimal number is required to locate the position for a pattern in a vector list, whereas in this paper, patterns are to be clustered based on binary word similarity. The paper [10], however, describes that converting to binary from some other form that also includes noise is itself a problem. For example, if greyscale images are stored as 8-bit patterns, then a grey-level value of 127, or 01111111, could be converted with very little noise into 128, or 10000000, but the Hamming distance between these two values is very large. The best method is therefore application-dependent, because either option is possible.
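To make the distinction concrete, the short sketch below (plain Python written for this description, not code from any of the cited systems) computes both measures for the two word pairs mentioned above.

```python
def hamming(a: str, b: str) -> int:
    # Number of bit positions in which two equal-length binary words differ.
    return sum(x != y for x, y in zip(a, b))

for a, b in [("0111", "1000"), ("01111111", "10000000")]:
    print(f"{a} vs {b}: decimal difference = {abs(int(a, 2) - int(b, 2))}, "
          f"Hamming distance = {hamming(a, b)}")
# 0111 vs 1000: decimal difference = 1, Hamming distance = 4
# 01111111 vs 10000000: decimal difference = 1, Hamming distance = 8
```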

3. The Unit Memory Network

3.1. Network Construction

The model proposed in this paper is not a recursive Hopfield design, but is more like a Sparse Distributed Memory (SDM). The focus is on binary inputs only, with a value of 0 or 1. As the name suggests, the input pattern can be split up into units that each contain a certain number of the binary inputs. It could be 8 bits or 16 bits, for example, and in fact the selected size is quite critical. When stored in the network, these binary numbers are converted to decimal first. Each network node therefore references words of this size and not individual binary values. This not only helps to reduce the size of the network, but it adds a certain amount of cohesion, where these smaller ‘patterns’ need to occur for the unit value to be flagged. Then a second indexing network makes the retrieval and comparisons more efficient. It also adds cohesion across the units, by clustering the ones with shared values together. This cohesion network links all the base unit nodes through a number of layers that converges to a single final layer node. The dimension is reduced at each level by combining 2 values from the previous level into a compound one, and also leaving out 2 values. Figure 1 is a schematic of a unit memory network. Note that while a discrete Hopfield network is restricted to storing approximately 0.15N patterns or states, there is no such restriction with this design. A Modern Hopfield network does not have this restriction either, but it does need to add more dimensions to be able to store more information.
Binary is the chosen reduction method here, but it could be something else. The nodes at any level may therefore contain links for more than 1 base pattern, and so there is a certain amount of search and comparison required, but the network structure is able to keep that to a minimum. While it may appear that this network simply stores every memory and then retrieves what is closest to the input, it does this with a lot of overlap in the values that it stores, which substantially reduces the memory requirements. In fact, it is probably more economical than a discrete Hopfield network for the same number of patterns. It is able to handle sparse or dense memory equally well and its simple structure makes it very quick to run. It is not completely deterministic, however, and can produce slightly different results on different test runs, but the only critical criterion is really the size of unit to use. The rest is mostly automatic. One unanswered question is how to cluster sibling nodes in a layer using the Hamming distance, or at least the most economic way to do it. This would allow small traversals to neighbouring nodes, for example, but the current algorithm does not require that.

3.2. Network Use

Using the network therefore also consists of two stages. To train the network, the first stage creates the base unit nodes and assigns each node a list of integer values that represent any of the binary array words assigned to the unit. The second stage converts each pattern into a hierarchy of compound integer values and uses these to trace through the links in the trained network that lead to the unit patterns at the base. Testing or using the network essentially matches the new input with the closest base units, but with some extra rules. The new input may be noisy, and so if an input part does not match exactly with any values in the corresponding unit, then the unit value is set to ‘null’. A cohesion network is created for the new input pattern, and the trained cohesion network is traced through to the base nodes using those values, where if a null value is encountered, then any match is allowed. This results in a small set of base unit values that are converted back to their binary equivalents, to produce a small set of whole train patterns. Each pattern is compared with the new input pattern and the one with the fewest Hamming differences is selected as the correct pattern.

3.3. Train Algorithm

The structure consists of a unit network and a cohesion network and so both need to be built.

3.3.1. Unit Network

  • Create a unit network, where each node stores integer values relating to an indicated number x of base binary values.
  • For each input pattern, split it up into words of the indicated unit size x. Note that the unit size is important and will separate out the values (and features) differently.
    o Convert the list of base values representing a unit from binary to decimal.
    o Store the number, only once in a list, for the unit at the corresponding position. (A minimal code sketch of this stage is given after the list.)
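As an illustration only, the following minimal Python sketch (not the paper's code; names such as train_unit_network are invented here) implements this stage for patterns given as lists of 0/1 values.

```python
def train_unit_network(patterns, unit_size):
    # One node per unit position; each node stores the set of integer word
    # values seen at that position across all training patterns.
    n_units = len(patterns[0]) // unit_size
    unit_nodes = [set() for _ in range(n_units)]
    for pattern in patterns:
        for u in range(n_units):
            word = pattern[u * unit_size:(u + 1) * unit_size]
            value = int("".join(str(bit) for bit in word), 2)  # binary word -> decimal
            unit_nodes[u].add(value)  # each value is stored only once per unit
    return unit_nodes

# Two 8-bit patterns with unit size 4 give 2 unit nodes.
patterns = [[0, 1, 1, 1, 1, 0, 0, 0],
            [0, 1, 1, 1, 0, 0, 0, 1]]
print(train_unit_network(patterns, 4))  # [{7}, {8, 1}] (set order may vary)
```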

3.3.2. Cohesion Network

  • Link and add all decimal numbers for the current pattern to the cohesion network.
    o Create a layer of base compound keys by combining 2 adjacent unit node values.
    o Then, to reduce the dimensions, take the second compound value of the first node and the first compound value of the second node and create a new compound value and node in a new layer from it.
    o Repeat until there is only 1 node in the top layer.
  • This creates a path which, if followed, will lead to the input pattern at the end.
    o The paths are not unique, however, and because some information has been lost, there can be several instances at each node. But the numbers will be much smaller than for a flat list.
    o Thus, when there are several choices, a count of the most matching values can be used to select the best one.
  • The process is described in Figure 1, and an illustrative code sketch of the layer construction follows this list.
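The sketch below is one possible reading of this construction (the exact pairing rule is not fully specified above, so treat this as an assumption rather than the paper's implementation): adjacent unit values form the base compound keys, and each higher layer combines the second element of one node with the first element of the next, roughly halving the node count until a single top node remains.

```python
def build_cohesion_layers(unit_values):
    # Base layer: compound keys from adjacent unit values (an odd leftover is kept alone).
    layer = [tuple(unit_values[i:i + 2]) for i in range(0, len(unit_values), 2)]
    layers = [layer]
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer), 2):
            if i + 1 < len(layer):
                # Second value of the first node combined with the first value of the second node.
                next_layer.append((layer[i][-1], layer[i + 1][0]))
            else:
                next_layer.append(layer[i])  # odd leftover carried up unchanged
        layer = next_layer
        layers.append(layer)
    return layers

# Six unit values reduce to a single top node over three layers.
for level in build_cohesion_layers([7, 8, 3, 0, 12, 5]):
    print(level)
# [(7, 8), (3, 0), (12, 5)]
# [(8, 3), (12, 5)]
# [(3, 12)]
```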
3.4. Test Algorithm

The retrieval process therefore also requires traversing both the unit and the cohesion networks.

3.4.1. Unit Network

  • Split the test input pattern into unit sizes and create the corresponding integer numbers.
  • Compare each number directly with the corresponding unit node value list.
    o If the value list contains the number, then keep the value for that position.
    o If the value list does not contain the number, then add a ‘null’ value for the position.
  • This produces a list of integer or null values that represent the test pattern in the trained network. (A code sketch of this matching step follows the list.)
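Continuing the hypothetical sketch from Section 3.3.1 (again illustrative Python, not the paper's code), this step could look as follows, with Python's None standing in for the ‘null’ value.

```python
def match_units(test_pattern, unit_nodes, unit_size):
    # Convert each unit of the test pattern to its integer value and keep it only
    # if the trained unit node has seen that value; otherwise store None ('null').
    values = []
    for u, node in enumerate(unit_nodes):
        word = test_pattern[u * unit_size:(u + 1) * unit_size]
        value = int("".join(str(bit) for bit in word), 2)
        values.append(value if value in node else None)
    return values

# Using the unit nodes from the earlier training sketch (unit size 4):
unit_nodes = [{7}, {1, 8}]
print(match_units([0, 1, 1, 1, 0, 0, 1, 1], unit_nodes, 4))  # [7, None]
```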

3.4.2. Cohesion Network

  • Create a cohesion network for the test pattern values only.
  • Traverse the trained cohesion network using these values.
    o If there is a null value at a branch, then any combination that includes the null value can be selected.
    o While there will be multiple options, the final branch(es) will have a much smaller subset of all the patterns that the network was trained with.
  • Retrieve all the selected base patterns and convert them back into binary patterns.
  • Use a Hamming distance similarity count to select the pattern that is closest to the test input pattern. (A sketch of this selection step follows the list.)
  • The finally selected pattern can therefore have different values to the input pattern, thereby removing noise from it.
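The final selection step can be sketched very simply (illustrative Python; closest_pattern is an invented name), assuming the traversal has already produced a small candidate set of training patterns.

```python
def closest_pattern(test_pattern, candidates):
    # Return the candidate training pattern with the fewest Hamming differences
    # from the (possibly noisy) test pattern.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(candidates, key=lambda c: hamming(test_pattern, c))

candidates = [[0, 1, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0, 0, 1]]
noisy_input = [0, 1, 1, 1, 0, 0, 1, 1]
print(closest_pattern(noisy_input, candidates))  # second candidate (distance 1 vs 3)
```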
Note that if there is no exact unit match, then a closest match is not included, because this leads to a combinatorial explosion that cannot be managed. It was much more economic to add a null value and leave the decision to the final set of patterns. At first sight, this does not seem to be a very economic solution, because it tries to store every train pattern as it is. However, a direct comparison with the memory requirements of a Hopfield network, for example, shows that it is reasonably economic.
Consider the following example: an auto-associative network with 100 nodes is to be used. The Hopfield network stores a weight from each node to every other node, which is 100 x 100 = 10000 weight values. Even if this is symmetric, so that ab == ba, then it is still 5000 in number. If a unit memory was to use units of size 4, then there would be 25 unit-network nodes. The cohesion network reduces these in binary style, from 13 to 7 to 4 to 2 to 1. Summing this gives 25 + 13 + 7 + 4 + 2 + 1 = 52 nodes. A Hopfield network can reliably store approximately 0.15N patterns, or 15 patterns in this case. An upper limit for the unit memory would therefore be if every pattern value is different, leading to 52 x 15 = 780 stored values, which is much less than for the Hopfield network. This is only a crude estimate (reproduced in the short sketch below), but it shows that the memory requirements should be OK. For example, the Semeion dataset (see Section 4) requires possibly 54579 values to be stored, whereas a fully-connected Hopfield network would require (256 x 256) = 65536 weight values, or 32768 values if symmetry is considered. But that would be for 39 patterns, not the 1597 patterns that the unit memory stores.
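The crude estimate above can be reproduced with a few lines, assuming (as in the example) that each cohesion layer holds roughly half as many nodes as the layer below.

```python
import math

def unit_memory_node_count(n_inputs, unit_size):
    # Base unit layer plus cohesion layers that roughly halve until one node remains.
    n = n_inputs // unit_size
    total = n
    while n > 1:
        n = math.ceil(n / 2)
        total += n
    return total

n_inputs = 100
nodes = unit_memory_node_count(n_inputs, 4)           # 25 + 13 + 7 + 4 + 2 + 1 = 52
print(nodes, nodes * 15)                              # 52 nodes, at most 780 values for 15 patterns
print(n_inputs * n_inputs, n_inputs * n_inputs // 2)  # 10000 Hopfield weights (5000 if symmetric)
```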

4. Testing

The network was tested with 3 different image datasets. The Chars74k dataset [12] is a set of hand-written numbers, where only the numbers 1 to 9 were used. This produced approximately 55 examples for each number, or 360 images in total. Each image was converted into a 32x32 black-and-white ASCII image, which is also a binary image with the values 1 or 0. The second dataset was another set of handwritten numbers called the Semeion dataset [2,3]. These were converted into 16x16 binary images, with a total of 1597 images. A third benchmark dataset, obtained from the UCI Machine Learning Repository [13], was a small subset of the Caltech 101 Silhouettes dataset [9]. These were converted into 64x64 binary images and placed into 8 different categories, with up to 10 images in a category and 68 images in total.

4.1. Test Strategy

The only parameter that needs to be set is the unit size. Thus, a number of test runs could determine which unit size would give the best result for each dataset. The results were not exactly the same every time, but they were close, and so the results presented in Table 2 have been averaged over 50 test runs each. Testing a single row would take seconds or less to run. After determining what the best unit size was, the trained network was presented with the dataset again, but with a certain amount of noise added to each row. This included switching both the 1 and the 0 values, where a 10% noise factor would randomly switch 10% of the binary values, for example (sketched below). With no noise, the network would be expected to recognise each train image again, but with noise, it would have to match partially to the image and then retrieve the train image that was closest. Because the images are placed in categories, the test also measured what category the finally selected image was from. Thus, two percentage scores were produced for each test run. The first was whether the same data row (image) was returned and the second was whether an image from the same category was returned. The results are presented in Table 2.
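The noise procedure can be illustrated as follows (a sketch of the general idea only; the paper does not give its exact implementation).

```python
import random

def add_noise(pattern, noise_fraction, rng=random):
    # Randomly switch the stated fraction of binary values (1 -> 0 and 0 -> 1).
    noisy = list(pattern)
    n_flip = round(noise_fraction * len(pattern))
    for i in rng.sample(range(len(pattern)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy

print(add_noise([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], 0.3))  # e.g. 3 of the 10 bits flipped
```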

4.2. Test Results

The preferred unit size for each dataset was as shown in Table 1.

Table 1. Preferred unit sizes.

Dataset        Dimensions    Unit Size
Chars74        32 x 32       12
Semeion        16 x 16       16
Silhouettes    64 x 64       4
Table 2. Test results show the accuracy for exact row recall or correct category recall, with varying amounts of noise in the test pattern.

Dataset        Row Accuracy for Noise %            Category Accuracy for Noise %
               0%     10%    20%    30%            0%     10%    20%    30%
Chars74        100    98.7   91     75.9           100    98.9   91.5   76.1
Semeion        100    96.2   90.2   85.1           99.7   95.8   90     84.9
Silhouettes    100    98.4   95.3   87.4           100    98.5   94.9   87.2
The accuracy is expected to degrade when noise is added, and the values are comparable with what other systems have produced. For example, the paper [4] quotes an 87% accuracy for its particular test, while [11] quotes 65% accuracy for 30% noise in its test. The category matching could be worse than exact row matching. It did involve a basic majority count over all rows in the category, and that might be different to a single closest match. It should be noted that a test set with different images was not recognised very well, but that would not be the objective of an auto-associative network. It should also be noted that two fairly basic discrete Hopfield networks failed this test completely. The digit images are quite sparse, and so a Hopfield network may tend to move each node’s score to 0. This is what happened, with the returned patterns being all 0’s, because that was also the input to most nodes. Modern Hopfield networks [8] are addressing this problem by placing the minima more accurately, but they also need to be specialised to cope with sparse data [6].

5. Conclusions

A Unit Memory network has been shown to be very efficient, both in terms of processing time and memory requirements. It can handle dense or sparse data and does not require large basins of attraction, thereby allowing it to store a larger number of patterns safely. Noisy input is the primary concern in this paper, and it is shown that even for 30% noise, the retrieval accuracy is as good as for other methods. The model has been compared to variants of the Hopfield network. It has an input/output layer (unit network) and hidden layer (cohesion network) analogy and could be compared with a sparse quantized variant [1], but the two networks are constructed very differently. Both show, however, that a graphical representation of the hidden layer is interesting. If the unit memory was to have an energy function, then it might be a shortest-path version, which could be a local rule. If the Hamming distance between sibling nodes could also be sorted, then they could be included as well. The best unit size would need to be decided first, but then online learning would be possible, which is another reason cited in [1] for their design. While the unit memory has a much simpler mathematical framework, this should not detract from the obvious importance of the Hopfield models. In fact, determining exactly what functionality they represent could make them all the more interesting. For example, they are thought to be a model for human memory as well. More recently, however, it has been proposed that glial cells also play a part in memory, and so if a simpler memory model analogy is required, then maybe the simpler unit memory network could also be appropriate.

References

1. Alonso, N. and Krichmar, J.L. (2024). A sparse quantized Hopfield network for online-continual memory. Nature Communications, 15, 3722.
2. Bouaguel, W. and Ben NCir, C.E. (2022). Distributed Evolutionary Feature Selection for Big Data Processing. Vietnam Journal of Computer Science, pp. 1-20.
3. Buscema, M. (1998). MetaNet: The Theory of Independent Judges. Substance Use & Misuse, Vol. 33, No. 2, pp. 439-461.
4. Burns, T.F. and Fukai, T. (2023). Simplicial Hopfield Networks.
5. Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
6. Hu, J.Y-C., Yang, D., Wu, D., Xu, C., Chen, B-Y. and Liu, H. (2023). On Sparse Modern Hopfield Model. 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
7. Kanerva, P. (1992). Sparse Distributed Memory and Related Models. NASA Technical Report. Also in M.H. Hassoun, ed., Associative Neural Memories: Theory and Implementation, pp. 50-76, New York: Oxford University Press, 1993.
8. Krotov, D. and Hopfield, J. (2018). Dense Associative Memory Is Robust to Adversarial Inputs. Neural Computation, 30, 3151-3167.
9. Marlin, B., Swersky, K., Chen, B. and Freitas, N. (2010). Inductive principles for restricted Boltzmann machine learning. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, pp. 509-516.
10. Mendes, M., Coimbra, A.P. and Crisostomo, M. (2009). Assessing a Sparse Distributed Memory Using Different Encoding Methods. Proceedings of the World Congress on Engineering 2009, Vol. I, WCE 2009, July 1-3, 2009, London, U.K.
11. Snaider, J. and Franklin, S. (2012). Integer sparse distributed memory. In Twenty-Fifth International FLAIRS Conference.
12. The Chars74K dataset, http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/ (last accessed 10/12/24).
13. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/ (last accessed 10/12/24).
Figure 1. Unit and Cohesion Networks with unit size 3, for an input pattern.