Preprint
Short Note

This version is not peer-reviewed.

MNIST Dataset Heatmap Classifier

Submitted:

10 January 2025

Posted:

14 January 2025

You are already at the latest version

Abstract
In this paper, a simple heatmap-based classifier is proposed that uses average pixel values from training data to assess image similarity to individual classes. The classifier was tested on the MNIST dataset, achieving an accuracy of 66.85% in the basic version and 81.31% in the version with region of interest (ROI) extraction and interpolation. The training process has a time complexity of O(n), and the evaluation of the classifier has a time complexity of O(1). The presented method is simple to implement and fast to train. The paper discusses potential directions for further research, including data augmentation and integration with deep learning models, such as convolutional networks, or other machine learning algorithms, e.g. k-nearest neighbors, which can significantly improve the effectiveness of the classifier.
Keywords: 
;  ;  ;  ;  

1. Introduction

Traditional image analysis methods required a complicated process of manual feature extraction. Thanks to convolutional networks, we got rid of this problem. They have revolutionized image recognition tasks in many key aspects, which has had a huge impact on the development of artificial intelligence. They are adapted to process large sets of visual data, are sufficiently resistant to translations or noise. CNNs have found applications not only in image recognition, but also in other fields, such as sound analysis, text analysis, genomics, as well as in new technologies such as speech recognition, video analysis or emotion recognition based on facial expressions.
One of the first deep convolutional networks is LeNet (LeCun et al., 1998). It was successful in the task of recognizing handwritten digits. It was a pioneer in the application of CNNs to image recognition; despite its limited computational resources and small number of parameters, it provided a foundation for further research on deep convolutional networks. The AlexNet architecture (Krizhevsky et al., 2017) was also a breakthrough achievement that contributed to the rapid growth of popularity of convolutional networks. This architecture was presented in the context of the ImageNet competition, where it achieved a revolutionary result, reducing the classification error by more than 10% compared to previous methods.
Alternative methods for classifying digits from the MNIST set have already been used, such as Variational AutoEncoder (Mak, Han & Yin, 2023), which achieved 85.4% accuracy, Neuromorphic Nanowire Networks with 97.8% accuracy (Zhu et al., 2021), or even the Cellular Automata mechanism for self-classifying digits (Levin et al., 2020). I propose to use the simplest heatmap in the form of averaging the values of all training data, and for classification purposes calculating how similar the image is to that map. The results of an exemplary implementation for recognizing digits from the MNIST set (LeCun, Cortes & Burges, 1994), which contains 60,000 training images and 10,000 testing images, were evaluated.

2. Classifier Design

2.1. Structure

To correctly construct the classifier, a heatmap must be initialized for each class i as a zero matrix of dimensions n × m .
X i = 0 0 0 0 0 0 0 0 0 where X i R n × m
After training the classifier on the selected samples, each matrix should already have specific values.
X i = x 11 x 12 x 1 m x 21 x 22 x 2 m x n 1 x n 2 x n m where X i R n × m
Next, to determine which class assign the tested sample to, you need to evaluate how well it fits each class. This means calculating the absolute value of the deviation of the image p from the heatmap x, so for each pixel j, we calculate
P j = p j x j
We calculate the K index (assuming that 255 is the maximum pixel value) to determine how well the image fits the selected class
K i = j = 1 n m ( 255 P j ) 255 n m
Choose the one with the highest K index, without using additional activation functions. It’s index is the result of which class should be assigned to an image.

2.2. Training

To update the zero matrix, we should calculate the average over all training samples S, where s is the number of such samples
X i = j = 1 s S j s
We update the value of the selected heatmap, preserving a number representing the total number of samples A it was trained on.
X i X i * A + j = 1 s S j A + s

2.3. Time Complexity

Assuming that the number of classifier samples is finite, all samples are known at the beginning and the amount of pixels is constant, training the classifier is achievable in O ( n ) , knowing n is the number of images. The evaluation of the classifier results will always be O ( 1 ) , because the number of classes and heatmaps is constant.
For example, feedforwarding of one layer of a convolutional neural network has time complexity O ( C i n C o u t K 2 ) , assuming that C i n is the number of input channels, C o u t is the number of output channels and K is the filter size, which makes the time complexity of classifier evaluation much superior.

3. Classifier Variants

After training the baseline model on 60,000 samples from the MNIST dataset, it achieved just 66.85% correct matches on 10,000 test samples. Nevertheless, this is a very impressive result for a model that is not a convolutional network or even any kind of deep network.
In this paper, different variants will be discussed in order to obtain the best possible results.
Figure 1. Trained heatmaps of the standard variant
Figure 1. Trained heatmaps of the standard variant
Preprints 145859 g001

3.1. No Zeros Variant

This model assumes that only pixels whose value is greater than zero will be used for heatmap update and classifier evaluation. Assuming that k is the number of pixels such that p j > 0 , the index K should be normalized as follows
K i = j = 0 p j > 0 n m 255 p j x j 255 k
Figure 2. Trained heatmaps of the no zeroes variant
Figure 2. Trained heatmaps of the no zeroes variant
Preprints 145859 g002

3.2. Adjusting Variant

Adjusting variant involves cutting out the region of interest from the sample and then scaling it to constant dimensions. By region of interest, we mean the smallest possible rectangle bounded by non-zero pixel values most prominent in each direction.
Figure 3. The process of extracting ROI and interpolating
Figure 3. The process of extracting ROI and interpolating
Preprints 145859 g003
Then, the cut sample fragment must be resized to fixed dimensions. In the case of this work, nearest neighbor interpolation was implemented due to its simplicity and computational speed, and the heatmaps have fixed dimensions of 32x32 which were determined empirically.
Figure 4. Trained heatmaps of the adjusting variant
Figure 4. Trained heatmaps of the adjusting variant
Preprints 145859 g004

3.3. Adjusting Variant with Exponential K Index

To emphasize the importance of matching, we can exponentially increase the difference in error between the sample pixel and the heatmap.
K i = j = 1 n m ( 255 k P j k ) 255 k n m
Increasing the difference in deviation allows to give a larger impact factor to pixels that have the same value and to give a much smaller impact factor to pixels with extreme values. In this work, an empirical value of k = 3 was found which gave measurably good results.

3.3.1. Simple Enhancement

Table 1 in the Adjusting enhanced section presents the results for the variant in which all heatmap pixel values were increased by 10%.
X i max ( 1.1 X i , 255 )
Table 1. Results of MNIST classification.
Table 1. Results of MNIST classification.
Accuracy in testing data Accuracy in training data
Standard variant 66.85% 65.1 3 ¯ %
No zeros variant 77.17% 76.195%
Adjusting standard 77.74% 76.20 3 ¯ %
Adjusting exponential K 80.78% 79.77%
Adjusting enhanced 81.31% 80.20 3 ¯ %

4. Results

Each classifier was trained on the same samples from the MNIST dataset.
The difference in results between the standard model and the no zeros variant is primarily due to the amount of information stored in the heatmaps. Comparing the heatmap visualizations (Figure 1 and 2), you can see that no zeroes variant’s heatmaps are more complete and have a greater ability to match similarly shaped digits.
The adjusting variant allows to achieve similar accuracy as the no zeros variant, however it has an important advantage - universality. By interpolating the cropped to the region of interest samples we get a classifier that preserves the most important information about a given object such as shape or saturation of individual pixels and normalizes them to a common resolution. This solution could potentially expand classifier’s domain to other dataset, e.g. DIDA (Kusetogullari, Yavariabdi, Hall, & Lavesson, 2021).

5. Discussion and Conclusions

Adjusting variant achieves 81.31% accuracy, which translates into an error of 18.69%. This is an impressive result considering the training time, time complexity, space complexity and simplicity of implementation. I believe that this classifier has great potential and could achieve much higher accuracy if properly combined with convolutional networks. Combining several of the techniques mentioned in the paper in a different way can also positively influence the result. Unlike a neural network, this architecture allows us to completely bypass problems related to the gradient, computational efficiency and, above all, problems with the interpretation of results.
As a research direction in this area, I propose to analyze the results of training with data augmentation such as shearing, skewing, rotating or edge enhancement. In addition, classifiers on heatmaps can be combined with other machine learning methods - for example, the basic k-Nearest Neighbor algorithm achieved 96.91% accuracy on the MNIST dataset (Yepdjio Nkouanga & Vajda, 2023), so combining it appropriately with heatmaps classifiers could lead to better results.

Funding

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. An, S.; Lee, M.; Park, S.; Yang, H.; So, J. An ensemble of simple convolutional neural network models for MNIST digit recognition. arXiv 2020. [Google Scholar] [CrossRef]
  2. Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM 2017, 60(6), 84–90. [Google Scholar] [CrossRef]
  3. Kusetogullari, H.; Yavariabdi, A.; Hall, J.; Lavesson, N. DIGITNET: A deep handwritten digit detection and recognition method using a new historical handwritten digit dataset. Big Data Research 2020, 100182. [Google Scholar] [CrossRef]
  4. Kusetogullari, H.; Yavariabdi, A.; Hall, J.; Lavesson, N. DIDA: The largest historical handwritten digit dataset with 250k digits. June 2021. Available online: https://github.com/didadataset/DIDA/.
  5. Levin, M.; Greydanus, S.; Niklasson, E.; Randazzo, E.; Mordvintsev, A. Self-classifying MNIST digits. Distill 2020, 5. [Google Scholar] [CrossRef]
  6. LeCun, Y.; Cortes, C.; Burges, C. J. MNIST handwritten digit database. AT&T Labs. 1994. Available online: http://yann.lecun.com/exdb/mnist.
  7. Mak, H. W. L.; Han, R.; Yin, H. H. F. Application of variational autoencoder (VAE) model and image processing approaches in game design. Sensors 2023, 23(7), 3457. [Google Scholar] [CrossRef]
  8. Yepdjio Nkouanga, H.; Vajda, S. Optimization strategies for the k-nearest neighbor classifier. SN Computer Science 2023, 4(47). [Google Scholar] [CrossRef]
  9. Zhu, R.; Loeffler, A.; Hochstetter, J.; Diaz-Alvarez, A.; Nakayama, T.; Stieg, A.; Gimzewski, J.; Lizier, J. T.; Kuncic, Z. MNIST classification using neuromorphic nanowire networks. International Conference on Neuromorphic Systems 2021. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2025 MDPI (Basel, Switzerland) unless otherwise stated