Submitted: 20 October 2025
Posted: 21 October 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. The Step Nearest Neighbour Algorithm
3.1. Pseudo-Code
1. For each data row dr in the dataset:
2. Determine whether row dr is correctly classified:
3. If row dr is not correctly classified:
4. When completed, generate the closest-row clusters for each data row and check whether the categories match.
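
The final check in the listing above (does each row's closest row carry the same category?) can be illustrated with a minimal sketch. This is not the Step Nearest Neighbour algorithm itself: the listing does not specify how misclassified rows are adjusted, so that step is omitted. The Euclidean distance, the use of a single closest row rather than a cluster, and the names `nearest_row_index` and `category_match_rate` are all assumptions made for illustration.

```python
import math


def nearest_row_index(rows, i):
    """Return the index of the row closest to rows[i], excluding i itself.

    Assumption: Euclidean distance; the source does not name a metric.
    """
    best_j, best_d = None, math.inf
    for j, other in enumerate(rows):
        if j == i:
            continue
        d = math.dist(rows[i], other)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def category_match_rate(rows, labels):
    """Fraction of rows whose closest row has the same category."""
    matches = sum(
        labels[nearest_row_index(rows, i)] == labels[i]
        for i in range(len(rows))
    )
    return matches / len(rows)


# Toy data (hypothetical): the last row sits nearer to category "a".
rows = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
labels = ["a", "a", "b", "b", "b"]
rate = category_match_rate(rows, labels)
print(rate)  # 0.8: four of the five rows match their closest row
```

In this toy data only the last row disagrees with its closest row, which is the kind of row that step 3 of the listing would presumably act on.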
4. Testing
4.1. Test Results
5. Conclusions
References
- Brownlee, J. (2019). 10 Standard Datasets for Practicing Applied Machine Learning, https://machinelearningmastery.com/standard-machine-learning-datasets/.
- Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2011). SMOTE: Synthetic Minority Over-sampling Technique, Journal of Artificial Intelligence Research, 16:321–357.
- Ertoz, L., Steinbach, M. and Kumar, V. (2003). Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In Proceedings of the 2003 SIAM international conference on data mining (pp. 47-58). Society for Industrial and Applied Mathematics.
- Fix, E. and Hodges, J.L. (1951). Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties (PDF) (Report). USAF School of Aviation Medicine, Randolph Field, Texas.
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
- Greer, K. (2020). A Pattern-Hierarchy Classifier for Reduced Teaching, WSEAS Transactions on Computers, ISSN / E-ISSN: 1109-2750 / 2224-2872, Volume 19, 2020, Art. #23, pp. 183-193.
- Halder, R.K., Uddin, M.N., Uddin, M.A., Aryal, S. and Khraisat, A. (2024). Enhancing K-nearest neighbor algorithm: a comprehensive review and performance analysis of modifications. Journal of Big Data, 11(1), p.113.
- Jain, A.K. and Dubes, R.C. (1988). Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, March 1988.
- Jarvis, R.A. and Patrick, E.A. (1973). Clustering using a similarity measure based on shared nearest neighbors, IEEE Transactions on Computers, C-22(11).
- UCI Machine Learning Repository (2019). http://archive.ics.uci.edu/ml/.
| Dataset | Step NN Train 80% (% correct) | Step NN Test 20% (% correct) | K-NN Train/Test 20% (% correct) |
|---|---|---|---|
| Iris | 100 | 95.7 | 94.2 |
| Wine | 100 | 95.6 | 94.4 |
| Abalone | 100 | 49.8 | 47.9 |
| Hayes-Roth | 100 | 65.5 | 36.7 |
| Liver | 100 | 61.9 | 56.1 |
| Cleveland | 100 | 53.1 | 48.6 |
| Breast | 100 | 95.4 | 95.6 |
| Sonar | 100 | 86.8 | 67 |
| Wheat | 100 | 93.3 | 90.8 |
| Car | 100 | 97.5 | 89.5 |
| Wine Quality | 100 | 63.1 | 48.6 |
| UM | 100 | 85 | 69.7 |
| Bank | 100 | 99.9 | 99.3 |
| SPECT | 99.3 | 93.8 | 82.5 |
| Letters | 100 | 96.2 | 85.9 |
| Monks-1 | 100 | 83.5 | 64 |
| Solar | 100 | 100 | 100 |
| Diabetes | 100 | 69.9 | 69.8 |
| Ionosphere | 100 | 87.3 | 81.4 |
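
The test columns of the table can be compared dataset by dataset with a short tally. The sketch below only copies the Step NN Test 20% and K-NN figures from the table and counts where one exceeds the other; the variable names are illustrative, and no significance testing is implied.

```python
# (dataset, Step NN test %, K-NN test %), copied from the table above
results = [
    ("Iris", 95.7, 94.2),
    ("Wine", 95.6, 94.4),
    ("Abalone", 49.8, 47.9),
    ("Hayes-Roth", 65.5, 36.7),
    ("Liver", 61.9, 56.1),
    ("Cleveland", 53.1, 48.6),
    ("Breast", 95.4, 95.6),
    ("Sonar", 86.8, 67.0),
    ("Wheat", 93.3, 90.8),
    ("Car", 97.5, 89.5),
    ("Wine Quality", 63.1, 48.6),
    ("UM", 85.0, 69.7),
    ("Bank", 99.9, 99.3),
    ("SPECT", 93.8, 82.5),
    ("Letters", 96.2, 85.9),
    ("Monks-1", 83.5, 64.0),
    ("Solar", 100.0, 100.0),
    ("Diabetes", 69.9, 69.8),
    ("Ionosphere", 87.3, 81.4),
]

# Split the datasets by which method scored higher on the test portion.
step_nn_higher = [name for name, step, knn in results if step > knn]
tied = [name for name, step, knn in results if step == knn]
knn_higher = [name for name, step, knn in results if step < knn]

print(len(step_nn_higher), tied, knn_higher)
# 17 ['Solar'] ['Breast']
```

On these figures Step NN scores higher on 17 of the 19 datasets, ties on Solar, and is 0.2 points lower on Breast.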

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).