On the Optimality of k=√n in k-Nearest Neighbor Classification: Sub-Optimality Rates, Dimension-Aware Selection, and Hassanat Distance Comparison

Ahmad B. Hassanat; Anas A. Alkasasbeh; Esra’a Alkafaween; Omar Lasassmeh; Khalid Almohammadi; Malek Alrashidi; Abdulkareem Alzahrani; Maha Alamri; Ahmad S. Tarawneh

doi:10.20944/preprints202605.1121.v1

Submitted: 15 May 2026

Posted: 18 May 2026


Abstract

The k-nearest neighbor (KNN) algorithm remains one of the most fundamental and widely used methods in machine learning. A common rule of thumb sets the number of neighbors to k = √n, where n is the size of the training set. Despite its widespread adoption, the theoretical justification for this choice has remained obscure. We provide a comprehensive rate-based analysis. First, we derive the minimax-optimal exponent β★ = 4/(d+4) for k = n^β under standard Hölder-smoothness assumptions. Theorem 1 follows as a special case: k = √n is minimax-optimal if and only if the feature space has dimensionality d = 4. Second, Theorem 2 quantifies the sub-optimality of any fixed β as R_n(n^β) = Θ(n^(−r(β,d))) with r(β,d) = min{β, 4(1−β)/d}. For the classical rule this yields an asymmetric penalty when d ≠ 4, which we make precise in Corollary 1. The predicted rate is verified empirically across d ∈ {2,…,20} on controlled synthetic data. On 48 datasets from the OpenML-CC18 benchmark suite, the dimension-aware rule k = ⌊n^(4/(d+4))⌋ outperforms the classical √n rule in 32 of 48 head-to-head comparisons (paired Wilcoxon p = 4.6 × 10⁻⁴, mean accuracy gain +2.5 percentage points), demonstrating that the theoretical improvement translates into a practical one. We further compare the Hassanat distance metric with the Euclidean distance across all KNN variants on the same 48 datasets, finding that Hassanat outperforms Euclidean in five of six configurations (paired Wilcoxon p < 0.05), with the largest gains on unstandardized data. Cross-validation remains the strongest k-selection strategy when computationally feasible, and the theoretical results provide a principled alternative that requires no cross-validation.
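The formulas stated in the abstract can be evaluated directly. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, and clamping k to at least 1 is an assumption added here so that the result is always a usable neighbor count.

```python
import math


def optimal_exponent(d: int) -> float:
    """Minimax-optimal exponent beta* = 4 / (d + 4) from the abstract."""
    return 4.0 / (d + 4.0)


def rate_exponent(beta: float, d: int) -> float:
    """Rate exponent r(beta, d) = min{beta, 4(1 - beta) / d} from Theorem 2."""
    return min(beta, 4.0 * (1.0 - beta) / d)


def k_sqrt(n: int) -> int:
    """Classical rule k = floor(sqrt(n)); the max(1, ...) clamp is an assumption."""
    return max(1, math.isqrt(n))


def k_dimension_aware(n: int, d: int) -> int:
    """Dimension-aware rule k = floor(n^(4 / (d + 4))); clamp is an assumption."""
    return max(1, math.floor(n ** optimal_exponent(d)))


if __name__ == "__main__":
    n = 5000  # illustrative training-set size, not taken from the paper
    for d in (2, 4, 10, 20):
        print(
            f"d={d:2d}  k_sqrt={k_sqrt(n)}  k_dim={k_dimension_aware(n, d)}  "
            f"r(1/2,d)={rate_exponent(0.5, d):.3f}  "
            f"r(beta*,d)={rate_exponent(optimal_exponent(d), d):.3f}"
        )
```

At β = β★ the two arguments of the minimum coincide, since 4(1 − β★)/d = 4/(d+4). For β = 1/2 the exponent is 1/2 when d ≤ 4 and 2/d when d > 4, which is one way to see the asymmetry of the penalty for the classical rule.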

Keywords: k-nearest neighbors; optimal k selection; minimax rate; sub-optimality rates; Hassanat distance; bias–variance tradeoff; nonparametric classification

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
