Preprint
Article

This version is not peer-reviewed.

On the Optimality of k=√n in k-Nearest Neighbor Classification: Sub-Optimality Rates, Dimension-Aware Selection, and Hassanat Distance Comparison

Submitted: 15 May 2026

Posted: 18 May 2026

Abstract
The k-nearest neighbor (KNN) algorithm remains one of the most fundamental and widely used methods in machine learning. A common rule of thumb sets the number of neighbors as k = √n, where n is the size of the training set. Despite the rule's widespread adoption, its theoretical justification has remained obscure. We provide a comprehensive rate-based analysis. First, we derive the minimax-optimal exponent β★ = 4/(d+4) for k = n^β under standard Hölder-smoothness assumptions, recovering as a special case Theorem 1: k = √n is minimax-optimal if and only if the feature space has dimensionality d = 4. Second, Theorem 2 quantifies the sub-optimality of any fixed β as R_n(n^β) = Θ(n^(−r(β,d))) with r(β,d) = min{β, 4(1−β)/d}, yielding an asymmetric penalty for the classical rule when d ≠ 4 that we make precise in Corollary 1. The predicted rate is empirically verified across d ∈ {2,…,20} on controlled synthetic data. On 48 datasets from the OpenML-CC18 benchmark suite, the dimension-aware rule k = ⌊n^(4/(d+4))⌋ outperforms the classical √n rule in 32 of 48 head-to-head comparisons (paired Wilcoxon p = 4.6 × 10⁻⁴, mean accuracy gain +2.5 percentage points), demonstrating that the theoretical improvement translates to a practical one. We further test the Hassanat distance metric against Euclidean across all KNN variants on the same 48 datasets, finding that Hassanat outperforms Euclidean in five of six configurations (paired Wilcoxon p < 0.05), with the largest gains on unstandardized data. Cross-validation remains the strongest k-selection strategy when computationally feasible, and the theoretical results provide a principled non-cross-validated alternative.
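The two selection rules and the rate exponent stated in the abstract can be sketched in a few lines of Python. This is only an illustration of the formulas k = √n, k = ⌊n^(4/(d+4))⌋, and r(β,d) = min{β, 4(1−β)/d}. The function names, the lower clamp at k = 1, and the small rounding guard are assumptions made for this sketch and are not taken from the paper.

```python
import math


def k_sqrt(n: int) -> int:
    """Classical rule of thumb: k = floor(sqrt(n)), clamped to at least 1."""
    return max(1, math.isqrt(n))


def k_dimension_aware(n: int, d: int) -> int:
    """Dimension-aware rule from the abstract: k = floor(n ** (4 / (d + 4))).

    The 1e-9 offset is an assumption of this sketch: it guards against
    floating-point results such as 99.99999999 when the exact value is 100.
    """
    beta_star = 4.0 / (d + 4.0)
    return max(1, math.floor(n ** beta_star + 1e-9))


def rate_exponent(beta: float, d: int) -> float:
    """Rate exponent r(beta, d) = min{beta, 4 * (1 - beta) / d} for k = n ** beta."""
    return min(beta, 4.0 * (1.0 - beta) / d)


if __name__ == "__main__":
    n = 10_000
    for d in (2, 4, 20):
        beta_star = 4.0 / (d + 4.0)
        print(
            f"d={d:2d}  k_sqrt={k_sqrt(n)}  k_dim={k_dimension_aware(n, d)}  "
            f"r(1/2,d)={rate_exponent(0.5, d):.4f}  "
            f"r(beta*,d)={rate_exponent(beta_star, d):.4f}"
        )
```

Under these formulas the two rules coincide at d = 4, where both exponents equal 1/2. At d = 20 the classical choice β = 1/2 gives a rate exponent of 0.1, against 1/6 for β★ = 4/24. At d = 2 it gives 1/2, against 2/3 for β★ = 2/3.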
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated