Automated cephalometric landmark detection using deep learning has the potential to transform routine orthodontic diagnosis. However, the clinical relevance of AI localization accuracy depends critically on how detection errors propagate into derived angular measurements and skeletal classifications. This study presents a systematic clinical validation of 14 YOLO-based model configurations, evaluating the effects of architecture (YOLOv5/YOLOv11), bounding box size (40–150 px), dataset scale (235–4255 images), and training duration on landmark detection accuracy, with a specific focus on the four clinically critical landmarks that define the ANB angle: Sella (S), Nasion (N), A-point (A), and B-point (B). The best-performing model (YOLOv11s, 40×40 px bounding box, 4255 training images) achieved a mean radial error of 3.10 ± 1.00 mm and a Successful Detection Rate of 87.2% at the 4 mm threshold for S, N, A, and B. Despite this error magnitude, ANB-based skeletal classification showed 96.9% concordance with expert assessments (95% bootstrap CI: 93.8–99.2%, n = 130 classifications), with all discordances confined to borderline cases within 1° of diagnostic thresholds. Notably, the localization accuracy of the best AI models falls within the inter-operator variability range reported for experienced human clinicians (1.5–3.5 mm), indicating that the AI system has reached a threshold of clinical equivalence for skeletal classification purposes. Bounding box size emerged as the single most influential hyperparameter: mean radial error increased 3.4-fold from the 40×40 px to the 150×150 px configuration. These findings support the clinical deployment of YOLO-based AI systems for automated ANB-based skeletal classification, while highlighting the need for human oversight in borderline cases.
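To make the quantities named in the abstract concrete, the following minimal Python sketch shows one way to derive the ANB angle from the four landmarks, flag borderline cases, and compute the mean radial error and Successful Detection Rate. It is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, ANB is taken as SNA minus SNB (the conventional definition), and the Class I range of 0° to 4° is a commonly used convention assumed here because the abstract does not state the diagnostic thresholds. The 1° borderline margin and the 4 mm detection threshold follow the text.

```python
import math


def angle_at(vertex, p, q):
    """Unsigned angle in degrees at `vertex` between the rays to p and q."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.atan2(abs(cross), dot))


def anb_angle(s, n, a, b):
    """ANB computed as SNA minus SNB, both measured at Nasion."""
    return angle_at(n, s, a) - angle_at(n, s, b)


def classify_anb(anb, low=0.0, high=4.0, margin=1.0):
    """Return (skeletal class, borderline flag).

    `low` and `high` are ASSUMED thresholds (not taken from the study);
    `margin` is the 1 degree borderline band mentioned in the text.
    """
    if anb > high:
        label = "Class II"
    elif anb < low:
        label = "Class III"
    else:
        label = "Class I"
    borderline = min(abs(anb - low), abs(anb - high)) <= margin
    return label, borderline


def radial_errors(pred, truth, mm_per_px=1.0):
    """Euclidean distance per landmark, scaled to mm (scale is an assumption)."""
    return [math.dist(p, t) * mm_per_px for p, t in zip(pred, truth)]


def mean_radial_error(errors):
    """Average of the per-landmark radial errors."""
    return sum(errors) / len(errors)


def sdr(errors, threshold_mm=4.0):
    """Fraction of landmarks whose radial error is within the threshold."""
    return sum(e <= threshold_mm for e in errors) / len(errors)
```

Because ANB is a difference of two angles that share the S-N reference line, localization errors that shift A and B in a similar way partly cancel, which is one plausible reason classification concordance can remain high despite a radial error of several millimetres. The borderline flag returned by `classify_anb` marks the cases where human review would matter most.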