Understanding how gradient descent shapes neural network representations remains a fundamental challenge in deep learning theory. Recent work has revealed that neural networks behave as “racing” systems: neurons compete to align with task-relevant directions, and those that succeed experience exponential norm growth. However, the geometric principles governing this race—particularly when data lies on low-dimensional manifolds and networks employ adaptive normalization—remain poorly understood. This paper establishes a mathematical framework that unifies and extends these insights. We prove three fundamental theorems: (1) neuron weight vectors converge exponentially to the tangent space of the data manifold, with a rate determined by local curvature and gating dynamics; (2) for rotation-equivariant tasks, an angular momentum tensor is conserved under gradient flow, imposing topological constraints on neuronal rearrangements; (3) the distribution of high-norm “winning” neurons follows a von Mises-Fisher concentration on the manifold, with concentration parameter linked to initial angular variance. As a case study, we integrate Bayesian R-LayerNorm—a provably stable normalization method—into our framework, deriving a modified norm growth law that explains its empirical robustness on corrupted datasets. Together, these results provide a geometric foundation for understanding capacity adaptation, lottery tickets, and uncertainty-aware learning in neural networks.