Understanding how gradient descent shapes neural network representations remains a fundamental challenge in deep learning theory. Recent work has revealed that neural networks behave as “racing” systems: neurons compete to align with task-relevant directions, and those that succeed experience exponential norm growth. However, the geometric principles governing this race—particularly when data lies on low-dimensional manifolds and networks employ adaptive normalization—remain poorly understood. This paper establishes a mathematical framework that unifies and extends these insights. We prove three fundamental theorems: (1) neuron weight vectors converge exponentially to the tangent space of the data manifold, with a rate determined by local curvature and gating dynamics; (2) for rotation-equivariant tasks, an angular momentum tensor is conserved under gradient flow, imposing topological constraints on neuronal rearrangements; (3) the distribution of high-norm “winning” neurons follows a von Mises-Fisher concentration on the manifold, with concentration parameter linked to initial angular variance. As a case study, we integrate Bayesian R-LayerNorm—a provably stable normalization method—into our framework, deriving a modified norm growth law that explains its empirical robustness on corrupted datasets. Together, these results provide a geometric foundation for understanding capacity adaptation, lottery tickets, and uncertainty-aware learning in neural networks.