This paper examines the convergence of an online gradient learning algorithm for training Pi-Sigma higher-order neural networks under a smoothed L1 regularization framework with adaptive momentum. Although Pi-Sigma networks represent high-order nonlinear interactions efficiently, training them online is difficult because of nonconvexity, parameter coupling, and sensitivity to noise. To address these issues, we replace the nondifferentiable L1 norm with a differentiable approximation that still promotes sparsity, and we add an adaptive momentum term to stabilize the weight updates and accelerate convergence. Under mild assumptions on the activation functions, the data sequence, and the learning parameters, we formulate a unified mathematical model of the proposed learning rule and establish key lemmas characterizing the behavior of the smoothed regularizer and the momentum dynamics. These results allow us to prove that the online algorithm guarantees monotonic decrease of the associated energy function, boundedness of the weight sequence, and convergence of the gradients to zero. Numerical experiments show that, compared with existing models, the proposed method achieves stable convergence, improved sparsity, and reduced gradient oscillation. Empirical plots of the loss evolution, gradient norms, and weight norms corroborate the theoretical results. The agreement between theory and experiment underscores the robustness of the proposed learning framework and its suitability for nonlinear function approximation and online learning tasks in higher-order neural networks.
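The learning rule summarized above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact algorithm: the smoothed L1 surrogate sqrt(w^2 + eps), the single sigmoid-output Pi-Sigma unit, and the heuristic adaptive momentum coefficient `alpha = mu / (1 + ||g||)` are all assumptions chosen for the sketch, standing in for the penalty, architecture, and adaptive rule defined in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def smoothed_l1_grad(w, eps=1e-4):
    # Gradient of the differentiable surrogate sqrt(w^2 + eps) for |w|
    # (an assumed smoothing; the paper may use a different approximation).
    return w / np.sqrt(w * w + eps)

class PiSigma:
    """Minimal single-output Pi-Sigma unit: y = sigmoid(prod_j (w_j . x))."""
    def __init__(self, n_in, n_sum, rng):
        self.W = 0.3 * rng.standard_normal((n_sum, n_in))
        self.prev_W = self.W.copy()  # kept for the momentum term

    def forward(self, x):
        h = self.W @ x                           # summing-layer outputs
        return sigmoid(np.prod(h)), h            # product unit + sigmoid

    def grad(self, x, t, lam):
        y, h = self.forward(x)
        err = y - t
        dy = y * (1.0 - y)                       # sigmoid derivative
        g = np.empty_like(self.W)
        for j in range(self.W.shape[0]):
            others = np.prod(np.delete(h, j))    # product of the other units
            g[j] = err * dy * others * x
        return g + lam * smoothed_l1_grad(self.W), err

def train(net, data, eta=0.2, lam=1e-3, mu=0.3, epochs=300):
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:                        # online (per-sample) updates
            g, err = net.grad(x, t, lam)
            # Heuristic adaptive momentum: shrink the coefficient when the
            # gradient is large (a stand-in for the paper's adaptive rule).
            alpha = mu / (1.0 + np.linalg.norm(g))
            step = -eta * g + alpha * (net.W - net.prev_W)
            net.prev_W = net.W.copy()
            net.W = net.W + step
            total += 0.5 * err ** 2
        losses.append(total / len(data))
    return losses

# Toy teacher-student experiment: fit data generated by a random Pi-Sigma net.
rng = np.random.default_rng(0)
teacher = PiSigma(n_in=3, n_sum=2, rng=rng)
data = []
for _ in range(20):
    x = np.append(rng.uniform(-1, 1, 2), 1.0)    # 2 inputs + bias term
    data.append((x, teacher.forward(x)[0]))

student = PiSigma(n_in=3, n_sum=2, rng=rng)
losses = train(student, data)
```

On this toy problem the average loss decreases over training and the weights remain bounded, mirroring (informally) the monotonicity and boundedness properties proved in the paper.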