Preprint Article, Version 1 (not peer-reviewed). Preserved in Portico.

Evidence-Based Regularization for Neural Networks

Version 1: Received: 14 September 2022 / Approved: 15 September 2022 / Online: 15 September 2022 (13:06:13 CEST)

A peer-reviewed article of this Preprint also exists.

Nuti, G.; Cross, A.-I.; Rindler, P. Evidence-Based Regularization for Neural Networks. Mach. Learn. Knowl. Extr. 2022, 4, 1011-1023.

Abstract

Numerous approaches address over-fitting in neural networks: imposing a penalty on the parameters of the network (L1, L2, etc.); changing the network stochastically (dropout, Gaussian noise, etc.); or transforming the input data (batch normalization, etc.). In contrast, we aim to ensure that a minimum amount of supporting evidence is present when fitting the model parameters to the training data. At the single-neuron level, this is equivalent to ensuring that both sides of the separating hyperplane (for a standard artificial neuron) contain a minimum number of data points, noting that these points need not belong to the same class for the inner layers. We first benchmark the results of this approach on the standard Fashion-MNIST dataset, comparing it to various regularization techniques. Interestingly, we note that by nudging each neuron to divide, at least in part, its input data, the resulting networks make use of every neuron, avoiding hyperplanes that lie entirely on one side of their input data (which is equivalent to feeding a constant into the next layers). To illustrate this point, we study the prevalence of saturated nodes throughout training, showing that neurons are activated more frequently and earlier in training when using this regularization approach. A direct consequence of the improved neuron activation is that deep networks become easier to train. This is crucially important when the network topology is not known a priori and fitting often remains stuck in a suboptimal local minimum. We demonstrate this property by training networks of increasing depth (and constant width): most regularization approaches result in increasingly frequent training failures (over different random seeds), whilst the proposed evidence-based regularization significantly outperforms in its ability to train deep networks.
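
The core idea described in the abstract, requiring a minimum number of training points on each side of every neuron's separating hyperplane, can be sketched as a differentiable penalty added to the training loss. The snippet below is a minimal PyTorch illustration, not the authors' implementation: the soft sigmoid count, the hinge-style shortfall, and the names evidence_penalty, min_evidence, and sharpness are assumptions introduced for this example, and the paper's exact evidence-based formulation may differ.

    # Illustrative sketch of an "evidence" penalty on each neuron's hyperplane.
    # Assumptions (not from the paper): soft counts via a sigmoid, a hinge-style
    # shortfall, and the hyperparameter names/values used here.
    import torch
    import torch.nn as nn

    def evidence_penalty(pre_activations: torch.Tensor,
                         min_evidence: int = 5,
                         sharpness: float = 10.0) -> torch.Tensor:
        """Penalize neurons whose hyperplane leaves fewer than `min_evidence`
        points of the current batch on either side.

        pre_activations: (batch, n_neurons) tensor of W x + b values.
        """
        # Soft indicator that a point lies on the positive side of each hyperplane.
        positive_side = torch.sigmoid(sharpness * pre_activations)
        # Soft counts of points on each side, per neuron.
        n_pos = positive_side.sum(dim=0)
        n_neg = pre_activations.shape[0] - n_pos
        # Only penalize when the smaller side has too little evidence.
        shortfall = torch.relu(min_evidence - torch.minimum(n_pos, n_neg))
        return shortfall.mean()

    class MLPWithEvidenceReg(nn.Module):
        def __init__(self, sizes=(784, 128, 128, 10)):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

        def forward(self, x):
            penalty = x.new_zeros(())
            for i, layer in enumerate(self.layers):
                z = layer(x)                      # pre-activation: W x + b
                penalty = penalty + evidence_penalty(z)
                x = torch.relu(z) if i < len(self.layers) - 1 else z
            return x, penalty

    # Usage: add the (scaled) penalty to the task loss.
    # logits, reg = model(batch)
    # loss = criterion(logits, targets) + 0.01 * reg

In this sketch the penalty is zero whenever every neuron already splits the batch with at least min_evidence points on each side, so it only nudges neurons whose hyperplane sits entirely (or almost entirely) on one side of the data, which is the failure mode the abstract associates with saturated, constant-output units.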

Keywords

neural networks; regularization; deep networks

Subject

Computer Science and Mathematics, Probability and Statistics
