Preprint Article, Version 2. Preserved in Portico. This version is not peer-reviewed.

Accelerating Symmetric Rank-1 Quasi-Newton Method with Nesterov’s Gradient for Training Neural Networks

Version 1 : Received: 2 December 2021 / Approved: 7 December 2021 / Online: 7 December 2021 (10:58:43 CET)
Version 2 : Received: 7 December 2021 / Approved: 8 December 2021 / Online: 8 December 2021 (17:51:54 CET)

A peer-reviewed article of this Preprint also exists.

Indrapriyadarsini, S.; Mahboubi, S.; Ninomiya, H.; Kamio, T.; Asai, H. Accelerating Symmetric Rank-1 Quasi-Newton Method with Nesterov’s Gradient for Training Neural Networks. Algorithms 2022, 15, 6.

Journal reference: Algorithms 2022, 15, 6
DOI: 10.3390/a15010006

Abstract

Gradient-based methods are widely used in training neural networks and can be broadly categorized into first- and second-order methods. Second-order methods have been shown to converge faster than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training, and recent work has shown that its convergence can be accelerated using Nesterov's accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks and briefly discusses its convergence. The performance of the proposed method is evaluated on function approximation and image classification problems.
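As a rough illustration of the idea described in the abstract, the sketch below combines the classical SR1 inverse-Hessian update with a gradient evaluated at Nesterov's look-ahead point. This is a minimal reading of the abstract, not the authors' actual algorithm: the function name `nesterov_sr1`, the fixed step size `lr`, the momentum coefficient `mu`, and the choice to build the secant pair from the look-ahead points are all illustrative assumptions, and the trust-region machinery and limited-memory variant mentioned in the paper are omitted.

```python
import numpy as np

def nesterov_sr1(grad, w0, mu=0.8, lr=0.2, iters=100, r=1e-8):
    """Sketch: SR1 quasi-Newton iteration with a Nesterov look-ahead gradient.

    Assumptions (not from the paper): fixed step size, full-memory inverse
    approximation H, secant pair (s, y) taken between successive look-ahead
    points z = w + mu * v.
    """
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    H = np.eye(w.size)              # inverse-Hessian approximation
    z = w + mu * v                  # Nesterov look-ahead point
    g = grad(z)
    for _ in range(iters):
        p = -H @ g                  # quasi-Newton search direction
        v = mu * v + lr * p         # momentum-style velocity update
        w = w + v
        z_next = w + mu * v
        g_next = grad(z_next)
        s = z_next - z              # step between look-ahead points
        y = g_next - g              # corresponding gradient change
        u = s - H @ y
        denom = u @ y
        # Standard SR1 safeguard: skip the update when the
        # denominator is too small relative to ||u|| ||y||.
        if abs(denom) > r * np.linalg.norm(u) * np.linalg.norm(y):
            H = H + np.outer(u, u) / denom
        z, g = z_next, g_next
    return w
```

On a convex quadratic the SR1 update recovers the exact inverse Hessian after a few linearly independent steps, so the iteration quickly behaves like a damped Newton method with momentum; the safeguard on `denom` is essential because the SR1 denominator can vanish even on well-behaved problems.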

Keywords

Neural networks; quasi-Newton; symmetric rank-1; Nesterov’s accelerated gradient; limited memory; trust-region

Subject

MATHEMATICS & COMPUTER SCIENCE, Numerical Analysis & Optimization

Comments (1)

Comment 1
Received: 8 December 2021
Commenter: S. Indrapriyadarsini
Commenter's Conflict of Interests: Author
Comment: Revised version including the results sections