Submitted: 20 April 2023
Posted: 20 April 2023
Abstract
Keywords:
Notations
| Symbol | Description |
|---|---|
| | weight |
| | learning rate |
| | loss function |
| | gradient |
| | weight decay parameter (regularization factor) |
| | momentum |
| | sum of gradients |
| | exponential moving average |
| | horizontally converging direction, exponential moving average |
| | running average with a decay rate parameter |
| | schedule multiplier |
| | immediate discount factor |
| | momentum buffer’s discount factor |
| | moments |
| | variance |
| | variance rectification |
| | DiffGrad friction coefficient (DFC) |
| | Hessian matrix |
| | inverse BFGS Hessian approximation |
| | curvature pairs |
| | Hessian diagonal matrix |
| | Hessian diagonal matrix with momentum |
| | Riemannian manifold with $n$-dimensional topological space and metric $g$ |
| $\nabla$ | affine connection; gradient |
| | tangent bundle |
| | proximity function |
| | Bregman divergence |
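For orientation, here is a hedged sketch of how these quantities combine in the two canonical first-order updates surveyed below, written in conventional symbols ($w$ for the weights, $\eta$ for the learning rate, $L$ for the loss, $g_t$ for the gradient, $\mu$ for the momentum, $\lambda$ for the weight decay parameter, $\beta_1, \beta_2$ for the moment decay rates, $\epsilon$ a small constant); these symbols are assumptions and may differ from the notation used in the full text:

```latex
% SGD with momentum and weight decay (conventional symbols, a sketch)
g_t = \nabla_w L(w_{t-1}), \qquad
m_t = \mu\, m_{t-1} + g_t, \qquad
w_t = w_{t-1} - \eta \,(m_t + \lambda\, w_{t-1})

% Adam: exponential moving averages of the gradient and its elementwise square
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}

\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}, \qquad
w_t = w_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```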
1. Introduction
2. First Order Optimization Algorithms
2.1. SGD-Type Algorithms
2.2. Adam-Type Algorithms
2.3. Positive-Negative Momentum
3. Second Order Optimization Algorithms
3.1. Newton Algorithms
3.2. Quasi-Newton Algorithms
4. Information-Geometric Optimization Methods
4.1. Natural Gradient Descent
4.2. Mirror Descent
5. Application of Optimization Methods in Modern Neural Networks
6. Challenges and Potential Research
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
- Qian, K.; Pawar, A.; Liao, A. et al. Modeling neuron growth using isogeometric collocation based phase field method. Sci Rep 2022, 12, 8120. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Shi, Y.; Mu, F.; Cheng, J.; Li, C.; Chen, X. Multimodal MRI Volumetric Data Fusion With Convolutional Neural Networks. IEEE Transactions on Instrumentation and Measurement 2022, 71, 1–15. [Google Scholar] [CrossRef]
- Li, Q.; Xiong, D.; Shang, M. Adjusted stochastic gradient descent for latent factor analysis. Information Sciences 2022, 588, 196–213. [Google Scholar] [CrossRef]
- Dogo, E. M.; Afolabi, O.J.; Nwulu, N.I.; Twala, B.; Aigbavboa, C.O. A Comparative Analysis of Gradient Descent-Based Optimization Algorithms on Convolutional Neural Networks. 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, 2018, pp. 92-99. [CrossRef]
- Ward, R.; Wu, X.; Bottou, L. AdaGrad stepsizes: sharp convergence over nonconvex landscapes. The Journal of Machine Learning Research 2020, 21, 9047–9076. [Google Scholar]
- Xu, D.; Zhang, S.; Zhang, H.; Mandic, D.P. Convergence of the RMSProp deep learning method with penalty for nonconvex optimization. Neural Networks 2021, 139, 17–23. [Google Scholar] [CrossRef] [PubMed]
- Zeiler, M.D. Adadelta: an adaptive learning rate method. arXiv 2012, arXiv:1212.5701. [Google Scholar]
- Singarimbun, R.N.; Nababan, E.B.; Sitompul, O.S. Adaptive Moment Estimation To Minimize Square Error In Backpropagation Algorithm. 2019 International Conference of Computer Science and Information Technology (ICoSNIKOM), Medan, Indonesia, 2019, pp. 1-7. [CrossRef]
- Seredynski, F.; Zomaya, A.Y.; Bouvry, P. Function Optimization with Coevolutionary Algorithms. Intelligent Information Processing and Web Mining. Advances in Soft Computing 2003, 22, 13–22. [Google Scholar]
- Osowski, S.; Bojarczak, P.; Stodolski, M. Fast Second Order Learning Algorithm for Feedforward Multilayer Neural Networks and its Applications. Neural Networks 1996, 9, 1583–1596. [Google Scholar] [CrossRef]
- Tyagi, K.; Rane, C.; Irie, B. et al. Multistage Newton’s Approach for Training Radial Basis Function Neural Networks. SN COMPUT. SCI. 2021, 2, 366. [Google Scholar] [CrossRef]
- Likas, A.; Stafylopatis, A. Training the random neural network using quasi-Newton methods. European Journal of Operational Research 2000, 126, 331–339. [Google Scholar] [CrossRef]
- Arbel, M.; Korba, A.; Salim, A.; Gretton, A. Maximum Mean Discrepancy Gradient Flow. arXiv 2019, arXiv:1906.04370. [Google Scholar]
- Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information Geometry; Springer: Berlin, Heidelberg, Germany, 2008. [Google Scholar]
- Gattone, S.A.; Sanctis, A.D.; Russo, T.; Pulcini, D. A shape distance based on the Fisher–Rao metric and its application for shapes clustering. Physica A: Statistical Mechanics and its Applications 2017, 487, 93–102. [Google Scholar] [CrossRef]
- Hua, X.; Fan, H.; Cheng, Y.; Wang, H.; Qin, Y. Information Geometry for Radar Target Detection with Total Jensen–Bregman Divergence. Entropy 2018, 20, 256. [Google Scholar] [CrossRef] [PubMed]
- Osawa, K.; Tsuji, Y.; Ueno, Y.; Naruse, A.; Foo, C.-S.; Yokota, R. Scalable and Practical Natural Gradient for Large-Scale Deep Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 44, 404–415. [Google Scholar] [CrossRef] [PubMed]
- Orabona, F.; Crammer, K.; Cesa-Bianchi, N. A generalized online mirror descent with applications to classification and regression. Mach Learn 2015, 99, 411–435. [Google Scholar] [CrossRef]
- Lu, L.; Pestourie, R.; Yao, W.; Wang, Z.; Verdugo, F.; Johnson, S.G. Physics-Informed Neural Networks with Hard Constraints for Inverse Design. SIAM Journal on Scientific Computing 2021, 43, 1105–1132. [Google Scholar] [CrossRef]
- Shi, C.; Tan, C.; Wang, T.; Wang, L. A Waste Classification Method Based on a Multilayer Hybrid Convolution Neural Network. Appl. Sci. 2021, 11, 8572. [Google Scholar] [CrossRef]
- Hacker, C.; Aizenberg, I.; Wilson, J. GPU simulator of multilayer neural network based on multi-valued neurons. 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 4125–4132. [Google Scholar] [CrossRef]
- Chen, S.; McLaughlin, S.; Mulgrew, B. Complex-valued radial basis function network, part i: Network architecture and learning algorithms. Signal Process. 1994, 35, 19–31. [Google Scholar] [CrossRef]
- Suzuki, Y.; Kobayashi, M. Complex-valued bidirectional auto-associative memory. The 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 2013, pp. 1-7. [CrossRef]
- Traore, C.; Pauwels, E. Sequential convergence of AdaGrad algorithm for smooth convex optimization. Operations Research Letters 2021, 49, 452–458. [Google Scholar] [CrossRef]
- Dogo, E.M.; Afolabi, O.J.; Nwulu, N.I.; Twala, B.; Aigbavboa, C.O. A Comparative Analysis of Gradient Descent-Based Optimization Algorithms on Convolutional Neural Networks. 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, 2018, pp. 92-99. [CrossRef]
- Gu, P.; Tian, S.; Chen, Y. Iterative Learning Control Based on Nesterov Accelerated Gradient Method. IEEE Access 2019, 7, 115836–115842. [Google Scholar] [CrossRef]
- Van Laarhoven, T. L2 Regularization versus Batch and Weight Normalization. arXiv 2017, arXiv:1706.05350. [Google Scholar]
- Byrd, J.; Lipton, Z.C. What is the Effect of Importance Weighting in Deep Learning? Proceedings of the 36th International Conference on Machine Learning, PMLR, 2019, 97, 872–881. [Google Scholar]
- Vrbančič, G.; Podgorelec, V. Efficient ensemble for image-based identification of Pneumonia utilizing deep CNN and SGD with warm restarts. Expert Systems with Applications 2022, 187, 115834. [Google Scholar] [CrossRef]
- Heo, B.; Chun, S.; Oh, S.J.; Han, D.; Yun, S.; Kim, G.; Uh, Y.; Ha, J.-W. AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights. arXiv 2021, arXiv:2006.08217. [Google Scholar]
- Sun, J.; Yang, Y.; Xun, G.; Zhang, A. Scheduling Hyperparameters to Improve Generalization: From Centralized SGD to Asynchronous SGD. ACM Transactions on Knowledge Discovery from Data (accepted paper). [CrossRef]
- Wu, S. et al. L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2019, 30, 2043–2051. [Google Scholar] [CrossRef] [PubMed]
- Yu, Z.; Sun, G.; Lv, J. A fractional-order momentum optimization approach of deep neural networks. Neural Comput and Applic 2022, 34, 7091–7111. [Google Scholar] [CrossRef]
- Gokcesu, K.; Gokcesu, H. Regret Analysis of Global Optimization in Univariate Functions with Lipschitz Derivatives. arXiv 2021, arXiv:2108.10859. [Google Scholar]
- Gower, R.M.; Loizou, N.; Qian, X.; Sailanbayev, A.; Shulgin, E.; Richtárik, P. SGD: General Analysis and Improved Rates. Proceedings of Machine Learning Research 2019, 97, 5200–5209. [Google Scholar]
- Mukkamala, M.C.; Hein, M. Variants of RMSProp and Adagrad with Logarithmic Regret Bounds. Proceedings of Machine Learning Research 2017, 70, 2545–2553. [Google Scholar]
- Wang, G.; Lu, S.; Tu, W.; Zhang, L. Sadam: A variant of adam for strongly convex functions. arXiv 2019, arXiv:1905.02957. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- Kalfaoglu, M.E.; Kalkan, S.; Alatan, A.A. Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition. Computer Vision – ECCV 2020 Workshops. Lecture Notes in Computer Science 2020, 12539, 731–747. [CrossRef]
- Herrera-Alcántara, O. Fractional Derivative Gradient-Based Optimizers for Neural Networks and Human Activity Recognition. Appl. Sci. 2022, 12, 9264. [Google Scholar] [CrossRef]
- Jia, X.; Feng, X.; Yong, H.; Meng, D. Weight Decay With Tailored Adam on Scale-Invariant Weights for Better Generalization. IEEE Transactions on Neural Networks and Learning Systems 2022, 1–12. [Google Scholar] [CrossRef]
- Heo, B.; Chun, S.; Oh, S.J.; Han, D.; Yun, S.; Kim, G.; Uh, Y.; Ha, J.-W. AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights. arXiv 2021, arXiv:2006.08217v3. [Google Scholar]
- Ma, J.; Yarats, D. Quasi-hyperbolic momentum and Adam for deep learning. arXiv 2019, arXiv:1810.06801v4. [Google Scholar]
- Tang, S.; Shen, C.; Wang, D.; Li, S.; Huang, W.; Zhu, Z. Adaptive deep feature learning network with Nesterov momentum and its application to rotating machinery fault diagnosis. Neurocomputing 2018, 305, 1–14. [Google Scholar] [CrossRef]
- Li, L.; Xu, W.; Yu, H. Character-level neural network model based on Nadam optimization and its application in clinical concept extraction. Neurocomputing 2020, 414, 182–190. [Google Scholar] [CrossRef]
- Melinte, D.O.; Vladareanu, L. Facial Expressions Recognition for Human–Robot Interaction Using Deep Convolutional Neural Networks with Rectified Adam Optimizer. Sensors 2020, 20, 2393. [Google Scholar] [CrossRef]
- Gholamalinejad, H.; Khosravi, H. Whitened gradient descent, a new updating method for optimizers in deep neural networks. Journal of AI and Data Mining 2022, 10, 467–477.
- Shanthi, T.; Sabeenian, R.S. Modified Alexnet architecture for classification of diabetic retinopathy images. Computers and Electrical Engineering 2019, 76, 56–64. [Google Scholar] [CrossRef]
- Wu, Z.; Shen, C.; Van Den Hengel, A. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. Pattern Recognition 2019, 90, 119–133. [Google Scholar] [CrossRef]
- Das, D.; Santosh, K.C.; Pal, U. Truncated inception net: COVID-19 outbreak screening using chest X-rays. Phys Eng Sci Med 2020, 43, 915–925. [Google Scholar] [CrossRef] [PubMed]
- Tang, P.; Wang, H.; Kwong, S. G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing 2017, 225, 188–197. [Google Scholar] [CrossRef]
- Lin, L.; Liang, L.; Jin, L. R2-ResNeXt: A ResNeXt-Based Regression Model with Relative Ranking for Facial Beauty Prediction. 2018 24th International Conference on Pattern Recognition (ICPR), pp. 85-90, 2018. [CrossRef]
- Dubey, S.R.; Chakraborty, S.; Roy, S.K.; Mukherjee, S.; Singh, S.K.; Chaudhuri, B.B. diffGrad: An Optimization Method for Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2020, 31, 4500–4511. [CrossRef]
- Panait, L.; Luke, S. A comparison of two competitive fitness functions. GECCO’02: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, July 2002, pp. 503–511.
- Khan, W.; Ali, S.; Muhammad, U.S.K.; Jawad, M.; Ali, M.; Nawaz, R. AdaDiffGrad: An Adaptive Batch Size Implementation Technique for DiffGrad Optimization Method. 2020 14th International Conference on Innovations in Information Technology (IIT), Al Ain, United Arab Emirates, 2020, pp. 209-214. [CrossRef]
- Valova, I.; Harris, C.; Mai, T.; Gueorguieva, N. Optimization of Convolutional Neural Networks for Imbalanced Set Classification. Procedia Computer Science 2020, 176, 660–669. [Google Scholar] [CrossRef]
- Zaheer, M.; Reddi, S.; Sachan, D.; Kale, S.; Kumar, S. Adaptive Methods for Nonconvex Optimization. Advances in Neural Information Processing Systems 2018, 31. [Google Scholar]
- Zhuang, J.; Tang, T.; Ding, Y.; Tatikonda, S.C.; Dvornek, N.; Papademetris, X.; Duncan, J. AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients. Advances in Neural Information Processing Systems 2020, 33.
- Liu, J.; Kong, J.; Xu, D.; Qi, M.; Lu, Y. Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization. Neural Networks 2022, 145, 300–307. [Google Scholar]
- Wang, Y.; Liu, J.; Chang, X.; Wang, J.; Rodríguez, R.J. AB-FGSM: AdaBelief optimizer and FGSM-based approach to generate adversarial examples. Journal of Information Security and Applications 2022, 68, 103227. [Google Scholar] [CrossRef]
- Wang, Y.; Liu, J.; Chang, X. Generalizing Adversarial Examples by AdaBelief Optimizer. arXiv 2021, arXiv:2101.09930v1. [Google Scholar]
- Dubey, S.R.; Basha, S.H.S.; Singh, S.K.; Chaudhuri, B.B. AdaInject: Injection Based Adaptive Gradient Descent Optimizers for Convolutional Neural Networks. IEEE Transactions on Artificial Intelligence 2022, 1–10. [Google Scholar] [CrossRef]
- Li, G. A Memory Enhancement Adjustment Method Based on Stochastic Gradients. 2022 41st Chinese Control Conference (CCC), Hefei, China, 2022, pp. 7448-7453. [CrossRef]
- Xie, Z.; Yuan, L.; Zhu, Z.; Sugiyama, M. Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization. Proceedings of the 38th International Conference on Machine Learning, PMLR, 2021, 139, 11448-11458.
- Zavriev, S.; Kostyuk, F. Heavy-ball method in nonconvex optimization problems. Computational Mathematics and Modeling 1993, 4, 336–341. [Google Scholar] [CrossRef]
- Wright, L.; Demeure, N. Ranger21: a synergistic deep learning optimizer. arXiv 2021, arXiv:2106.13731v2. [Google Scholar]
- Xie, X.; Zhou, P.; Li, H.; Lin, Z.; Yan, S. Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv 2022, arXiv:2208.06677v3. [Google Scholar]
- Burke, J.V.; Ferris, M.C. A Gauss—Newton method for convex composite optimization. Mathematical Programming 1995, 71, 179–194. [Google Scholar] [CrossRef]
- Berahas, A.S.; Bollapragada, R.; Nocedal, J. An investigation of Newton-Sketch and subsampled Newton methods. Optimization Methods and Software 2020, 35, 661–680. [Google Scholar] [CrossRef]
- Hartmann, W.M.; Hartwig, R.E. Computing the Moore–Penrose Inverse for the Covariance Matrix in Constrained Nonlinear Estimation. SIAM Journal on Optimization 1996, 6, 727–747. [Google Scholar] [CrossRef]
- Gupta, V.; Kadhe, S.; Courtade, T.; Mahoney, M.W.; Ramchandran, K. OverSketched Newton: Fast Convex Optimization for Serverless Systems. 2020 IEEE International Conference on Big Data (Big Data), 2020, 288-297. [CrossRef]
- Yang, Z. Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 2022, 206, 117719. [Google Scholar] [CrossRef]
- Faber, V.; Joubert, W.; Knill, E.; Manteuffel, T. Minimal Residual Method Stronger than Polynomial Preconditioning. SIAM Journal on Matrix Analysis and Applications 1996, 17, 707–729. [Google Scholar] [CrossRef]
- Jia, Z.; Ng, M.K. Structure Preserving Quaternion Generalized Minimal Residual Method. SIAM Journal on Matrix Analysis and Applications 2021, 42, 616–634. [Google Scholar] [CrossRef]
- Mang, A.; Biros, G. An Inexact Newton–Krylov Algorithm for Constrained Diffeomorphic Image Registration. SIAM Journal on Imaging Sciences 2015, 8, 1030–1069. [Google Scholar] [CrossRef] [PubMed]
- Hestenes, M.R.; Stiefel, E.L. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards 1952, 49, 409–436. [Google Scholar] [CrossRef]
- Fletcher, R.; Reeves, C. Function minimization by conjugate gradients. Comput. J 1964, 7, 149–154. [Google Scholar]
- Daniel, J.W. The conjugate gradient method for linear and nonlinear operator equations. SIAM J. Numer. Anal. 1967, 4, 10–26. [Google Scholar] [CrossRef]
- Polak, E.; Ribière, G. Note sur la convergence de méthodes de directions conjuguées. Rev. Française Informat. Recherche Opérationnelle, 3e Année 16, 1969, 35–43.
- Polyak, B.T. The conjugate gradient method in extreme problems. USSR Comp. Math.Math. Phys. 1969, 9, 94–112. [Google Scholar] [CrossRef]
- Fletcher, R. Practical Methods of Optimization vol. 1: Unconstrained Optimization; John Wiley and Sons: New York, USA, 1987. [Google Scholar]
- Liu, Y.; Storey, C. Efficient generalized conjugate gradient algorithms. J. Optim. Theory Appl. 1991, 69, 129–137. [Google Scholar] [CrossRef]
- Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182. [Google Scholar] [CrossRef]
- Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192. [Google Scholar] [CrossRef]
- Dai, Y.-H. Convergence Properties of the BFGS Algorithm. SIAM Journal on Optimization 2002, 13, 693–701. [Google Scholar] [CrossRef]
- Liu, D.C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Mathematical Programming 1989, 45, 503–528. [Google Scholar] [CrossRef]
- Shi, H.-J. M.; Xie, Y.; Byrd, R.; Nocedal, J. A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization. SIAM Journal on Optimization 2022, 32, 29–55. [Google Scholar]
- Byrd, R.H.; Khalfan, H.F.; Schnabel, R.B. Analysis of a Symmetric Rank-One Trust Region Method. SIAM Journal on Optimization 1996, 6, 1025–1039. [Google Scholar] [CrossRef]
- Rafati, J.; Marcia, R.F. Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018, 501-508. [CrossRef]
- Ma, X. Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization. arXiv 2021, arXiv:2009.13586v6. [Google Scholar]
- Yao, Z.; Gholami, A.; Shen, S.; Mustafa, M.; Keutzer, K.; Mahoney, M. ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning. Proceedings of the AAAI Conference on Artificial Intelligence 2021, 35, 10665–10673. [Google Scholar] [CrossRef]
- Shen, J.; Wang, C.; Wang, X.; Wise, S.M. Second-order Convex Splitting Schemes for Gradient Flows with Ehrlich–Schwoebel Type Energy: Application to Thin Film Epitaxy. SIAM Journal on Numerical Analysis 2012, 50, 105–125. [Google Scholar] [CrossRef]
- Martens, J. New insights and perspectives on the natural gradient method. Journal of Machine Learning Research 2020, 21, 5776–5851. [Google Scholar]
- Amari, S.-i. Information geometry in optimization, machine learning and statistical inference. Front. Electr. Electron. Eng. China 2010, 5, 241–260. [Google Scholar] [CrossRef]
- Wang, S.; Teng, Y.; Perdikaris, P. Understanding and Mitigating Gradient Flow Pathologies in Physics-Informed Neural Networks. SIAM Journal on Scientific Computing 2021, 43, 3055–3081. [Google Scholar] [CrossRef]
- Nielsen, F. An Elementary Introduction to Information Geometry. Entropy 2020, 22, 1100. [Google Scholar] [CrossRef] [PubMed]
- Wald, A. Statistical decision functions. Ann. Math. Stat. 1949, 165–205. [Google Scholar] [CrossRef]
- Wald, A. Statistical Decision Functions; Wiley: Chichester, UK, 1950. [Google Scholar]
- Rattray, M.; Saad, D.; Amari, S. Natural Gradient Descent for OnLine Learning. Phys. Rev. Lett. 1998, 81, 5461–5464. [Google Scholar] [CrossRef]
- Duchi, J.C.; Agarwal, A.; Johansson, M.; Jordan, M.I. Ergodic Mirror Descent. SIAM Journal on Optimization 2012, 22, 1549–1578. [Google Scholar] [CrossRef]
- Wang, Y.; Li, W. Accelerated Information Gradient Flow. J. Sci. Comput. 2022, 90, 11. [Google Scholar] [CrossRef]
- Goldberger, J.; Gordon, S.; Greenspan, H. An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. Proceedings Ninth IEEE International Conference on Computer Vision 2003, 1, 487-493. [CrossRef]
- Joyce, J.M. Kullback-Leibler Divergence. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin, Heidelberg, Germany, 2011. [Google Scholar]
- Nielsen, F. Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences. Entropy 2022, 24, 421. [Google Scholar] [CrossRef]
- Stokes, J.; Izaac, J.; Killoran, N.; Carleo, G. Quantum Natural Gradient. Open journal for quantum science 2020, 4, 269–284. [Google Scholar] [CrossRef]
- Abdulkadirov, R.; Lyakhov, P.; Nagornov, N. Accelerating Extreme Search of Multidimensional Functions Based on Natural Gradient Descent with Dirichlet Distributions. Mathematics 2022, 10, 3556. [Google Scholar] [CrossRef]
- Abdulkadirov, R.I.; Lyakhov, P.A. A new approach to training neural networks using natural gradient descent with momentum based on Dirichlet distributions. Computer Optics 2023, 47, 160–170. [Google Scholar]
- Lyakhov, P.; Abdulkadirov, R. Accelerating Extreme Search Based on Natural Gradient Descent with Beta Distribution. 2021 International Conference Engineering and Telecommunication (En&T), Dolgoprudny, Russian Federation, 2021, pp. 1-5. [CrossRef]
- Abdulkadirov, R.I.; Lyakhov, P.A. Improving Extreme Search with Natural Gradient Descent Using Dirichlet Distribution. Mathematics and its Applications in New Computer Systems. MANCS 2021. Lecture Notes in Networks and Systems 2022, 424, 19–28. [Google Scholar]
- Kesten, H.; Morse, N. A Property of the Multinomial Distribution. The Annals of Mathematical Statistics 1959, 30, 120–127. [Google Scholar] [CrossRef]
- D’Orazio, R.; Loizou, N.; Laradji, I.; Mitliagkas, I. Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize. arXiv 2021, arXiv:2110.15412v2. [Google Scholar]
- Gessert, N.; Nielsen, M.; Shaikh, M.; Werner, R.; Schlaefer, A. Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data. MethodsX 2020, 7, 100864. [Google Scholar] [CrossRef] [PubMed]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv 2016, arXiv:1602.07360v4. [Google Scholar]
- Ke, H.; Chen, D.; Li, X.; Tang, Y.; Shah, T.; Ranjan, R. Towards Brain Big Data Classification: Epileptic EEG Identification With a Lightweight VGGNet on Global MIC. IEEE Access 2018, 6, 14722–14733. [Google Scholar] [CrossRef]
- Zhu, Y.; Newsam, S. DenseNet for dense flow. 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017, pp. 790-794. [CrossRef]
- Chollet, F. Xception: Deep Learning With Depthwise Separable Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1251-1258. [CrossRef]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6848-6856. [CrossRef]
- Paoletti, M.E.; Haut, J.M.; Pereira, N.S.; Plaza, J.; Plaza, A. Ghostnet for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing 2021, 59, 10378–10393. [Google Scholar] [CrossRef]
- Liu, Y. Novel volatility forecasting using deep learning–Long Short Term Memory Recurrent Neural Networks. Expert Systems with Applications 2019, 132, 99–109. [Google Scholar] [CrossRef]
- Lai, C.H.; Liu, D.R.; Lien, K.S. A hybrid of XGBoost and aspect-based review mining with attention neural network for user preference prediction. Int. J. Mach. Learn. and Cyber. 2021, 12, 1203–1217. [Google Scholar] [CrossRef]
- Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena 2020, 404, 132306. [Google Scholar] [CrossRef]
- Lynn, H.H.; Pan, S.B.; Kim, P. A Deep Bidirectional GRU Network Model for Biometric Electrocardiogram Classification Based on Recurrent Neural Networks. IEEE Access 2019, 7, 145395–145405. [Google Scholar] [CrossRef]
- Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
- Sajjad, M. et al. A Novel CNN-GRU-Based Hybrid Approach for Short-Term Residential Load Forecasting. IEEE Access 2020, 8, 143759–143768. [Google Scholar]
- Hu, C.; Cheng, F.; Ma, L.; Li, B. State of Charge Estimation for Lithium-Ion Batteries Based on TCN-LSTM Neural Networks. Journal of The Electrochemical Society 2022, 169, 0305544. [Google Scholar] [CrossRef]
- Lu, L.; Jin, P.; Pang, G. et al. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat Mach Intell 2021, 3, 218–229. [Google Scholar]
- Meng, X.; Karniadakis, G.E. A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. Journal of Computational Physics 2020, 401, 109020. [Google Scholar] [CrossRef]
- Gao, C.; Lui, W.; Yang, X. Convolutional neural network and Riemannian geometry hybrid approach for motor imagery classification. Neurocomputing 2022, 180–190. [Google Scholar] [CrossRef]
- Hosseini, M.S.; Tuli, M.; Plataniotis, K.N. Exploiting Explainable Metrics for Augmented SGD. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10296-10306. [CrossRef]
- Singh, H.; Bhatt, P.; Jacob, S.; Kaur, A.; Vashist, A.; Vij, D. Stock Prediction on Historical Data based on SGD and LSTM. 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), 2022, pp. 200-204. [CrossRef]
- Mu, Z.; Tang, S.; Zong, C.; Yu, D.; Zhuang, Y. Graph neural networks meet with distributed graph partitioners and reconciliations. Neurocomputing 2023, 518, 408–417. [Google Scholar] [CrossRef]
- Li, J.; Chen, J.; Li, B. Gradient-optimized physics-informed neural networks (GOPINNs): a deep learning method for solving the complex modified KdV equation. Nonlinear Dyn 2022, 107, 781–792. [Google Scholar] [CrossRef]
- Volinski, A.; Zaidel, Y.; Shalumov, A.; DeWolf, T.; Supic, L.; Tsur, E.E. Data-driven artificial and spiking neural networks for inverse kinematics in neurorobotics. Patterns 2022, 3, 100391. [Google Scholar] [CrossRef]
- Wang, R.; Liu, Z.; Zhang, B. et al. Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning. Int J Comput Vis 2023, 131, 385–404. [Google Scholar] [CrossRef]
- Chen, M.; Shi, X.; Zhang, Y.; Wu, D.; Guizani, M. Deep Feature Learning for Medical Image Analysis with Convolutional Autoencoder Neural Network. IEEE Transactions on Big Data 2021, 7, 750–758. [Google Scholar] [CrossRef]
- Taqi, A.M.; Awad, A.; Al-Azzo, F.; Milanova, M. The Impact of Multi-Optimizers and Data Augmentation on TensorFlow Convolutional Neural Network Performance. 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018, pp. 140-145. [CrossRef]
- Qu, Z.; Yuan, S.; Chi, R.; Chang, L.; Zhao, L. Genetic Optimization Method of Pantograph and Catenary Comprehensive Monitor Status Prediction Model Based on Adadelta Deep Neural Network. IEEE Access 2019, 7, 23210–23221. [Google Scholar] [CrossRef]
- Huang, Y.; Peng, H.; Liu, Q.; Yang, Q.; Wang, J.; Orellana-Martin, D.; Perez-Jimenez, M.J. Attention-enabled gated spiking neural P model for aspect-level sentiment classification. Neural Networks 2023, 157, 437–443. [Google Scholar] [CrossRef] [PubMed]
- Taqi, A.M.; Awad, A.; Al-Azzo, F.; Milanova, M. The Impact of Multi-Optimizers and Data Augmentation on TensorFlow Convolutional Neural Network Performance. 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018, pp. 140-145. [CrossRef]
- Huk, M. Stochastic Optimization of Contextual Neural Networks with RMSprop. Lecture Notes in Computer Science 2020, 12034, 343–352. [Google Scholar]
- Gautam, A.; Singh, V. CLR-based deep convolutional spiking neural network with validation based stopping for time series classification. Appl Intell 2020, 50, 830–848. [Google Scholar] [CrossRef]
- Liu, B.; Zhang, Y.; He, D.; Li, Y. Identification of Apple Leaf Diseases Based on Deep Convolutional Neural Networks. Symmetry 2018, 10, 11. [Google Scholar] [CrossRef]
- Kisvari, A.; Lin, Z.; Liu, X. Wind power forecasting – A data-driven method along with gated recurrent neural network. Renewable Energy 2021, 163, 1895–1909. [Google Scholar] [CrossRef]
- Kim, K.-S.; Choi, Y.-S. HyAdamC: A New Adam-Based Hybrid Optimization Algorithm for Convolution Neural Networks. Sensors 2021, 21, 4054. [Google Scholar] [CrossRef] [PubMed]
- Shankar, K.; Kumar, S.; Dutta, A.K.; Alkhayyat, A.; Jawad, A.J.M.; Abbas, A.H.; Yousif, Y.K. An Automated Hyperparameter Tuning Recurrent Neural Network Model for Fruit Classification. Mathematics 2022, 10, 2358. [Google Scholar] [CrossRef]
- Wu, J.; Chua, Y.; Zhang, M.; Yang, Q.; Li, G.; Li, H. Deep Spiking Neural Network with Spike Count based Learning Rule. 2019 International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1-6. [CrossRef]
- Gong, M.; Zhou, H.; Qin, A.K.; Liu, W.; Zhao, Z. Self-Paced Co-Training of Graph Neural Networks for Semi-Supervised Node Classification. IEEE Transactions on Neural Networks and Learning Systems 2022, 1–14. [Google Scholar] [CrossRef]
- Bararnia, H.; Esmaeilpour, M. On the application of physics informed neural networks (PINN) to solve boundary layer thermal-fluid problems. International Communications in Heat and Mass Transfer 2022, 132, 105890. [Google Scholar] [CrossRef]
- Lu, S.; Sengupta, A. Exploring the Connection Between Binary and Spiking Neural Networks. Front. Neurosci. 2020, 14, 535. [Google Scholar] [CrossRef] [PubMed]
- Freire, P.J. et al. Complex-Valued Neural Network Design for Mitigation of Signal Distortions in Optical Links. Journal of Lightwave Technology 2021, 39, 1696–1705. [Google Scholar] [CrossRef]
- Jiang, J.; Ren, H.; Zhang, M. A Convolutional Autoencoder Method for Simultaneous Seismic Data Reconstruction and Denoising. IEEE Geoscience and Remote Sensing Letters 2022, 19, 1–5. [Google Scholar]
- Khan, W.; Ali, S.; Muhammad, U.S.K.; Jawad, M.; Ali, M.; Nawaz, R. AdaDiffGrad: An Adaptive Batch Size Implementation Technique for DiffGrad Optimization Method. 2020 14th International Conference on Innovations in Information Technology (IIT), 2020, pp. 209-214. [CrossRef]
- Sun, W.; Wang, Y.; Chang, K.; Meng, K. IdiffGrad: A Gradient Descent Algorithm for Intrusion Detection Based on diffGrad. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2021, pp. 1583-1590. [CrossRef]
- Roy, S.K.; Manna, S.; Dubey, S.R.; Chaudhuri, B.B. LiSHT: Non-parametric linearly scaled hyperbolic tangent activation function for neural networks. arXiv 2022, arXiv:1901.05894v3. [Google Scholar]
- Valova, I.; Harris, C.; Mai, T.; Gueorguieva, N. Optimization of Convolutional Neural Networks for Imbalanced Set Classification. Procedia Computer Science 2020, 176, 660–669. [Google Scholar] [CrossRef]
- Yogi, S.C.; Tripathi, V.K.; Behera, L. Adaptive Integral Sliding Mode Control Using Fully Connected Recurrent Neural Network for Position and Attitude Control of Quadrotor. IEEE Transactions on Neural Networks and Learning Systems 2021, 32, 5595–5609. [Google Scholar] [CrossRef]
- Shi, H.; Wang, L.; Scherer, R.; Woźniak, M.; Zhang, P.; Wei, W. Short-Term Load Forecasting Based on Adabelief Optimized Temporal Convolutional Network and Gated Recurrent Unit Hybrid Neural Network. IEEE Access 2021, 9, 66965–66981. [Google Scholar] [CrossRef]
- Guo, J.; Liu, Q.; Guo, H.; Lu, X. Ligandformer: A Graph Neural Network for Predicting Ligand Property with Robust Interpretation. arXiv 2022, arXiv:2202.10873v3. [Google Scholar]
- Wu, D.; Yuan, Y.; Huang, J.; Tan, Y. Optimize TSK Fuzzy Systems for Regression Problems: Minibatch Gradient Descent With Regularization, DropRule, and AdaBound (MBGD-RDA). IEEE Transactions on Fuzzy Systems 2020, 28, 1003–1015. [Google Scholar] [CrossRef]
- Demertzis, K.; Iliadis, L.; Pimenidis, E. Large-Scale Geospatial Data Analysis: Geographic Object-Based Scene Classification in Remote Sensing Images by GIS and Deep Residual Learning. Proceedings of the International Neural Networks Society 2020, 2, 274–291. [Google Scholar]
- Wang, C. et al. Distributed Newton Methods for Deep Neural Networks. Neural Computation 2018, 30, 1673–1724. [Google Scholar] [CrossRef]
- Kim, H.; Wang, C.; Byun, H.; Hu, W.; Kim, S.; Jiao, Q.; Lee, T.H. Variable three-term conjugate gradient method for training artificial neural networks. Neural Networks 2022 (accepted paper). [CrossRef]
- Peng, C.-C.; Magoulas, G.D. Adaptive Nonmonotone Conjugate Gradient Training Algorithm for Recurrent Neural Networks. 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), 2007, pp. 374-381. [CrossRef]
- Franklin, T.S.; Souza, L.S.; Fontes, R.M.; Martins, M.A.F. A Physics-Informed Neural Networks (PINN) oriented approach to flow metering in oil wells: an ESP lifted oil well system as a case study. Digital Chemical Engineering 2022, 5, 100056. [Google Scholar] [CrossRef]
- Koshimizu, H.; Kojima, R.; Kario, K.; Okuno, Y. Prediction of blood pressure variability using deep neural networks. International Journal of Medical Informatics 2020, 136, 104067. [Google Scholar] [CrossRef] [PubMed]
- Osawa, K.; Tsuji, Y.; Ueno, Y.; Naruse, A.; Foo, C.-S.; Yokota, R. Scalable and Practical Natural Gradient for Large-Scale Deep Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 44, 404–415. [Google Scholar] [CrossRef] [PubMed]
- Sun, F.; Sun, J.; Zhao, Q. A deep learning method for predicting metabolite–disease associations via graph neural network. Briefings in Bioinformatics 2022, 23. [Google Scholar] [CrossRef]
- Boso, F.; Tartakovsky, D.M. Information geometry of physics-informed statistical manifolds and its use in data assimilation. Journal of Computational Physics 2022, 467, 111438. [Google Scholar] [CrossRef]
- Becker, S.; Li, W. Quantum Statistical Learning via Quantum Wasserstein Natural Gradient. J Stat Phys 2021, 182, 1–26. [Google Scholar] [CrossRef]
- You, J.-K.; Cheng, H.-C.; Li, Y.-H. Minimizing Quantum Rényi Divergences via Mirror Descent with Polyak Step Size. 2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 252-257. [CrossRef]
- Chen, Y.; Chang, H.; Meng, J.; Zhang, D. Ensemble Neural Networks (ENN): A gradient-free stochastic method. Neural Networks 2019, 110, 170–185. [Google Scholar] [CrossRef]
- Han, D.; Yuan, X. A Note on the Alternating Direction Method of Multipliers. J Optim Theory Appl 2012, 155, 227–238. [Google Scholar] [CrossRef]
- Zhang, S.; Liu, M.; Yan, J. The Diversified Ensemble Neural Network. Advances in Neural Information Processing Systems 33, 2020.
- Dominic, S.; Das, R.; Whitley, D.; Anderson, C. Genetic reinforcement learning for neural networks. IJCNN-91-Seattle International Joint Conference on Neural Networks 1991, 2, 71–76. [Google Scholar]
- Kanwar, S.; Awasthi, L.K.; Shrivastava, V. Feature Selection with Stochastic Hill-Climbing Algorithm in Cross Project Defect Prediction. 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), pp. 632-635, 2022. [CrossRef]
- Sexton, R.S.; Dorsey, R.E.; Johnson, J.D. Optimization of neural networks: A comparative analysis of the genetic algorithm and simulated annealing. European Journal of Operational Research 1999, 114, 589–601. [Google Scholar] [CrossRef]
- Maehara, N.; Shimoda, Y. Application of the genetic algorithm and downhill simplex methods (Nelder–Mead methods) in the search for the optimum chiller configuration. Applied Thermal Engineering 2013, 61, 433–442. [Google Scholar] [CrossRef]
- Huang, G.B.; Chen, L. Enhanced random search based incremental extreme learning machine. Neurocomputing 2008, 71, 3460–3468. [Google Scholar] [CrossRef]
- Pontes, F.J.; Amorim, G.F.; Balestrassi, P.P.; Paiva, A.P.; Ferreira, J.R. Design of experiments and focused grid search for neural network parameter optimization. Neurocomputing 2016, 186, 22–34. [Google Scholar] [CrossRef]
- Farfán, J.F.; Cea, L. Improving the predictive skills of hydrological models using a combinatorial optimization algorithm and artificial neural networks. Model. Earth Syst. Environ. 2022. [Google Scholar] [CrossRef]
- Zerubia, J.; Chellappa, R. Mean field annealing using compound Gauss-Markov random fields for edge detection and image estimation. IEEE Transactions on Neural Networks 1993, 4, 703–709. [Google Scholar] [CrossRef]
- Ihme, M.; Marsden, A.L.; Pitsch, H. Generation of Optimal Artificial Neural Networks Using a Pattern Search Algorithm: Application to Approximation of Chemical Systems. Neural Computation 2008, 20, 573–601. [Google Scholar] [CrossRef] [PubMed]
- Vilovic, I.; Burum, N.; Sipus, Z. Design of an Indoor Wireless Network with Neural Prediction Model. The Second European Conference on Antennas and Propagation, EuCAP 2007, pp. 1-5, 2007. [CrossRef]
- Bagherbeik, M.; Ashtari, P.; Mousavi, S.F.; Kanda, K.; Tamura, H.; Sheikholeslami, A. A Permutational Boltzmann Machine with Parallel Tempering for Solving Combinatorial Optimization Problems. Lecture Notes in Computer Science 2020, 12269, 317–331. [Google Scholar]
- Poli, R.; Kennedy, J.; Blackwell, T. Particle swarm optimization. Swarm Intell 2007, 1, 33–57. [Google Scholar] [CrossRef]
- Wang, Q.; Perc, M.; Duan, Z.; Chen, G. Delay-enhanced coherence of spiral waves in noisy Hodgkin–Huxley neuronal networks. Physics Letters A 2008, 372, 5681–5687. [Google Scholar] [CrossRef]
- Fernandes Jr, F.E.; Yen, G.G. Pruning deep convolutional neural networks architectures with evolution strategy. Information Sciences 2021, 552, 29–47. [Google Scholar] [CrossRef]
- Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
- Pauli, P.; Koch, A.; Berberich, J.; Kohler, P.; Allgöwer, F. Training Robust Neural Networks Using Lipschitz Bounds. IEEE Control Systems Letters 2022, 6, 121–126. [Google Scholar] [CrossRef]
- Rong, G.; Li, K.; Su, Y.; Tong, Z.; Liu, X.; Zhang, J.; Zhang, Y.; Li, T. Comparison of Tree-Structured Parzen Estimator Optimization in Three Typical Neural Network Models for Landslide Susceptibility Assessment. Remote Sens. 2021, 13, 4694. [Google Scholar] [CrossRef]
- Chen, Y.; Chang, H.; Meng, J.; Zhang, F. Ensemble Neural Networks (ENN): A gradient-free stochastic method. Neural Networks 2019, 110, 170–185. [Google Scholar] [CrossRef] [PubMed]
- Yang, X.-J. General Fractional Derivatives. Theory, Methods and Applications; CRC Press, Taylor and Francis Group: Boca Raton, 2019. [Google Scholar]
- Wang, J.; Wen, Y.; Gou, Y.; Ye, Z.; Chen, H. Fractional-order gradient descent learning of BP neural networks with Caputo derivative. Neural Networks 2017, 89, 19–30. [Google Scholar] [CrossRef] [PubMed]
- Sales Teodoro, G.; Tenreiro Machado, J.A.; Capelas de Oliveira, E. A review of definitions of fractional derivatives and other operators. Journal of Computational Physics 2019, 388, 195–208. [Google Scholar] [CrossRef]
- Louati, H.; Bechikh, S.; Louati, A.; Hung, C.C.; Said, L.B. Deep convolutional neural network architecture design as a bi-level optimization problem. Neurocomputing 2021, 439, 44–62. [Google Scholar] [CrossRef]
- Yang, J.; Ji, K.; Liang, Y. Provably Faster Algorithms for Bilevel Optimization. Part of Advances in Neural Information Processing Systems 34, 2021.
- Hong, M.; Wai, H.T.; Wang, Z.; Yang, Z. A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic. arXiv 2020, arXiv:2007.05170. [Google Scholar] [CrossRef]
- Khanduri, P.; Zeng, S.; Hong, M.; Wai, H.-T.; Wang, Z.; Yang, Z. A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum. Part of Advances in Neural Information Processing Systems 34, 2021.
- Grazzi, R.; Franceschi, L.; Pontil, M.; Salzo, S. On the iteration complexity of hypergradient computation. In International Conference on Machine Learning (ICML), pp. 3748–3758, 2020.
- Sow, D.; Ji, K.; Liang, Y. Es-based jacobian enables faster bilevel optimization. arXiv 2021, arXiv:2110.07004. [Google Scholar]
- Ji, K.; Yang, J.; Liang, Y. Bilevel Optimization: Convergence Analysis and Enhanced Design. Proceedings of the 38th International Conference on Machine Learning, PMLR, 2021, 139, 4882-4892.
- Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing 2019, 41, 2603–2626. [Google Scholar] [CrossRef]
- Gupta, V.; Koren, T.; Singer, Y. Shampoo: Preconditioned Stochastic Tensor Optimization. Proceedings of Machine Learning Research 2018, 80, 1842–1850. [Google Scholar]
- Henderson, M.; Shakya, S.; Pradhan, S. et al. Quanvolutional neural networks: powering image recognition with quantum circuits. Quantum Mach. Intell. 2020, 2, 2. [Google Scholar] [CrossRef]
- Guo, Y.; Liu, M.; Yang, T.; Rosing, T. Improved Schemes for Episodic Memory-based Lifelong Learning. Part of Advances in Neural Information Processing Systems 33, 2020.
- Zhang, D.; Liu, L.; Wei, Q.; Yang, Y.; Yang, P.; Liu, Q. Neighborhood Aggregation Collaborative Filtering Based on Knowledge Graph. Appl. Sci. 2020, 10, 3818. [Google Scholar] [CrossRef]
- Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
- Wang, G.; Deb, S.; Cui, Z. Monarch butterfly optimization. Neural Comput and Applic 2019, 31, 1995–2014. [Google Scholar] [CrossRef]

| CG Update Parameter | Authors | Year |
|---|---|---|
| $\beta_k^{HS} = \dfrac{g_{k+1}^{\top} y_k}{d_k^{\top} y_k}$ | Hestenes and Stiefel [78] | 1952 |
| $\beta_k^{FR} = \dfrac{\lVert g_{k+1} \rVert^2}{\lVert g_k \rVert^2}$ | Fletcher and Reeves [79] | 1964 |
| $\beta_k^{D} = \dfrac{g_{k+1}^{\top} \nabla^2 f(x_k)\, d_k}{d_k^{\top} \nabla^2 f(x_k)\, d_k}$ | Daniel [80] | 1967 |
| $\beta_k^{PRP} = \dfrac{g_{k+1}^{\top} y_k}{\lVert g_k \rVert^2}$ | Polak and Ribière [81] and Polyak [82] | 1969 |
| $\beta_k^{CD} = \dfrac{\lVert g_{k+1} \rVert^2}{-d_k^{\top} g_k}$ | Fletcher [83], CD stands for “Conjugate Descent” | 1987 |
| $\beta_k^{LS} = \dfrac{g_{k+1}^{\top} y_k}{-d_k^{\top} g_k}$ | Liu and Storey [84] | 1991 |
| $\beta_k^{DY} = \dfrac{\lVert g_{k+1} \rVert^2}{d_k^{\top} y_k}$ | Dai and Yuan [85] | 1999 |
| $\beta_k^{HZ} = \Bigl( y_k - 2 d_k \dfrac{\lVert y_k \rVert^2}{d_k^{\top} y_k} \Bigr)^{\top} \dfrac{g_{k+1}}{d_k^{\top} y_k}$ | Hager and Zhang [86] | 2005 |
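All of the $\beta_k$ choices above (given here in their standard forms, with $g_k = \nabla f(x_k)$, $y_k = g_{k+1} - g_k$, and search direction $d_k$) drive the same nonlinear conjugate gradient iteration, sketched below; on a quadratic objective with exact line search the listed variants coincide:

```latex
d_0 = -g_0, \qquad
x_{k+1} = x_k + \alpha_k d_k, \qquad
d_{k+1} = -g_{k+1} + \beta_k d_k
```

Here $\alpha_k$ is the step size produced by a (typically inexact, Wolfe-condition) line search along $d_k$.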
| Probability Density Function | Fisher Information Matrix | Probability Distribution |
|---|---|---|
| | | Gauss [110] |
| | | Multinomial [111] |
| | | Dirichlet [108,112] |
| | | Generalized Dirichlet [108] |
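These Fisher matrices serve as the preconditioners in natural gradient descent. A hedged sketch of the generic update for a model density $p_\theta$ and loss $L$, in standard notation rather than the paper's own:

```latex
F(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top} \right],
\qquad
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t)
```

In large-scale practice the inverse Fisher matrix is approximated (for example block-diagonally) rather than formed and inverted exactly.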
| Potential Function | Bregman Divergence | Algorithm |
|---|---|---|
| $\phi(x) = \tfrac{1}{2} \lVert x \rVert_2^2$ | $D_{\phi}(x, y) = \tfrac{1}{2} \lVert x - y \rVert_2^2$ | Gradient Descent |
| $\phi(x) = \sum_i x_i \log x_i$ | $D_{\phi}(x, y) = \sum_i x_i \log \dfrac{x_i}{y_i}$ (KL divergence on the simplex) | Exponentiated Gradient Descent |
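Both rows are instances of the same mirror descent step: minimize the linearized loss plus a Bregman proximity term induced by the potential $\phi$. A sketch; with the entropic potential on the probability simplex, the minimizer takes the closed multiplicative form on the last line:

```latex
x_{t+1} = \arg\min_{x} \left\{ \eta \langle g_t, x \rangle + D_{\phi}(x, x_t) \right\},
\qquad
D_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle

x_{t+1, i} = \frac{x_{t, i}\, e^{-\eta g_{t, i}}}{\sum_{j} x_{t, j}\, e^{-\eta g_{t, j}}}
\quad \text{(exponentiated gradient descent)}
```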
| Type of Optimization Algorithm | Optimizer | Application |
|---|---|---|
| SGD-type | SGD | CNN [131], RNN [132], GNN [133], PINN [134], SNN [135], CVNN [136], AE [137] |
| | AdaGrad | CNN [138] |
| | AdaDelta | CNN [139], RNN [139], SNN [140] |
| | RMSProp | CNN [141], RNN [142], SNN [143] |
| | SGDW | CNN [30] |
| | SGDP | CNN [31] |
| | QHM | CNN [32] |
| | NAG | CNN [144], RNN [145] |
| Adam-type | Adam | CNN [146], RNN [147], SNN [148], PINN [150], GNN [151], CVNN [152] |
| | AdamW | CNN [40] |
| | AdamP | CNN [42] |
| | QHAdam | CNN [45] |
| | Nadam | CNN [47] |
| | RAdam | CNN [48] |
| | DiffGrad | CNN [154], RNN [155], GNN [156] |
| | Yogi | CNN [157], RNN [158] |
| | AdaBelief | CNN [159], RNN [159], GNN [160] |
| | AdaBound | CNN [161], RNN [162] |
| | AdamInject | CNN [64] |
| PNM-type | PNM | CNN [68] |
| | AdaPNM | CNN [68] |
| | Adan | CNN [69] |
| Newton | Newton approach | CNN [163] |
| | CG | CNN [164], GNN [165] |
| Quasi-Newton | (L-)BFGS | PINN [166] |
| | SR1 | CNN [167] |
| | Apollo | CNN [92] |
| | AdaHessian | CNN [93] |
| Information geometry | NGD | CNN [168], RNN [109], GNN [169], PINN [170], QNN [170] |
| | MD | CNN, RNN [172] |
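As a practical illustration of how the tabulated first-order optimizers get swapped in a training loop, a minimal sketch assuming PyTorch's `torch.optim` API; the model, data, and hyperparameters are toy placeholders, not taken from the cited works:

```python
# Retrain the same toy regression model with several of the tabulated optimizers.
import torch
import torch.nn as nn

def make_model() -> nn.Module:
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x, y = torch.randn(256, 10), torch.randn(256, 1)   # synthetic regression data
loss_fn = nn.MSELoss()

optimizers = {
    "SGD+momentum": lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9, weight_decay=1e-4),
    "Adam":         lambda p: torch.optim.Adam(p, lr=1e-3),
    "AdamW":        lambda p: torch.optim.AdamW(p, lr=1e-3, weight_decay=1e-2),
    "RMSprop":      lambda p: torch.optim.RMSprop(p, lr=1e-3),
}

for name, make_opt in optimizers.items():
    torch.manual_seed(0)                 # identical initialization for each run
    model = make_model()
    opt = make_opt(model.parameters())
    for _ in range(100):                 # full-batch steps on the toy data
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"{name:>12s}: final loss = {loss.item():.4f}")
```

Swapping a row of the table then amounts to swapping the constructor in the dictionary, since all of these optimizers share the `zero_grad`/`step` interface.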
| Type of Optimization Algorithm | Optimizer |
|---|---|
| Local optimization | Hill Climbing [176], Stochastic Hill Climbing [177], Simulated Annealing [178], Downhill Simplex Optimization [179] |
| Global optimization | Random Search [180], Grid Search [181], Random Restart Hill Climbing [182], Random Annealing [183], Pattern Search [184], Powell’s Method [185] |
| Population-based optimization | Parallel Tempering [186], Particle Swarm Optimization [187], Spiral Optimization [188], Evolution Strategy [189] |
| Sequential model-based optimization | Bayesian Optimization [190], Lipschitz Optimization [191], Tree of Parzen Estimators [192] |
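For the gradient-free families above, a minimal sketch combining two of the tabulated methods, stochastic hill climbing with random restarts, on a toy one-dimensional objective; the test function, step size, and budgets are illustrative assumptions only:

```python
import math
import random

def objective(x: float) -> float:
    # Multimodal 1-D test function; global minimum near x ~ 2.2
    return (x - 2.0) ** 2 + math.sin(5.0 * x)

def hill_climb(x0: float, step: float = 0.1, iters: int = 500):
    x, fx = x0, objective(x0)
    for _ in range(iters):
        cand = x + random.uniform(-step, step)   # random neighbour
        fc = objective(cand)
        if fc < fx:                              # accept improvements only
            x, fx = cand, fc
    return x, fx

random.seed(0)
restarts = (hill_climb(random.uniform(-5.0, 5.0)) for _ in range(10))
best_x, best_f = min(restarts, key=lambda t: t[1])   # random-restart wrapper
print(f"best x = {best_x:.3f}, f(x) = {best_f:.3f}")
```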
| Type of fractional derivatives | Formulas |
|---|---|
| Riemann-Liouville | $\left(D_{a+}^{\alpha} f\right)(x) = \dfrac{1}{\Gamma(n-\alpha)} \dfrac{d^{n}}{dx^{n}} \displaystyle\int_{a}^{x} \dfrac{f(t)\, dt}{(x-t)^{\alpha-n+1}}$; $\left(D_{b-}^{\alpha} f\right)(x) = \dfrac{(-1)^{n}}{\Gamma(n-\alpha)} \dfrac{d^{n}}{dx^{n}} \displaystyle\int_{x}^{b} \dfrac{f(t)\, dt}{(t-x)^{\alpha-n+1}}$, where $n-1 < \alpha < n$ |
| Liouville-Sonine-Caputo | $\left({}^{C}D_{a+}^{\alpha} f\right)(x) = \dfrac{1}{\Gamma(n-\alpha)} \displaystyle\int_{a}^{x} \dfrac{f^{(n)}(t)\, dt}{(x-t)^{\alpha-n+1}}$; $\left({}^{C}D_{b-}^{\alpha} f\right)(x) = \dfrac{(-1)^{n}}{\Gamma(n-\alpha)} \displaystyle\int_{x}^{b} \dfrac{f^{(n)}(t)\, dt}{(t-x)^{\alpha-n+1}}$, where $n-1 < \alpha < n$ |
| Tarasov | |
| Hadamard | $\left(D_{a+}^{\alpha} f\right)(x) = \dfrac{1}{\Gamma(n-\alpha)} \left(x \dfrac{d}{dx}\right)^{n} \displaystyle\int_{a}^{x} \left(\ln \dfrac{x}{t}\right)^{n-\alpha-1} \dfrac{f(t)}{t}\, dt$, where $n-1 < \alpha < n$ and $0 < a < x$ |
| Marchaud | $\left(\mathbf{D}^{\alpha} f\right)(x) = \dfrac{\alpha}{\Gamma(1-\alpha)} \displaystyle\int_{0}^{\infty} \dfrac{f(x) - f(x-t)}{t^{1+\alpha}}\, dt$, where $0 < \alpha < 1$ |
| Liouville-Weyl | $\left(D_{+}^{\alpha} f\right)(x) = \dfrac{1}{\Gamma(n-\alpha)} \dfrac{d^{n}}{dx^{n}} \displaystyle\int_{-\infty}^{x} \dfrac{f(t)\, dt}{(x-t)^{\alpha-n+1}}$, where $n-1 < \alpha < n$ |
| Sabzikar-Meerschaert-Chen | |
| Katugampola | |
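As a numerical check on the Riemann-Liouville definition above, a hedged sketch of the Grünwald-Letnikov series that discretizes it (lower terminal $a = 0$); the grid size and test function are assumptions, compared against the closed form $D^{1/2} x = 2\sqrt{x/\pi}$:

```python
# Grunwald-Letnikov approximation of a fractional derivative:
#   D^alpha f(x) ~ h^(-alpha) * sum_{j=0}^{x/h} (-1)^j * binom(alpha, j) * f(x - j*h)
import math

def gl_derivative(f, x: float, alpha: float, h: float = 1e-3) -> float:
    n = int(x / h)
    total, c = 0.0, 1.0             # c_j = (-1)^j * binom(alpha, j), with c_0 = 1
    for j in range(n + 1):
        total += c * f(x - j * h)
        c *= (j - alpha) / (j + 1)  # recurrence for the next coefficient
    return total / h ** alpha

x = 0.8
approx = gl_derivative(lambda t: t, x, alpha=0.5)
exact = 2.0 * math.sqrt(x / math.pi)   # closed-form half-derivative of f(t) = t
print(f"Grunwald-Letnikov: {approx:.4f}   closed form: {exact:.4f}")
```

The same truncated series of past gradients is what the fractional-order optimizers cited in Section 6 substitute for the ordinary first derivative in the gradient descent update.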
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).