Submitted:
14 October 2024
Posted:
15 October 2024
Abstract
Keywords:
1. Introduction
2. First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE)
- (i)
- The quantity is a time-like independent variable which parameterizes the dynamics of the hidden/latent neuron units; the initial value is denoted as (which can be considered to be an initial measurement time) while the stopping value is denoted as (which can be considered to be the next measurement time).
- (ii)
- The -dimensional vector-valued function represents the hidden/latent neural networks. In this work, all vectors are considered to be column vectors, and the dagger “†” symbol will be used to denote “transposition.” The symbol “≜” will be used to denote “is defined as” or, equivalently, “is by definition equal to.”
- (iii)
- The -dimensional vector-valued nonlinear function models the dynamics of the latent neurons. The components of the vector represent learnable scalar adjustable weights, where denotes the total number of adjustable weights in all of the latent neural nets. The components of the vector-valued function represent the “feature” functions of the respective weights, which are considered to be “primary parameters”; the quantity denotes the total number of feature functions of the primary model parameters comprised in the NODE. Evidently, the total number of feature functions must necessarily be smaller than the total number of primary parameters (weights). In the extreme case when there are no nontrivial feature functions, each feature function reduces to its corresponding weight.
- (iv)
- The -dimensional vector-valued function represents the “encoder” which is characterized by “inputs” and “learnable” scalar adjustable weights , where denotes the total number of “inputs” and denotes the total number of “learnable encoder weights” that define the “encoder.”
- (v)
- The -dimensional vector-valued function represents the vector of “system responses.” The vector-valued function represents the “decoder” with learnable scalar adjustable weights, which are represented by the components of the vector , where denotes the total number of adjustable weights that characterize the “decoder.” Each component can be represented in integral form as follows:
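The encoder–dynamics–decoder structure described in items (i)–(v) can be mimicked on a deliberately tiny stand-in problem. Every functional form, dimension, and name below is an illustrative assumption, not the paper's model: the encoder maps the input to the initial latent state, the latent dynamics depend on the primary weights only through a smaller set of feature functions, and the decoder's output is integrated over the time interval to produce the scalar response, mirroring the integral form of each response component.

```python
import numpy as np

def feature_functions(w):
    # TF = 2 feature functions of W = 4 primary weights (assumed forms):
    # the dynamics will "see" the weights only through these two scalars.
    return np.array([w[0] * w[1], w[2] + w[3]])

def encoder(x, theta_e):
    return theta_e * x                      # initial latent state h(t0)

def decoder(h, theta_d):
    return theta_d * h                      # instantaneous decoder output

def response(x, w, theta_e, theta_d, t0=0.0, tf=1.0, n=1000):
    F = feature_functions(w)
    h = encoder(x, theta_e)
    dt = (tf - t0) / n
    r = 0.0
    for _ in range(n):                      # forward Euler + rectangle rule
        r += decoder(h, theta_d) * dt       # R = integral of D(h(t)) dt
        h += (-F[0] * h + F[1]) * dt        # latent dynamics dh/dt = -F1*h + F2
    return r
```

For weights chosen so that the feature vector is (1, 0), the dynamics reduce to exponential decay and the response approaches theta_d·h(t0)·(1 − e⁻¹), which provides a quick analytic check of the sketch.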
3. Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE)
3.1. Second-Order Sensitivities Stemming from , ;
- Consider that is an element in a Hilbert space denoted as , , comprising as elements 2-block vectors having the following structure: , with and . The Hilbert space is considered to be endowed with an inner product denoted as , between two vectors and , with , , which is defined as follows:
- 2.
- Use the definition of the inner product provided in Eq. (51) to form the inner product of Eq. (48) with a vector , with and , where the superscript “(2)” indicates “2nd-level”, to obtain the following relationship:
- 3.
- Using the definition of the adjoint operator in , the left-side of Eq. (52) is transformed as follows, after integrating by parts over the independent variable :
- 4.
- The two integral terms on the right-side of Eq. (53) are now required to represent the “indirect-effect” term defined in Eq. (39), which is achieved by imposing the following requirements:
- 5.
- The definition of the 2nd-level adjoint sensitivity function is now completed by requiring it to satisfy the following boundary conditions, which eliminate the respective unknown terms on the right-side of Eq. (53):
- 6.
- Using the results obtained in Eqs. (52), (54)‒(56) in Eq. (53) yields the following alternative expression for the “indirect-effect” term , which no longer involves the 2nd-level variational sensitivity function but involves the 2nd-level adjoint sensitivity function :
- 7.
- Using in Eq. (57) the expression provided for in Eq. (15), and adding the resulting expression for the indirect-effect term to the expression for the direct-effect term provided in Eq. (38), yields the following expression for the total first-order G-differential , for each :
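The forward/backward structure of the adjoint computation in steps 1–7 can be illustrated numerically on a toy scalar problem. All forms below are assumptions chosen for transparency, not the paper's system: the state equation is dh/dt = −w·h with h(0) = 1, the response is R = h(tf), the adjoint function satisfies da/dt = w·a integrated backward from a(tf) = 1, and the indirect-effect term yields the sensitivity dR/dw = −∫ a(t)·h(t) dt.

```python
import numpy as np

def sensitivity_adjoint(w, tf=1.0, n=2000):
    dt = tf / n
    h = np.empty(n + 1)
    h[0] = 1.0
    for k in range(n):                    # forward sweep (explicit Euler)
        h[k + 1] = h[k] + dt * (-w * h[k])
    a = np.empty(n + 1)
    a[n] = 1.0                            # final-time (adjoint) condition
    for k in range(n, 0, -1):             # backward (adjoint) sweep
        a[k - 1] = a[k] - dt * (w * a[k])
    # indirect-effect term: dR/dw = -integral of a(t)*h(t) dt
    return -float(np.sum(a[:n] * h[:n]) * dt)
```

For this toy problem R(w) = exp(−w·tf), so the analytic sensitivity is dR/dw = −tf·exp(−w·tf); the adjoint result above reproduces it to discretization accuracy, which is the kind of consistency check the methodology relies on.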
3.2. Second-Order Sensitivities Stemming from ,;
- Using Eq. (20), form the inner product of Eq. (14) with a vector , where the superscript “(2)” indicates “2nd-Level”, to obtain the following relationship:
- Using the definition of the adjoint operator in , the left-side of Eq. (21) is integrated by parts over the independent variable to obtain the following relation:
- The last term on the right-side of Eq. (68) is now required to represent the “indirect-effect” term defined in Eq. (66), which is achieved by requiring that the 2nd-level adjoint sensitivity function satisfy the following relation written in NODE-format:
- 4.
- The definition of the 2nd-level adjoint sensitivity function is now completed by requiring it to eliminate the term containing the unknown values in Eq. (68), which is accomplished by requiring to satisfy the following boundary condition at the final time :
- 5.
- Using the results obtained in Eqs. (69), (70), and (67) in Eq. (68) yields the following alternative expression for the “indirect-effect” term, which does not involve the 1st-level variational sensitivity function but instead involves the 2nd-level adjoint function :
- 6.
- Adding the expression obtained in Eq. (71) for the “indirect-effect term” together with the expression of the direct-effect term provided by Eq. (65) yields the following expression for the G-variation defined in Eq. (64), which is to be evaluated at the nominal values of all functions and parameters/weights, for :
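The integration-by-parts step invoked in Sections 3.1 through 3.3 follows one generic pattern, sketched below in assumed notation (the paper's own symbols were defined with Eqs. (48)–(53)): for a variational function v(t), an adjoint function a(t), and a linear first-order operator L = d/dt + A(t),

```latex
\int_{t_0}^{t_f} \mathbf{a}^\dagger(t)
\left[ \frac{d\mathbf{v}(t)}{dt} + A(t)\, \mathbf{v}(t) \right] dt
= \left[ \mathbf{a}^\dagger(t)\, \mathbf{v}(t) \right]_{t_0}^{t_f}
+ \int_{t_0}^{t_f}
\left[ -\frac{d\mathbf{a}(t)}{dt} + A^\dagger(t)\, \mathbf{a}(t) \right]^\dagger
\mathbf{v}(t)\, dt .
```

The bracketed boundary term is the bilinear concomitant: requiring the remaining integral to reproduce the indirect-effect term defines the adjoint equation, while the boundary conditions imposed on a(t) are chosen precisely to eliminate the unknown values of v(t) from the concomitant, as in the steps above.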
3.3. Second-Order Sensitivities Stemming from , ;
- Use the definition of the inner product provided in Eq. (51) to form the inner product of Eq. (48) with a vector , with and , where the superscript “(2)” indicates “2nd-level”, to obtain the following relationship:
- Using the definition of the adjoint operator in , the left-side of Eq. (80) is transformed as follows, after integrating by parts over the independent variable :
- The two integral terms on the right-side of Eq. (81) are now required to represent the “indirect-effect” term defined in Eq. (79), which is achieved by imposing the following requirements on the components of the 2nd-level adjoint sensitivity function :
- 4.
- The definition of the 2nd-level adjoint sensitivity function is now completed by requiring that it satisfy the following boundary conditions, which eliminate the respective unknown terms on the right-side of Eq. (81):
- 5.
- Using the results obtained in Eqs. (80), (82)‒(84) in Eq. (81) yields the following alternative expression for the “indirect-effect” term , which no longer involves the 2nd-level variational sensitivity function but involves the 2nd-level adjoint sensitivity function :
- 6.
- Adding the expression obtained in Eq. (85) to the expression for the direct-effect term provided in Eq. (78) yields the following expression for the total first-order G-differential , for each :
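The 2-block inner product of Eq. (51), used in Sections 3.1 and 3.3 to construct the 2nd-level adjoint systems, has the following generic form in assumed notation, for 2-block vector functions with components ψ₁, ψ₂ and χ₁, χ₂ on the interval from the initial to the final time:

```latex
\left\langle \Psi, X \right\rangle
\triangleq \int_{t_0}^{t_f}
\left[ \psi_1^\dagger(t)\, \chi_1(t) + \psi_2^\dagger(t)\, \chi_2(t) \right] dt .
```

Forming this inner product between the 2nd-level variational equations and the 2nd-level adjoint function, and then integrating by parts, produces the boundary terms that the adjoint boundary conditions are subsequently chosen to cancel.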
3.4. Second-Order Sensitivities Stemming from , ;
3.5. Discussion: Double-Computation of the Mixed Second-Order Sensitivities
- The mixed second-order sensitivities are obtained in Eq. (60) in terms of the 2nd-level adjoint sensitivity functions ,. On the other hand, the mixed second-order sensitivities are obtained in Eq. (94) in terms of the 2nd-level adjoint sensitivity functions and , for . Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained for the corresponding mixed second-order sensitivities by computing them using Eq. (60), on the one hand, and using Eq. (94), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
- The mixed second-order sensitivities are obtained in Eq. (61) in terms of the 2nd-level adjoint sensitivity functions , for . On the other hand, the mixed second-order sensitivities are obtained in Eq. (87) in terms of the 2nd-level adjoint sensitivity functions and , for . Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained for the corresponding mixed second-order sensitivities by computing them using Eq. (61), on the one hand, and using Eq. (87), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
- The mixed second-order sensitivities are obtained in Eq. (62) in terms of the 2nd-level adjoint sensitivity functions , for . On the other hand, the mixed second-order sensitivities are obtained in Eq. (73) in terms of the 2nd-level adjoint sensitivity functions , for . Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained for the corresponding mixed second-order sensitivities by computing them using Eq. (62), on the one hand, and using Eq. (73), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
- The mixed second-order sensitivities are obtained in Eq. (88) in terms of the adjoint sensitivity functions and , for . On the other hand, the mixed second-order sensitivities are obtained in Eq. (96) in terms of the adjoint sensitivity functions and , for . Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained for the corresponding mixed second-order sensitivities by computing them using Eq. (88), on the one hand, and using Eq. (96), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the adjoint functions involved in the respective computations.
- The mixed second-order sensitivities are obtained in Eq. (74) in terms of the 2nd-level adjoint sensitivity functions , for . On the other hand, the mixed second-order sensitivities are obtained in Eq. (97) in terms of the 2nd-level adjoint sensitivity functions , for . Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained for the corresponding mixed second-order sensitivities by computing them using Eq. (74), on the one hand, and using Eq. (97), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
- The mixed second-order sensitivities are obtained in Eq. (75) in terms of the 2nd-level adjoint sensitivity functions , for . On the other hand, the mixed second-order sensitivities are obtained in Eq. (90) in terms of the 2nd-level adjoint sensitivity functions , for . Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained for the corresponding mixed second-order sensitivities by computing them using Eq. (75), on the one hand, and using Eq. (90), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
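The verification mechanism described in the paragraphs above can be mimicked on a toy problem: compute a mixed second-order sensitivity twice, in the two possible orders of differentiation, and compare the results. The closed-form response below is an illustrative stand-in for the NODE response, and nested central finite differences stand in for the two adjoint-based expressions.

```python
import numpy as np

def R(w1, w2):
    # Assumed stand-in response, playing the role of the NODE response.
    return np.exp(-w1 * w2)

def mixed_second(f, w1, w2, order, eps=1e-5):
    # Nested central differences; 'order' selects d/dw1 d/dw2 vs d/dw2 d/dw1.
    if order == "12":
        g = lambda a, b: (f(a + eps, b) - f(a - eps, b)) / (2 * eps)
        return (g(w1, w2 + eps) - g(w1, w2 - eps)) / (2 * eps)
    g = lambda a, b: (f(a, b + eps) - f(a, b - eps)) / (2 * eps)
    return (g(w1 + eps, w2) - g(w1 - eps, w2)) / (2 * eps)

s12 = mixed_second(R, 0.6, 1.1, "12")
s21 = mixed_second(R, 0.6, 1.1, "21")
# Symmetry of the mixed second-order sensitivities requires s12 ≈ s21;
# a significant discrepancy would flag an inaccurate adjoint computation.
```

For this stand-in response the analytic mixed sensitivity is (w1·w2 − 1)·exp(−w1·w2), so both orders of computation can also be checked against the exact value, exactly as the paper's pairs of 2nd-level adjoint expressions check one another.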
4. Discussion and Conclusions
Funding
Conflicts of Interest
References
- Lu, Y.; Zhong, A.; Li, Q.; Dong, B. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR; pp. 3276–3285.
- Ruthotto, L.; Haber, E. Deep neural networks motivated by partial differential equations. J. Math. Imaging Vis. 2018, 62, 352–364.
- Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2018; Volume 31, pp. 6571–6583.
- Dupont, E.; Doucet, A.; Teh, Y.W. Augmented neural ODEs. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 14–15.
- Kidger, P. On Neural Differential Equations. arXiv 2022, arXiv:2202.02435.
- Zhong, Y.D.; Dey, B.; Chakraborty, A. Symplectic ODE-Net: Learning Hamiltonian dynamics with control. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020.
- Grathwohl, W.; Chen, R.T.Q.; Bettencourt, J.; Sutskever, I.; Duvenaud, D. FFJORD: Free-form continuous dynamics for scalable reversible generative models. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Kidger, P.; Morrill, J.; Foster, J.; Lyons, T. Neural controlled differential equations for irregular time series. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 6696–6707.
- Morrill, J.; Salvi, C.; Kidger, P.; Foster, J. Neural rough differential equations for long time series. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR; pp. 7829–7838.
- Tieleman, T.; Hinton, G. Lecture 6.5—RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Pontryagin, L.S. Mathematical Theory of Optimal Processes; CRC Press: Boca Raton, FL, USA, 1987.
- LeCun, Y. A theoretical framework for back-propagation. In Proceedings of the Connectionist Models Summer School; Touretzky, D., Hinton, G., Sejnowski, T., Eds.; Morgan Kaufmann Publishers, Inc.: San Mateo, CA, USA, 1988.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Norcliffe, A.; Deisenroth, M.P. Faster training of neural ODEs using Gauss–Legendre quadrature. arXiv 2023, arXiv:2308.10644.
- Cacuci, D.G. First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations: Mathematical Framework and Illustrative Application to the Nordheim–Fuchs Reactor Safety Model. J. Nucl. Eng. 2024, 5, 347–372.
- Cacuci, D.G. Introducing the nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (nth-FASAM-N): I. Mathematical Framework. Am. J. Comput. Math. 2024, 14, 11–42.
- Cacuci, D.G. Introducing the Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations. II: Illustrative Application to Heat and Energy Transfer in the Nordheim–Fuchs Phenomenological Model for Reactor Safety. Processes 2024, submitted.
- Lamarsh, J.R. Introduction to Nuclear Reactor Theory; Addison-Wesley Publishing Co.: Reading, MA, USA, 1966; pp. 491–492.
- Hetrick, D.L. Dynamics of Nuclear Reactors; American Nuclear Society, Inc.: La Grange Park, IL, USA, 1993; pp. 164–174.
- Cacuci, D.G. Sensitivity theory for nonlinear systems: I. Nonlinear functional analysis approach. J. Math. Phys. 1981, 22, 2794–2812.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).