Working Paper, Version 1 (this version is not peer-reviewed)

On the Theory of Deep Learning: A Theoretical Physics Perspective (Part I)

Version 1 : Received: 6 October 2020 / Approved: 13 October 2020 / Online: 13 October 2020 (14:32:18 CEST)

How to cite: Chinea Manrique de Lara, A. On the Theory of Deep Learning: A Theoretical Physics Perspective (Part I). Preprints 2020, 2020100285.

Abstract

Deep learning machines are computational models composed of multiple processing layers of adaptive weights that learn representations of data with multiple levels of abstraction. Their structure mainly reflects the intuitive plausibility of decomposing a problem into multiple levels of computation and representation, since higher layers of representation are believed to allow a system to learn complex functions. Surprisingly, after decades of research, these models are still deployed in a heuristic manner from both learning and design perspectives. In this paper, deep learning feed-forward machines are modeled from a statistical mechanics point of view as disordered physical systems whose macroscopic behavior is determined by the interactions defined between their basic constituents, namely, the artificial neurons. They are viewed as the equilibrium states of a theoretical body subject to the law of increase of entropy. The study of the changes in energy of the body when passing from one equilibrium state to another is used to understand the structure and role of the phase space of the system, the stability of the equilibrium states, and the resulting degree of disorder. It is shown that the topology of these models is strongly linked to their stability and resulting level of disorder. Furthermore, the proposed theoretical characterization permits assessing the thermodynamic efficiency with which these models can process information, and provides a practical methodology to quantitatively estimate and compare their expected learning and generalization capabilities. These theoretical results provide new insights into the theory of deep learning, and their implications are shown to be consistent through a set of benchmarks designed to experimentally assess their validity.
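As a point of reference for the entropy-based framing above, the sketch below illustrates the standard Gibbs/Shannon entropy of a discrete state distribution, here applied to softmax activations of a toy feed-forward layer. This is only a generic statistical-mechanics illustration of "degree of disorder"; it is not the paper's method, and the layer setup and function names are assumptions for the example.

```python
import numpy as np

def gibbs_entropy(p):
    """Gibbs/Shannon entropy S = -sum_i p_i ln p_i of a discrete distribution
    (natural units, i.e. Boltzmann's constant set to 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # by convention 0 * ln 0 = 0, so zero-probability states drop out
    return float(-np.sum(p * np.log(p)))

# Toy feed-forward layer: softmax over pre-activations defines a probability
# distribution across units; its entropy quantifies how spread out ("disordered")
# the layer's state is.
rng = np.random.default_rng(0)
x = rng.normal(size=8)            # pre-activations of one hypothetical layer
p = np.exp(x) / np.exp(x).sum()   # softmax -> distribution over the 8 units

uniform = np.full(8, 1.0 / 8)     # maximally disordered state
assert gibbs_entropy(uniform) >= gibbs_entropy(p)  # uniform maximizes entropy
print(gibbs_entropy(uniform))     # equals ln 8
```

The uniform distribution attains the maximum entropy ln N for N states, which is the sense in which a more "disordered" equilibrium state carries higher entropy.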

Subject Areas

Deep Learning; Thermodynamics; Learning and Generalization; Diophantine equations
