Definitions
The base term reason derives from a statement offered in explanation or justification [1], while reasoning is the drawing of an inference or conclusion through its use [2], and finally, the attribute generalized refers to the applicability of a kind to a well-defined group [3].
The properties and processes of general reasoning are undefined in the literature, leading to a perspective relegated at best to the boundaries of pure philosophy [4]. It is therefore of interest to instead confine the term to a process of information flow [5]. This flow depends on the elements of matter and energy, a conceptual link between the human brain and the phenomenon itself, allowing for the hypothesis that an analogous form is possible by artificial design. However, it is also possible that the term is not grounded in the physical world and instead merely resembles a concept whose sole existence is in the Mind, so that its applicability is bounded by the constraints of metaphysics and its methods.
Reasoning as Circuitry in Computation
If this knowledge of biological computation is robustly, even if only approximately, equivalent to that of an artificial form, as represented by human engineering and artificial design [11], then the artificial neural network serves as the analogous form of the biological kind [5].
Recent work in mechanistic interpretability of transformer circuitry [13] shows examples of information processing in this artificial setting, including a learning process that can lead to a general form of an algorithm and its putative computation, as in the skill of generalized reasoning observed in a large language model [14]. One example shows two circuits forming in parallel, a process that is a type of "grokking" [15], which leads to circuitry dedicated to this higher form of information processing and a search of the algorithmic space. The study further contends that the generalized circuit spanned particular but local layers of the neural network; it remains undetermined whether this circuitry is confined by locality, and therefore a gradual building upon the increasingly larger features of the network, or whether the circuit can span the network layers without interruption, and therefore bridge and "bind" the lower-order features with the higher-order features [16].
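As an illustration, the grokking phenomenon referenced above [15] can be reproduced in a toy setting. The sketch below is a minimal, assumed configuration rather than the exact setup of the cited studies: it trains a small network on modular addition with weight decay and prints training and held-out accuracy, where the signature of interest is a late jump in held-out accuracy long after the training set is memorized.

```python
# Minimal sketch of the "grokking" setup (cf. [14,15]): a small network trained
# on modular addition with weight decay, where held-out accuracy can rise long
# after training accuracy saturates. The one-hot MLP architecture and the
# hyperparameters are illustrative assumptions, not the cited configurations.
import torch
import torch.nn as nn

P = 97                                   # modulus of the toy task a + b (mod P)
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                  # train on half the table, hold out the rest
train_idx, test_idx = perm[:split], perm[split:]

def one_hot(ab):
    # concatenate one-hot codes of the two operands
    return torch.cat([nn.functional.one_hot(ab[:, 0], P),
                      nn.functional.one_hot(ab[:, 1], P)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

x_train, y_train = one_hot(pairs[train_idx]), labels[train_idx]
x_test, y_test = one_hot(pairs[test_idx]), labels[test_idx]

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x_train).argmax(1) == y_train).float().mean()
            test_acc = (model(x_test).argmax(1) == y_test).float().mean()
        # a delayed rise in test_acc relative to train_acc is the grokking signature
        print(f"step {step:6d}  train {train_acc:.2f}  test {test_acc:.2f}")
```

The layer widths, optimizer settings, and encoding above are illustrative choices; the cited works use small transformers on comparable algorithmic tasks and report the same delayed generalization.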
The above example shows the blurring of the boundary between the neural network approaches of engineering and the alternative practices of neurosymbolic ones [16]. However, the underlying process can be stated as originating and emerging from an unstructured neural network, as opposed to any a priori design as exemplified by a neurosymbolic approach. This suggests that a neural network is a basis for the formation and emergence of higher-order concepts, and that they are not defined a priori. A hypothesis to fully test this concept is difficult to form because it depends on reachability, that is, on whether any experiment is a robust measure of circuit design. This concept, and the others in the above sections, refer to the elements of higher cognition and to the recognition of information for the processes of advanced computation. Therefore, it can be said that the priors of cognition are a neural network that is "programmed" for construction of the informational pathways and patterns that serve as the intermediate basis for advanced computation [11].
A Critique of a Pure Reasoning Process
The pillars of knowledge of the natural world are derived from a theoretical foundation, at the suggestion of the Cartesians, and reinforced by the practices of experience and experimentation. Its construction lacks permanence, a consequence of the corrosive power of dogma and the pressure of a loss of utility from erosion by the forces of misrepresentation and misinterpretation of ideas. The twin virtues of elegance and beauty have further served as a stray guide in the search for the truer paths, as revealed in the “bitter lesson” [17], a reminder of the problems in navigating an idealized scientific practice and of the central role of data in the machine learning methodology. It suggests data and learning as the core property of interest, a kind of complex manifold, and warns against the pitfalls of a hand-crafted design, an over-indulgent methodology for modeling, and an obsession with tinkering. Instead, a general algorithm for learning, with a viable space to search for optimality, is central to finding the patterns of interest in the data.
Hindsight shows the great importance of a large quantity and high quality of data samples for building the foundation of a model, and for developing the principles of a materialist epistemology based on a mechanistic understanding of the generation of knowledge. In this perspective, knowledge is represented by a physical process involving the particles of information flow and their cost of movement across space and time. This process is also a mathematical expectation that describes the optimization of data as a manifold and its geometry. Essentially, an optimal manifold of data may be described as a set of connections and paths resulting from the compression of data into a more concise state for higher efficiency in its processing by algorithms. The optimality is described by the restrictive forces of information flow, a process dependent on the concepts of entropy and informational complexity, a guide to the potential for data compressibility and to the algorithmic dependence in finding the shortest paths for traversing the data and its manifold.
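As a concrete, if crude, illustration of compressibility as a guide, the sketch below uses a general-purpose compressor as a stand-in for the idealized notion of informational complexity: data with low intrinsic complexity compress far better than unstructured noise of the same size. This is an assumed heuristic for illustration, not a measure taken from the cited works.

```python
# Compression ratio as a crude proxy for structure in data, assuming zlib
# stands in for the idealized measure of informational complexity.
import zlib
import numpy as np

rng = np.random.default_rng(0)

def compression_ratio(arr: np.ndarray) -> float:
    """Compressed size divided by raw size; lower means more compressible."""
    raw = arr.tobytes()
    return len(zlib.compress(raw, level=9)) / len(raw)

# Structured signal: a smooth curve quantized to bytes (low intrinsic complexity).
t = np.linspace(0, 8 * np.pi, 100000)
structured = ((np.sin(t) + 1) * 127).astype(np.uint8)

# Unstructured signal: independent uniform bytes of the same length.
noise = rng.integers(0, 256, size=structured.size, dtype=np.uint8)

print("structured:", compression_ratio(structured))  # well below 1.0
print("noise:     ", compression_ratio(noise))       # close to 1.0
```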
This compressibility serves as a guide for forming hypotheses and, in a stronger form, for expressing an expectation that is central to the formulation of theory. One outcome is that data compressibility should result in an object that is simpler in its geometrical configuration as a manifold. Another is that the optimization process is expected to compress the navigable paths across its features in finding optimal solutions and efficient algorithmic designs that lower traversal costs, as observed in the linearity of the relationships among facial features across individuals in neuroscience [18] and recapitulated in the engineering sciences [19]. It follows that any circuitry described in this context derives from the processes of compression and the pressures of an evolutionary process. This is also a connectionist agenda for building a new manifold of the data, leading to the formation of circuits that may exist in parallel to one another while others defy the common expectations for their occurrence [14]. The formation of algorithms in totality is the outcome of these processes and reflective of a form of cognition that is observable in deep learning, leading to models with deeply entangled parameters and a dependence on empiricism for insight.
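The linearity reported in [18] rests on an analysis of linear decodability, which the following sketch illustrates on synthetic data: latent features are mixed linearly into a population response, and a least-squares decoder recovers them on held-out samples. The data and dimensions are hypothetical stand-ins, not recordings or results from the cited study.

```python
# Minimal sketch of a linear-decodability analysis on synthetic data:
# random latents are passed through a fixed linear mixing plus noise, and a
# least-squares decoder maps the responses back to the latents.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_latent, n_units = 2000, 10, 200

latents = rng.standard_normal((n_samples, n_latent))          # "facial features"
mixing = rng.standard_normal((n_latent, n_units))             # fixed linear code
responses = latents @ mixing + 0.1 * rng.standard_normal((n_samples, n_units))

train, test = slice(0, 1500), slice(1500, None)

# Least-squares linear decoder: responses -> latents.
decoder, *_ = np.linalg.lstsq(responses[train], latents[train], rcond=None)
pred = responses[test] @ decoder

ss_res = ((latents[test] - pred) ** 2).sum()
ss_tot = ((latents[test] - latents[test].mean(axis=0)) ** 2).sum()
print("held-out R^2:", 1 - ss_res / ss_tot)   # near 1.0 when the code is linear
```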
The newly formed circuitry is a mirror into the optimization process and its unexpected roles in handling the greater dynamics of information flow [20]. It lends predictiveness to the occurrence of new formations of algorithmic designs and optimal structurings of data. Cognition is therefore defined as a manifold that results from a learning procedure and its attributes for enabling a search that leads to an optimal set of paths in a system.
This is also descriptive of a logical design and a potential for computation of a manifold as a system. It follows that the data is therefore expected to contain, a priori, the primitive elements for the formation of new circuits. The data in totality is representative of these elements as an implicit coding scheme, in contrast to any explicit and direct coding of them in the manifold of the data itself.
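The notion of an optimal set of paths across a manifold can be made concrete with a small graph construction: sample points from a curved one-dimensional manifold, connect nearest neighbors, and compare the shortest path along the graph with the straight-line distance. The choice of graph, neighborhood size, and example curve are illustrative assumptions rather than a method taken from the cited works.

```python
# Shortest paths over a k-nearest-neighbor graph built on points sampled from
# a half circle: the path along the manifold exceeds the straight-line distance.
import heapq
import numpy as np

rng = np.random.default_rng(2)
theta = np.sort(rng.uniform(0, np.pi, 200))
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)    # half circle in 2-D

k = 5
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]            # k nearest per node

def shortest_path_length(src: int, dst: int) -> float:
    """Dijkstra over the k-NN graph with Euclidean edge weights."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > best.get(node, np.inf):
            continue
        for nb in neighbors[node]:
            nd = d + dists[node, nb]
            if nd < best.get(nb, np.inf):
                best[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return np.inf

geodesic = shortest_path_length(0, len(points) - 1)
euclidean = dists[0, len(points) - 1]
print(f"path along the manifold: {geodesic:.2f}  straight line: {euclidean:.2f}")
```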
Conclusion
These definitions, and the definition of processes with a mechanical basis, are suitable for hypothesizing and for constraining the space of possible experiments in the quest to validate any theory of generalized reasoning. They also suggest that the problem of advancing this skill is not an unreasonable proposition, given the assumption that it exists a priori in human cognition. Moreover, there is supporting evidence for it in the mechanisms that bind the lower-order to the higher-order concepts, and a reminder of the categorical nature of generalized reasoning and its kinds: it must depend upon the physics of information flow, since any of these advanced informational processes involve physical motion, and it does not depend on an abstract and undefined set of traits, but is instead tractable to formalization and definable as a physical process.
This closes the gap in expectations of efficiency between the artificial and natural forms of cognition, with the data of an artificial neural network as potentially compressible with respect to its parameterization of the data and the consequent search for the shortest manifold by a sufficiently capable deep learning method. This description is less a process of the force of sorcery and more a reflection on cognition as a process of information flow across a set of paths in a manifold, not dependent on any a priori neurosymbolic design that lacks tractability and explanatory power. The misattribution of a traditional definition of cognition to human agency and related behaviors may stem from an inherent resistance to surrendering the belief in oneself as central to the universe and its operations in favor of a more material view of cognition as an emergent phenomenon arising from a collection of particles and their motion to form an abstractive system of cognitive processes. However, the concepts of agency, vanity, and “free will” emerge from the deeply entwined modules and processes of the human brain, where the probable cause of this entwinement lies in a dependence on sociality and the necessity of establishing and continually measuring status within a specific social group, along with the ever-present dynamics for defining relationships among conspecifics. These behaviors are high in maintenance cost but consistent with notions of other costly displays in the many domains of life, the ones associated with sociality and its elements [21].
Funding
This research received no external funding.
Conflicts of Interest
The author declares no conflict of interest.
References
- Merriam-Webster Dictionary (an Encyclopedia Britannica Company: Chicago, IL, USA). Available online: https://www.merriam-webster.com/dictionary/reason (accessed on 28 November 2024).
- Merriam-Webster Dictionary (an Encyclopedia Britannica Company: Chicago, IL, USA). Available online: https://www.merriam-webster.com/dictionary/reasoning (accessed on 28 November 2024).
- Merriam-Webster Dictionary (an Encyclopedia Britannica Company: Chicago, IL, USA). Available online: https://www.merriam-webster.com/dictionary/general (accessed on 28 November 2024).
- Friedman, R. Cognition as a Mechanical Process. NeuroSci 2021, 2, 141–150. [Google Scholar] [CrossRef]
- Friedman, R. A Perspective on Information Optimality in a Neural Circuit and Other Biological Systems. Signals 2022, 3, 410–427. [Google Scholar] [CrossRef]
- Friedman, R. Themes of advanced information processing in the primate brain. AIMS Neuroscience 2020, 7, 373–388. [Google Scholar] [CrossRef] [PubMed]
- Davidson, E.H.; Erwin, D.H. Gene regulatory networks and the evolution of animal body plans. Science 2006, 311, 796–800. [Google Scholar] [CrossRef] [PubMed]
- Hennig, W. Grundzüge einer Theorie der Phylogenetischen Systematik; Deutscher Zentralverlag: Berlin, Germany, 1950. [Google Scholar]
- Marcus, G.F. The algebraic mind: Integrating connectionism and cognitive science; MIT Press, 2003. [Google Scholar]
- Balestriero, R.; Pesenti, J.; LeCun, Y. Learning in High Dimension Always Amounts to Extrapolation. arXiv 2021, arXiv:2110.09485.
- Friedman, R. Tokenization in the Theory of Knowledge. Encyclopedia 2023, 3, 380–386. [Google Scholar] [CrossRef]
- Friedman, R. Higher Cognition: A Mechanical Perspective. Encyclopedia 2022, 2, 1503–1516. [Google Scholar] [CrossRef]
- Nanda, N.; Chan, L.; Lieberum, T.; Smith, J.; Steinhardt, J. Progress measures for grokking via mechanistic interpretability. arXiv 2023, arXiv:2301.05217.
- Yang, S.; Gribovskaya, E.; Kassner, N.; Geva, M.; Riedel, S. Do Large Language Models Latently Perform Multi-Hop Reasoning? arXiv 2024, arXiv:2402.16837.
- Power, A.; Burda, Y.; Edwards, H.; Babuschkin, I.; Misra, V. Grokking: Generalization beyond overfitting on small algorithmic datasets. arXiv 2022, arXiv:2201.02177. [Google Scholar]
- Chughtai, B.; Chan, L.; Nanda, N. A toy model of universality: Reverse engineering how networks learn group operations. In Proceedings of the International Conference on Machine Learning, PMLR, 2023; pp. 6243–6267.
- Sutton, R. The bitter lesson. Incomplete Ideas (blog) 2019, 13, 38. [Google Scholar]
- Chang, L.; Tsao, D.Y. The code for facial identity in the primate brain. Cell 2017, 169, 1013–1028. [Google Scholar] [CrossRef] [PubMed]
- Olah, C.; Satyanarayan, A.; Johnson, I.; Carter, S.; Schubert, L.; Ye, K.; Mordvintsev, A. The building blocks of interpretability. Distill 2018, 3, e10. [Google Scholar] [CrossRef]
- Olah, C.; Cammarata, N.; Schubert, L.; Goh, G.; Petrov, M.; Carter, S. Zoom in: An introduction to circuits. Distill 2020, 5, e00024-001. [Google Scholar] [CrossRef]
- Zahavi, A.; Zahavi, A. The handicap principle: A missing piece of Darwin's puzzle; Oxford University Press, 1999. [Google Scholar]