1. Introduction
We will show that many aspects of cognition are rooted in functions, functional composition, and functional types.
The first idea of the neural network was introduced in the seminal paper of Warren McCulloch and Walter Pitts [35]. Neurons were considered containers of discrete values, and synapses connecting neurons were seen as functions sending values between connected neurons, depending on the values contained in the neurons. These networks were shown to be equivalent to finite automata [36]. An important change of perspective in this approach emerged after the seminal book of Donald Hebb [5,15], where the intrinsically functional perspective of neural networks shifted toward a dual vision of these systems. Neurons became places containing functions, and the synapses became arrows to which real values, the weights, were associated to denote connection strengths. In the following years, seminal papers with different perspectives, but with a common emphasis on the connectionist nature of these systems, reinforced this vision, giving more relevance to the Hebbian viewpoint focused on the organization of behavior as a result of synaptic activity [17,24,29,43,52,53].
The mathematical notion of function is strongly related to that of the dependent variable, which assumes values depending on the values taken by an independent variable. In Introductio in analysin infinitorum (1748) [11], Leonhard Euler gives the first account of a function related to the notion of algebraic expression. In the 1920s, Alonzo Church introduced the λ-notation, which expresses functions more rigorously, independently of the variables used. Moreover, in 1936, the λ-calculus was proved equivalent to Turing machines, showing that functional notation essentially provides all the computable functions.
Surely, the recent success of artificial neural networks is related to the specific structures of their internal organization, the architecture of these structures, the learning algorithms, the computational power of electronic devices, and the availability of big data [6,12,18,19,20,21,22,23,27]. However, a more abstract descriptive theory based on functions can illuminate a logic of cognitive organization within a very general perspective, where cognitive levels correspond to functional types and computational concepts become relevant to understanding neurological phenomena, and vice versa.
The first important consequence of a functional description of cognition is the possibility of giving precise meanings to important concepts such as learning, meaning, comprehension, and knowledge.
The following words of the recent Nobel laureate Geoffrey Hinton (Physics Nobel Prize 2024) suggest that general models of cognition have a relevance going beyond the successes of artificial intelligence:

"But sooner or later computational studies of learning in artificial neural networks will converge on the methods discovered by evolution. When that happens a lot of diverse empirical data about the brain will finally make sense and many new applications of artificial neural networks will become feasible" [18].
2. Cognitive Systems
A cognitive system acquires, elaborates, memorizes, and provides information through specific competencies that define its behavior and derive from its internal structure. However, while exhibiting its competencies, it continuously reconfigures itself, changing its structure and competencies accordingly. In this sense, a cognitive system is not only a system elaborating information but an open reflexive informational system elaborating new possibilities of elaborating information.
In the following, we give more details and formal support to this initial intuition of a cognitive system in terms of Functional Networks (FN).
An FN is a weighted network of real functions belonging to a limited class (polynomial, sigmoid, hyperbolic tangent, zero-linear functions). Any function of the FN is a node, drawn as a circle, with entering arrows and one exiting arrow. The entering arrows carry the arguments, which are weighted and summed, and the exiting arrow provides the output result. Input values to a node correspond to exits of other functions or to global inputs of the FN. Output values from a node correspond to global outputs of the FN or to inputs of other nodes. No cycles are allowed in the node-arrow graph. The FN is completely expressed by a system of equations representing a composition of basic functions.
The composition is based on weights. Namely, assume that a function fi has entering arrows x1, x2, . . . , xk and yj is the result arrow. Real numbers, called weights, are associated with the entering arrows in such a way that:

yj = fi(w1x1 + w2x2 + . . . + wkxk)

and the argument of fi is called its weighted input.
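As a minimal sketch of this definition, a node can be modeled as a basic function applied to the weighted sum of its inputs. The names FNNode and relu are illustrative assumptions, not taken from the text:

```python
def relu(s):
    """Rectified Linear Unit, one of the admissible basic functions."""
    return max(0.0, s)

class FNNode:
    """A node of an FN computing y = f(w1*x1 + ... + wk*xk)."""
    def __init__(self, f, weights):
        self.f = f
        self.weights = weights

    def __call__(self, inputs):
        # weighted input: sum of wi * xi over the entering arrows
        s = sum(w * x for w, x in zip(self.weights, inputs))
        return self.f(s)

node = FNNode(relu, [0.5, -1.0, 2.0])
print(node([1.0, 2.0, 0.25]))  # relu(0.5 - 2.0 + 0.5) = 0.0
```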
Under general hypotheses, the following proposition holds.
Proposition 1. Any continuous m-valued real function of n arguments can be approximated (at any given approximation level) by a suitable FN [9,25,37,39].
An FN is equivalently described by a system of functional equations on numerical variables. All the basic ingredients of mathematics are in such a system: Functions, Equations, Variables, and Numbers.
In the following FN, variables x, y take input values of the network, while u, w give its outputs, and basic functions are simple algebraic expressions. The other variables assume the values passed in function compositions:
z1 = 3x + 2y        v1 = 2x × 5y
v2 = (1/2)v1 − z1
z2 = 5v2 + 7z1
u = 5z2 + 10
w = 3v2
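The system can be evaluated in dependency order, as in the following sketch (assuming the equations as written above, with the coefficient 1/2 for v1 in the equation for v2):

```python
def fn_2_2(x, y):
    """Evaluation of the example FN of type FN(2, 2, 2, 2),
    from the input variables (x, y) to the outputs (u, w)."""
    # first composition level
    z1 = 3 * x + 2 * y
    v1 = (2 * x) * (5 * y)
    # second composition level (assumes the 1/2 coefficient for v1)
    v2 = 0.5 * v1 - z1
    z2 = 5 * v2 + 7 * z1
    # output level
    u = 5 * z2 + 10
    w = 3 * v2
    return u, w

print(fn_2_2(1.0, 1.0))  # (185.0, 0.0)
```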
When we consider the equations above in more detail, we discover that they are organized at different levels. At the first level, we have the input variables (x, y), which are weighted arguments of some functions that provide the second-level variables (v1, z1). In general, variables of level i are weighted arguments of functions whose values are the variables of level i + 1. Variables of the maximum level are the output variables of the system (u, w). In other words, an FN is realized by a system of equations organized in a certain number of levels, where, at each level, some functions transform variables of a given level into variables of the next level, avoiding jumps of more than one level. The equation system above expresses a function from R2 → R2.
Figure 1 is the graph representation of the above system of equations (with basic functions i = λx.x and q = λx,y. x2 + y).
Any part of an FN where we can distinguish n input and m output arrows represents a function between real hyperspaces. We say that the system above is of type FN(2, 2) (two input variables and two output variables) or, more precisely, of type FN(2, 2, 2, 2) if we also want to make explicit the number of variables of the internal levels. Of course, the terminology extends naturally to a type FN(n, m) for any positive n, m, and to a type FN(n1, n2, . . . , nk) for any FN of k levels with n1 input variables, nk output variables, and n2, . . . , nk−1 variables at the internal levels.
Consider a simple example of a behavioral competence: crossing the road in a city. The person crossing can perform a safe crossing task when he/she evaluates the distances and speeds of vehicles approaching the crossing zone and consequently provides the times, speeds, and trajectory of a motion, with possible points and times of interruption. All this information is encoded in input and output values, whose association expresses the competence. It is coordinated with other sensorimotor competencies and translated into muscle forces, rotations, and tractions at many levels, involved in the movements necessary to the whole process. In general, we can state the following proposition.
Proposition 2. Any Cognition Competence identifies a real function F : Rn → Rm.
Therefore, the capability to adequately perform a process is ultimately represented by a function, or by several coordinated functions, giving the right correspondences between arguments and results that the requested ability requires. These functions are compositions of elementary functions. This fact is based on an important result about neural networks ensuring that any continuous function fC : Rn → Rm can be approximated by a suitable FN.
Given an FN, the continuous real function that it computes is a point of a suitable real hyperspace, as asserted by the following proposition.

Proposition 3. Any continuous m-valued real function of n arguments computed by an FN with S internal arrows (that are not input or output arrows) is completely identified by a vector in the real hyperspace RS [9,12,25,37,39].
The last proposition has a strong significance because it entails that acquiring a given competence means acquiring a vector of a real hyperspace. In mathematics, many results ensure the approximate representation of continuous functions using special classes of functions (polynomials, trigonometric polynomials, Bernstein polynomials, . . .). The FN approximation result is very important because the class of basic functions is limited and can be reduced to only one type of function (for example, the Rectified Linear Unit given by λx. max{0, x}), and the approximation error can be reduced by using many levels in the composition of functions. These two characteristics are essential for the Machine Learning approach to cognitive systems.
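A minimal sketch of both points: the absolute value, a continuous function, is realized exactly by a two-level FN built from the ReLU alone, and by Proposition 3 the competence is identified by its weight vector. The decomposition |x| = relu(x) + relu(−x) is a standard identity, used here only as an illustration:

```python
def relu(s):
    return max(0.0, s)

# |x| = relu(x) + relu(-x): a two-level FN using one basic function type only
def abs_fn(x):
    h1 = relu(1.0 * x)    # hidden node with entering weight +1
    h2 = relu(-1.0 * x)   # hidden node with entering weight -1
    return 1.0 * h1 + 1.0 * h2

# Proposition 3: the competence is identified by its weight vector in R^S
weights = [1.0, -1.0, 1.0, 1.0]

for x in (-2.0, 0.5, 3.0):
    assert abs_fn(x) == abs(x)
print("abs realized exactly by a ReLU FN, weights:", weights)
```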
A short digression on the neuronal systems of animals is useful for appreciating the biological meaning of the picture given above. An animal neuron is a special cell, and a neuronal tissue is an aggregate of connected neurons exchanging signals. However, signals are represented by numerical variables with a complex physical and biochemical nature. In particular, there is a strong difference between afferent and efferent signals. The neuron's exit signal has an electrical nature due to a change of cellular electrical potential, and this potential propagates outside the neuron, through the axon and its branches, toward other neurons or sensorimotor cells that are not neurons. On the other hand, synapses are cell sites receiving stimuli (internal or external to the neuronal tissue) where signals activate biochemical processes mediated by appropriate kinds of molecules that interact with the neuron cell dynamics, at many levels of its internal metabolism and its informational DNA-RNA based system.
Synaptic plasticity was identified as the basis of learning processes and memory [15]. Synapses are very complex structures expressing the levels of connection between neurons. These levels encode many aspects of the relational system that the neuronal tissue expresses. As Eric Kandel has shown [26] in his investigations on Aplysia (a marine slug), an entire history of experiences can drive the state of a synapse, originating seeds of associative memories that are the basis of many emergent aspects of neurons. The function of a single neuron is a real function with only one exit, but minimal aggregates of connected neurons can realize an elementary competence; for example, in the case of Aplysia, a circuit of four neurons is responsible for the gill-withdrawal reflex, in which the slug withdraws its gill when its adjacent water siphon is touched. This phenomenon unravels the molecular mechanisms of learning and memory. In this system, synapses exhibiting long-term potentiation provide the acquisition of a stable strength in synaptic connections, which is responsible for the observed functional acquisition.

This evidence shows the biological relevance of synaptic weights as the true encoding of competence acquisition, and functional modules, which are real functions, exhibit the same structure as the bigger units of the whole network they belong to. In this model, functional modules correspond to groups of neurons with a functional identity, which mathematically are functions from real hyperspaces to real hyperspaces. The homogeneity between functional modules and the whole system is a kind of reflexivity, which we could call "holomorphy", and it is probably related to the scaling-up of cognitive competencies [1] (for the role of reflexivity in mathematics, see [10]). Finally, synapse modification through experience is the basis of the open character of cognitive systems, where the right information provides correct behaviors, enforcing new possibilities of more and richer behaviors. This kind of reflexivity is a form of circularity that captures the initial postulate about a cognitive system, which, "exhibiting its competencies, continuously reconfigures itself by changing its structure and competencies".
3. Attention and Learning
In our initial definition of a cognitive system, two main aspects are present: 1) a cognitive system performs some competencies; 2) it continuously reconfigures its structure to extend and enrich its competencies.

This second aspect implies a continuous activity of control over behavior, and hence a dual nature of the cognitive system, where some parts have to control other parts. An attention FN is a functional module of an FN that controls other parts of it. Since the behavior of an FN is completely encoded by its weights, the only way to alter its competencies is to act on the weights. We call a Learning network a functional module that receives "correction stimuli" as inputs and provides outputs that change the weights to "improve" the behavior of an FN, following some criteria of adequacy.
Let us explain in more detail the possibility of describing learning in terms of functional moduli within a cognitive system.
A weight adjustment mechanism, coupled with a functional network, is a necessary condition for learning. In this sense, learning can be seen as a meta-competence, giving the ability to acquire a competence through a finite number of pairs (input, output) and the corresponding errors, evaluated in some way, committed in providing outputs different from those of a target function. The following proposition ensures this possibility.
Proposition 4. Given a real function F : Rn → Rm, there exist effective and efficient methods for determining, from a finite subset of the graph of F, the weights of an FN approximating F at a specified approximation level (see backpropagation via gradient descent [12,18,28,37,39,52,53]).
The proposition above entails the passage from learning via instructions (programs) to learning via training (examples). Cognitive systems are not "programmed" as computers are. Rather, they are "trained" by discovering, through trial and error, the internal rule expressing a function. The training efficiently converges to a target function by adjusting the weights of an FN until it reaches, with a good approximation, the desired function corresponding to the learned competence. Even this aspect is described in functional terms because it corresponds to the search for a minimum of the function expressing the errors committed in acquiring the target function.
The general schema of training in acquiring a competence can be described in functional terms, apart from any specific algorithm realizing it. Let fR be the function computed by an FN and FC the function expressing a competence. Given an argument a, the error in computing FC(a) can be expressed by |fR(a) − FC(a)|.

Suppose we provide to the FN a sequence of training arguments a1, a2, a3, . . ., and, correspondingly, the errors are e1, e2, e3, . . . . For each error, the network changes its weights by reconfiguring itself. Let R1, R2, R3, . . . be the sequence of networks obtained after the reconfigurations. The network has an adequate training strategy when, after a certain number of steps, the errors fall under a prefixed error threshold ε, arriving at a step j where |fRj(a) − FC(a)| < ε.
Natural cognitive systems can learn because they surely possess learning strategies according to the pattern above. The artificial intelligence revolution of recent years is based on Machine Learning, which essentially discovered efficient algorithms for implementing the learning schema outlined above. Moreover, the learning schema of an FN can itself be described in terms of a system of functional equations [32], as a dual FN acting on the given FN to reconfigure its weights toward the target function realizing a desired competency. In other words, learning is the super-competence allowing for the acquisition of competencies.
Proposition 5. Given an FN of type FN(n1, n2, . . . , nk), a dual learning network of type FN(nk, nk−1, . . . , n1) exists, acting on the first one and directing its learning strategy toward the target function of the competence acquired by its training.
The proof of the above proposition is given by the system of functional equations presented in [32] (Section 3, Theorem 1), where backpropagation is expressed by learning equations.
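The learning equations of [32] are not reproduced here; the sketch below is only a generic hand-coded backpropagation step for a tiny FN(1, 1, 1), illustrating the dual reading: the error signal travels along the arrows in the opposite verse with respect to the forward computation. All names and constants are illustrative:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# forward pass of FN(1, 1, 1): x -> h = sigmoid(w1*x) -> y = w2*h
def forward(x, w1, w2):
    h = sigmoid(w1 * x)
    return h, w2 * h

def backward(x, target, w1, w2, lr=0.1):
    """One backpropagation step: the error flows backward, node by node,
    as in the dual learning network of Proposition 5."""
    h, y = forward(x, w1, w2)
    dy = y - target               # error at the output node
    dw2 = dy * h                  # gradient for the output weight
    dh = dy * w2                  # error propagated back to the hidden node
    dw1 = dh * h * (1 - h) * x    # through the sigmoid derivative
    return w1 - lr * dw1, w2 - lr * dw2

w1, w2 = 0.5, 0.5
for _ in range(2000):
    w1, w2 = backward(1.0, 0.8, w1, w2)
_, y = forward(1.0, w1, w2)
print(round(y, 3))  # ≈ 0.8, the target value
```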
Figure 2 is a visualization of the integration of an FN with a learning FN controlling it. The main idea of integrating the original FN (to be trained) with the training one (the learning FN) is in the right part of the figure. Namely, each arrow of the original FN is replaced by a node of the second FN and by two arrows, determining a bridge path with the same origin and target as the replaced arrow. Apart from the arrows of bridges, the nodes of the second FN are oriented in the verse opposite to that of the original FN.
Network integration is based on the following principle.
Principle 6 (FN Holomorphy). Any FN represents a function, and functions acting on FNs can be represented by integrating FNs.
For example, a function acting over a given FN can be expressed by another FN acting on the weights of the first. This idea could be applied to express the super-competencies we consider in the next sections. Analogous ideas underlie the notions of ANN capsules, neural hyper-networks, neural meta-networks, and compositional ANNs [8,13,38,46]. In all cases, the holomorphy of functional networks is the main ingredient of these architectures.
The passage from computer science to artificial intelligence is an epochal step synthesized by the expression from Computing Machines to Machine Learning. A computing machine is a device able to compute a function expressed by an internal program, providing the result of the function in correspondence to any argument of its domain. Conversely, a machine learning system can discover, from some triples (argument, result, error), the function approximating, within a given error threshold, the functional rule underlying all the pairs (argument, result) received in a training process.
However, the learning we described is not a conscious activity, and even if a cognitive system can realize a task, the task is performed without understanding how it happens. It is similar to a child who can ride a bike without knowing the mechanisms that make it possible. Of course, often it is not important to understand what we do, but a further level of comprehension is crucial in many cognitive activities.
4. Meaning and Comprehension
The notion of meaning is associated with symbolic systems by which cognitive systems represent an external world and interact with other cognitive systems living in that world. Even in the most primitive cases, symbols represent things and relations between them. A cognitive system is based on primary data (sensory and perceptive) that the system receives from outside and organizes within its functional networks. These networks represent associations of primary data with "forms" internal to the system (sounds, images, odors, tastes, touches). When symbolic systems evolve toward languages, the associative mechanisms become more complex because symbols express data and data aggregates, but also relations, facts, evaluations, judgments, and previsions. In other words, primary data become the terminal points of conceptual systems that realize higher-order realities giving interpretation to the primary data of the external reality of observable things and facts.

The terms meaning and comprehension refer to the mechanisms linking things and facts of the external reality with the conceptual world that cognitive systems have elaborated as the result of a process of internal organization, developed through interaction with the world and with other cognitive systems. The conceptual world related to a language is intrinsically based on societies of cognitive systems, that is, groups of individual cognitive systems elaborating a culture and a collective educational system. Every language is developed within a society. Namely, an individual can establish a symbolic system only by interacting with someone similar to him but external to him, in the same external world. In this collective perspective, the world is seen through one's own eyes by interacting with other eyes seeing it.
The recent success of chatbots is based on transformations of words into numerical vectors, called Embedding Vectors, with several thousand components. These components express membership in syntactical and semantical categories and features due to the contexts where a word can occur in the texts of the given language. These vectors express the correct use of words in discourse. Namely, knowing the meaning of a word means using it correctly. This notion of meaning has a long tradition in linguistics and philosophy [48]. It has become effective with the implementation of this idea in artificial intelligence with language representation systems such as Word2Vec [48] and its developments in the wider context of the LLMs (Large Language Models) [27] of the most recent conversational systems.
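A minimal sketch of this idea: words used in similar contexts receive vectors pointing in similar directions, which is standardly measured by cosine similarity. The toy four-component vectors below are invented for illustration; real systems use thousands of components:

```python
import math

def cosine(u, v):
    """Similarity of distributional profiles: words used in similar
    contexts have embedding vectors pointing in similar directions."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# hypothetical toy embeddings, invented values for illustration only
emb = {
    "cat":  [0.9, 0.1, 0.3, 0.0],
    "dog":  [0.8, 0.2, 0.35, 0.05],
    "bank": [0.1, 0.9, 0.0, 0.4],
}

print(cosine(emb["cat"], emb["dog"]))   # high: similar contexts of use
print(cosine(emb["cat"], emb["bank"]))  # low: different contexts of use
```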
In phonematics, any phoneme is defined by the values of pertinent traits of sounds associated with distinctive features of the phoneme (labial, guttural, velar, nasal, dental, occlusive, . . .). In the case of words, typical features are syntactic categories, semantical aspects, contexts, styles, registers, . . . . Given a linguistic corpus, these features are completely derived from all the propositions where words occur, the syntactic roles, the replaceable words, the forbidden or expected words in close positions, . . . . Many of these features correspond to numerical values of vector components corresponding to scores in linguistic games based on paraphrases, synonyms, antonyms, comparisons, analogies, metaphors, syntheses, and expansions (similar words pass similar tests successfully). Just this kind of game was the basis of the Pretraining phase in the Generative Pretrained Transformer models of artificial conversational systems [41]. However, the very basic idea of semantic vectors is that words have distributional profiles in the texts of a given language, and these profiles can be encoded by numerical (probabilistic) vectors, in a perspective very close to [14].
According to this perspective, meanings are points of a multidimensional vector space, the language semantic space, with two important characteristics: openness and relativity. The component values of such vectors are not fixed, because these values express relations among words, and these relations continuously change during conversational and dialogical activity. Moreover, these values do not have an absolute character because they depend on the textual sources, the particular strategies by which they are computed, and the history of training and development of a conversational system. Linguistic games provide the components of linguistic embedding vectors. However, the numerical values of the related semantic vector spaces depend on the criteria for measuring the features of the word distributional profiles. This means that different individual experiences determine different semantic vectors and related semantic spaces. Then, a natural question arises: how can we reconcile the relativism of meaning with its social communication value? A possible answer is that understanding is always uncertain, approximate, partial, and erroneous. People's communication is generally based on a small percentage of reliability, which is inversely proportional to the semantic complexity of the communication and directly proportional to the similarity of the semantic spaces of the communicating subjects. Nevertheless, on average, language communication works well enough to produce correct and reliable interactions.
Given a semantic space, comprehension corresponds to the localization of linguistic expressions in this space. A proposition of n words provides an (n+1)-gon whose vertices are the origin and the n vectors exiting from the origin of the space (of many thousands of components). We can assume the following:

Conjecture 7. The meaning of an n-word proposition is a point within a region close to the barycenter of the (n + 1)-gon corresponding to the proposition.
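As a sketch of the conjecture: the barycenter of the (n+1)-gon is the componentwise mean over its n+1 vertices, and since the origin contributes zeros, it is the vector sum of the word vectors divided by n + 1. The word vectors below are hypothetical values:

```python
def proposition_meaning(word_vectors):
    """Conjecture 7 (sketch): locate an n-word proposition at the
    barycenter of the (n+1)-gon whose vertices are the origin and
    the n word vectors."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    # the origin contributes zeros, so divide the vector sum by n + 1
    return [sum(v[i] for v in word_vectors) / (n + 1) for i in range(dim)]

# three hypothetical 4-component word vectors
words = [[0.9, 0.1, 0.3, 0.0],
         [0.2, 0.8, 0.1, 0.1],
         [0.4, 0.4, 0.6, 0.2]]
print(proposition_meaning(words))  # same dimensionality as its constituents
```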
This point of view has many important consequences. The first consequence is that a meaning, constructed by combining the words of a proposition, maintains the same dimensionality. Analogously, the meaning of the propositions of a discourse again results in a vector having the same dimensionality as its constituents. At any level of semantical composition, the dimensional invariance continues to hold. This interesting phenomenon has complete evidence in the way meanings are represented within the most recent chatbots of the ChatGPT family of OpenAI (since version ChatGPT-4). Complex meanings are obtained by processes of localization inside semantic spaces, where the composition of meanings determines semantic regions of intersection within which the single meanings lose some specific individual aspects but gain characteristics of synthesis and semantic amalgamation.
A second consequence, following from the intrinsic homogeneity of meanings, is the possibility of extending language semantics to complex meanings, passing from semantics to knowledge.
5. Language and Knowledge
We have considered functional networks and the attention mechanisms controlling them in learning processes. However, a complex cognitive system is built of many, up to tens of thousands of, functional networks exhibiting basic behaviors and competencies integrated into complex activities. The scaling up in connections is based on the holomorphic structure of FNs. Namely, each FN is representable in the same way single functions are represented. Therefore, the connection of many FNs is a second-order FN (over first-order FNs). This mechanism can be iterated, again and again, obtaining at each level competencies built on the competencies of the previous levels. A function from a type τ to a type σ has type (τ → σ) (τ can be a cartesian product type τ1 × τ2 × . . . × τk). A function from type (τ → σ) to a type ρ has type ((τ → σ) → ρ). This same typing mechanism extends naturally to FNs, as the sketch below illustrates. In this sense, we have functional networks and competencies of higher orders. Any FN has the same structure as a neuron, and many FNs can be organized as neurons are, but synchronization mechanisms are necessary for effective coordination. In [4], synapsembles were introduced, which play a very similar role [49].
Knowledge is a high-order competence where many functional networks are coordinated and related to the language competence considered in the previous section. Complex concepts are obtained by “barycentric” syntheses of basic meanings. These syntheses correspond to concepts that become arguments of predications, implications, quantifications, abstractions, …. In other words, the logical structure of the ordinary language becomes a conceptual framework of the organization of complex concepts. This viewpoint explains the importance of language, not only for its primary communicative functions but also for its relational power in connecting representations.
Moreover, even the logic of language is based on functions at different logical levels, starting from individuals, truth values, and unary predicates. Namely, predicates over predicates exist, called hyper-predicates. Good(a) is a predication of the property Good of the individual constant a, while Past(P) is the predication of the property Past of a predicate constant P. Predicates are complemented by an operator, which we denote by _. Therefore, Love_b expresses the property of an individual loving b, whence Love_b(a) asserts that a loves b. The proposition "John is going home with a bike" can be represented by:
John(a), Go(P), Pres(P), Progr(P), Home(b), Bike(c), P_Place(b)_With(c)(a)
We do not go further into presenting a logical method of sentence representation [30,33]. We limit ourselves to observing that, using 20 logical operators, the meaning of any sentence can be obtained from the meanings of the single words. The same approach can be extended to complex semantic units constructed by functional networks. In this sense, ordinary language provides a logical structure that can be iterated at higher-order functional networks.
One logical operator deserves special attention: the predicative abstraction operator, denoted by a hat. When it applies to a predicate Pred of type (arg → bool), where bool is the type of truth values, it provides P̂red, of type ((arg → bool) → bool), that is, a hyper-predicate for Pred. Namely, Ĝo(P) does not mean that P is going, but that P has the property of predicates that imply the predicate Go.
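A sketch of the hat operator over a finite domain, under the assumption that "Q implies Pred" is read as implication at every element of the domain; the domain and the predicates are invented examples:

```python
# the abstraction operator turns a predicate Pred of type (arg -> bool)
# into a hyper-predicate of type ((arg -> bool) -> bool) holding for
# predicates that imply Pred
DOMAIN = ["a", "b", "c"]

def hat(pred):
    """hat(Pred)(Q) is true iff Q implies Pred on the whole domain."""
    return lambda q: all(pred(x) for x in DOMAIN if q(x))

go = lambda x: x in ("a", "b")   # Go holds of a and b
run = lambda x: x == "a"         # Run holds only of a (running implies going)
sleep = lambda x: x == "c"       # Sleep holds only of c

hat_go = hat(go)
print(hat_go(run))    # True: Run implies Go
print(hat_go(sleep))  # False: Sleep does not imply Go
```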
Knowledge involves meaningful entities, of a purely conceptual nature, that shed new light on the whole semantic system of a cognitive system. Let us call these meanings ideal elements, extending a terminology due to David Hilbert [16] for the mathematical entities that completely redefine the previous conceptual organization of mathematical knowledge. Examples of these concepts are √2, the square root of 2, which is a number different from any fraction; the imaginary unit i = √−1, which is a number different from any real number; the Euler number e, which is the base of natural logarithms; the Newton-Leibniz differentials on which mathematical analysis is based; and Cantor's transfinite ordinal and cardinal numbers. Analogous concepts in physics are waves, force fields, electrons, photons, subatomic particles, and quanta. Of course, lists of the same relevance are available in biology, medicine, economics, and any field of human knowledge. Any conceptual system changes when these ideal elements are elaborated. These elements are new meanings that make new the previous meanings from which they spring.
Ideal elements put in evidence two important aspects of cognitive dynamics, both related to Jean Piaget's investigation of "Genetic Epistemology" [42]: the Assimilation-Accommodation dialectic in cognitive equilibrium and the Individual-Collective duality of knowledge evolution.
When a cognitive system acquires new information that provides a gain of knowledge, the acquiring system reacts by searching for the minimum variation of its internal state that is compatible with the process of assimilation that internalizes the received information. When this is not possible, or only partially possible, the system promotes a change involving its internal structure. The former process is called assimilation, while the latter is called accommodation. In the assimilation-accommodation interplay, the best solution is a min-max optimization by which the maximum quantity of information has to enter the system with the minimum change in its structure. An acquisition that does not alter the internal structure is informationally poor, while an acquisition requiring too intense a change could impose an excessive updating cost that prevents an effective realization of the change. Therefore, a cognitive system is subjected to opposite forces. A natural comparison can be made with the acquisition of food, where the advantage of the ingested food has to be in equilibrium with its digestion cost. In the next section, we outline the role of an emotional system in pushing cognitive systems toward purposes and finalities.
The Individual-Collective duality in knowledge evolution means that individual cognitive systems cannot exploit their potentiality without a stimulating external world where other cognitive systems similar to them interact with them in a continuous process of reciprocal influence, from which cognition acquires a collective character within which each single system is a specific instance. This means that individual processes of cognitive maturation are a sort of holographic image of the collective definition of the "culture" of the society to which single systems belong. This vision is inspired by a principle held in biological evolution, according to which ontogenesis is a recapitulation of phylogenesis.
6. Autonomy and Sensoriality
A dynamic is produced by a force orienting and driving it. Informational dynamics are based on data that transform, according to rules, into new data or new data organizations. In this way, a data sequence provides a computation as an information-processing dynamic. The starting point is an initial configuration, and the final point is a configuration where no rule can be applied.
Natural cognitive systems are autonomous; that is, they work without any external intervention. In this case, any elaboration is generated by some internal mechanism. This means that, internally, a kind of disequilibrium has to be produced, with consequent dynamics, to transform the disequilibrium into a new form of equilibrium. Therefore, a cognitive system can autonomously generate a process if a system is connected to it, providing disequilibrium conditions and driving it to equilibrium. Such a system is an emotional system supporting cognitive dynamics. A hungry animal searching for food is a simple case that explains the situation. The hunger instinct produces the necessity of satisfaction, which activates the food search. Instinct satisfaction is the primordial mechanism activating natural cognitive systems. Such mechanisms are based on sensoriality. Whence, a hierarchy of finalities is defined, going from instincts to more evolved forms of satisfaction and happiness. In the mind, a very sophisticated organization of finalities is established, driving behaviors at all levels. Each person aspires to be happy, and this wish is the motor of his/her life.
This means that behavioral autonomy is based on sensoriality and is oriented toward forms of happiness. But happiness implies the satisfaction of finalities in a condition of freedom. This situation is intrinsically the origin of conflicts when the finalities of some individuals are against those of others. Therefore, the freedom of a cognitive system is a critical aspect involving moral and ethical issues. Since the origin of cybernetics, the risk of thinking machines was considered by the founders of the digital era [40,54]. Today their speculations are reality, and science cannot decline its responsibilities. In Wiener and von Neumann's time, an analogous debate was active about the nuclear bomb, and the positions of the two scientists were opposite. Artificial intelligence needs a consideration of the relationship between man, nature, and science, in a framework of social finalities and principles that cannot be delegated to the policies of small groups aimed at particular interests.
Events related to emotions can leave traces in synapse values that remain even when the inputs producing a particular activation of a functional module are no longer present. In other words, emotions are related to processes of "tracing" that are very important for the efficiency of cognitive competencies.
7. Consciousness and Mind
At the highest level of cognitive competencies, there is consciousness. It can be represented as a map, essentially a function producing images of FNs, hence exploiting the competence of imagination. This meta-competence allows for evaluating possibilities; therefore, it is related to action evaluation and behavior planning. It is interesting to consider that the notion of function, which was the origin of our discourse on cognition, is also its conclusion, by exploiting the image, which, even in its mathematical terminology, provides the essential representational aspect of a function. In this case, parts of a cognitive system are mapped into elements, producing a map of the whole system and giving an internal representation of the competencies and concepts to be mastered. This super-competence of a cognitive system provides what is intended as a mind, based on the internal perception of its reality as something that exists apart from any external reality by which and for which a cognitive system is designed in its original constitution. The possibility of virtualization is the basis of an autonomous existence motivated by the internal activity of the structure itself.
An important aspect of consciousness is the activity of continuous navigation in the FN map that it exhibits. If the attention mechanism provides mechanisms for focusing on input stimuli, analogous mechanisms of auto-attention are necessary for focusing on some parts of the whole cognitive system. This kind of navigation is an activity that any thinking person experiences continuously. A reliable and uniform way of navigating the cognitive space has to be based on random mechanisms, such as Brownian motion or a drunkard's walk. Many artificial cognitive systems use randomness to produce effects similar to human behavior. Still, it is a necessary aspect of intelligence. Intelligence is a complex notion that is difficult to define completely and precisely.

In the spirit of our discourse, it is a quality of a cognitive system that possesses many efficient cognitive competencies. But surely, there is a crucial aspect of intelligence related to randomness. Namely, an intelligent system cannot be deterministic; otherwise, it is not creative, and creativity is an essential aspect of intelligence. Even in its etymology, intelligence is the ability to capture the internal logic of phenomena. This ability requires abstract and general reasoning. However, a very subtle creative aspect of intelligent behavior relates to randomness, according to the rule: use random events to realize finalities. Intelligent cognitive systems must include randomness in their structure to ensure an efficient exploration of solution spaces in problem-solving situations.
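A minimal sketch of random exploration of a solution space, one generic instance of "using random events to realize finalities"; the objective function and all parameters are illustrative assumptions:

```python
import math
import random

def stochastic_search(score, start, steps=20_000, jump=1.0):
    """Keep a random perturbation whenever it improves the score:
    randomness lets the search escape regions where a deterministic
    local rule would stall."""
    best, best_score = start, score(start)
    for _ in range(steps):
        trial = best + random.uniform(-jump, jump)
        if score(trial) > best_score:
            best, best_score = trial, score(trial)
    return best

# a multimodal objective: many local maxima and one global maximum
f = lambda x: math.sin(3 * x) - 0.1 * (x - 2) ** 2
print(round(stochastic_search(f, start=-8.0), 2))
```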
Even if a cognitive system can manage a map of its competencies, surely many of them are unconscious, especially if they are at a high level in the functional hierarchy. This can be explained by considering that knowledge about a functional level requires a higher logical level; therefore, since the cognitive hierarchy is finite, an uncovered portion in the consciousness mapping has to exist. However, the reason for the necessity of unconsciousness is deeper and related to randomness. A complex cognitive system can do more than it is conscious of doing. This incompleteness allows for openness and plasticity, characteristics very often mentioned as typical of cognition. A comparison occurs naturally with incompleteness in mathematical logic: we can formalize mathematical theories, but when theories are mathematically rich, there are theorems of the theory that we can never prove in the corresponding formal theory [34].
The functional description of cognition has provided clarifications of many basic notions of intellectual activity. The following short synthesis could illuminate the whole discourse developed so far. Learning means acquiring the function associated with some cognitive competence. Meanings are represented by semantic vectors in a multidimensional real space, and comprehension reduces data to the language meaning internally available in a cognitive system. Knowledge is a system of concepts that extends language semantics to a system of complex meanings based on primary meanings. A cognitive system can reach autonomy when coupled with an emotional system supporting it and defining a hierarchy of finalities, starting from the sensory level of pleasure-pain. The moral and ethical systems of values based on basic emotions produce internal mechanisms of evaluation and orientation for the whole cognitive system. Memory is a relational system of meanings associated with traces of events that give relevance to experiences of emotional importance. Consciousness consists of an (incomplete) cognitive map of the whole cognitive system.
A final lesson emerges from our discourse. Cognition is a functional activity, and artificial cognitive systems shed new light on natural cognition, becoming an instrument for understanding it and, therefore, providing new possibilities for artificial intelligence too.
References
- Awret, U., Holographic Duality and the Physics of Consciousness. Front. Syst. Neurosci., 16:685699 (2022).
- Bahdanau, D., Cho, K., Bengio, Y., Neural Machine Translation by Jointly Learning to Align and Translate, arXiv:1409.0473 (2014).
- Basodi, S., Ji, C., Zhang, H., Pan, Y. Gradient Amplification: An Efficient Way to Train Deep Neural Networks, Big Data Mining and Analytics, 3, 3 (2020).
- Buzsaki, G., Neural syntax: Cell assemblies, synapsembles, and readers. Neuron, 68:362–385 (2010).
- Brown, R. E., Donald O. Hebb and the Organization of Behavior: 17 years in the writing. Mol Brain 13, 55 (2020).
- Brown, T. B. et al., Language Models are Few-Shot Learners. NeurIPS, 33, 1877–1901 (2020).
- Church, A., A note on the Entscheidungsproblem, Journal of Symbolic Logic, 1, 40-41 (1936).
- Cagnetta, F., Petrini, L., Tomasini, U. M., Wyart, M., How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model. arXiv:2307.02129v5 [cs.LG] (2024).
- Cybenko, G., Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303–314 (1989).
- De Giorgi, E. Selected Papers, Dal Maso G., Forti M., Miranda M., Spagnolo S. (eds.), Springer-Verlag, Berlin & Heidelberg (2006).
- Euler, L., Introductio in analysin infinitorum, Vol. 1 (1748). Euler Archive - All Works 101, https://scholarlycommons.pacific.edu/euler-works/101.
- Goodfellow, I., Bengio, Y., Courville, A., Deep Learning. MIT Press (2016).
- Ha, D., Dai, A., Le, Q. V.: Hypernetworks, arXiv:1609.09106v4 [cs.LG] (2016).
- Harris, Z., S., Distributional Structure, WORD, 10:2-3, 146-162 (1954).
- Hebb, D. O., The Organization of Behavior. Wiley, New York (1949).
- Hilbert, D., Über das Unendliche, Math. Ann. 95, 161–190 (1926).
- Hinton, G. E., Implementing semantic networks in parallel hardware. In: Hinton, G. E., Anderson, J. A. (eds), Parallel Models of Associative Memory. Lawrence Erlbaum Associates, pp. 191–217 (1981).
- Hinton G. E., How neural networks learn from experience. Sci Am 267(3):145–151 (1992).
- Hinton, G. E., Osindero, S., Teh, Y., A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554 (2006).
- Hinton, G. E., Learning multiple layers of representation. Trends Cogn. Sci. 11(10), 428–434 (2007).
- Hinton, G. E., To recognize shapes, first learn to generate images. In: Drew, T., Cisek, P., Kalaska J (eds) Computational neuroscience: Theoretical insights into brain function. Elsevier, pp 535–548 (2007).
- Hinton, G. E., Krizhevsky, A., Wang, S. D., Transforming auto-encoders. In: International Conference on Artificial Neural Networks (ICANN-11), pp. 1–8 (2011).
- Hinton, G. E. Where do features come from? Cogn. Sci. 38(6):1078–1101 (2014).
- Hopfield, J.J., Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554–2558 (1982).
- Hornik, K., Stinchcombe, M., White, H., Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366 (1989).
- Kandel, E. R., In Search of Memory: The Emergence of a New Science of Mind. W. W. Norton & Company, New York, NY (2006).
- Kaplan, J. et al. Scaling Laws for Neural Language Models, arXiv, 2001.08361 (2020).
- Kelley, H. J., Gradient theory of optimal flight paths. ARS Semi-Annual Meeting, May 9–12, 1960, Los Angeles, CA (1960).
- Le Cun, Y., Une procédure d'apprentissage pour réseau à seuil asymétrique. In: Cognitiva 85: À la Frontière de l'Intelligence Artificielle, des Sciences de la Connaissance, des Neurosciences, pp. 599–604 (1985).
- Manca, V., Agile Logical Semantics for Natural Languages. Information, 15, 1, 64 (2024).
- Manca, V., Bonnici, V., Life Intelligence, in Infogenomics (Ch. 6), Springer (2023).
- Manca, V., Artificial Neural Network Learning, Attention, and Memory. Information, 15, 387 (2024).
- Manca, V., Conversazioni Artificiali. Maieutica al tempo dei chatbots. Carte Amaranto (2024).
- Manca, V., Python Arithmetic: The Informational Nature of Numbers. Springer (2024).
- McCulloch, W., Pitts, W., A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115–133 (1943).
- Minsky, M., Computation. Finite and Infinite Machines, Prentice-Hall Inc. (1967).
- Mitchell, T., Machine Learning, McGraw Hill (1997).
- Munkhdalai, T., Yu, H.: Meta Networks, arXiv:1703.00837v2 [cs.LG] (2017).
- Nielsen, M. Neural Networks and Deep Learning. Online (2013).
- von Neumann, J., The Computer and the Brain. Yale University Press, New Haven, CT (1958).
- OpenAI, GPT4-Technical Report, ArXiv submit/4812508 [cs.CL] 27 Mar (2023).
- Piaget, J., L’epistemologie Génétique, Presses Universitaires de France, Paris (1970).
- Rosenblatt, F., The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408 (1958).
- Rumelhart, D. E., Hinton, G. E., Williams, R. J., Learning representations by back-propagating errors, Nature, 329, October 9 (1986).
- Russell, B., Whitehead, A. N., Principia Mathematica, Cambridge University Press (1910-13).
- Sabour, S., Frosst, N., Hinton, G. E.: Dynamic Routing Between Capsules, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. (2017).
- Shannon, C. E., Computers and Automata, Proceedings of the Institute of Radio Engineers, New York, 41, 1234-1241 (1953).
- Skansi, S. (ed.), Guide to Deep Learning Basics: Logical, Historical, and Philosophical Perspectives. Springer (2020).
- Tomasello, R., Garagnani, M., Wennekers, T., Pulvermüller, F., A neurobiologically constrained cortex model of semantic grounding with spiking neurons and brain-like connectivity. Front. Comput. Neurosci., 12:88 (2018).
- Turing, A. M., On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230–265 (1936).
- Turing, A. M., Computing Machinery and Intelligence. Mind, 59, 433–460 (1950).
- Werbos, P., Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD Thesis, Harvard University, Cambridge, MA (1974).
- Werbos, P., Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560 (1990).
- Wiener, N., Science and Society. Methodos, 13(49–50), 3–10 (1961).