This approach will be the main inspiration for the following definition.
3.1. A General Definition
We define a cognitive system as a dynamic structure on an event-state space. This is a measurable space (S, M, p), where S is a state space, for example, a Hilbert space of features, M is a σ-algebra of subsets of S, and p is a probability measure [11].
In the state space S, an event is a pair (E, t), where E ⊆ S is a subset of states that may occur at time t (the notion of event occurrence may be further specified in many possible ways, according to parameters of localization/distribution in the considered state space).
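For concreteness, one admissible instantiation of an event-state space (our own illustrative choice of S, M, p, and of the event set, not prescribed by the definition) is:

```latex
S = \mathbb{R}^{n}\ \text{(feature space)}, \quad
M = \mathcal{B}(\mathbb{R}^{n})\ \text{(Borel }\sigma\text{-algebra)}, \quad
p = \mathcal{N}(0, I_{n})\ \text{(Gaussian measure)};
\qquad \text{an event: } (E, t)\ \text{with}\ E = \{\, s \in S : \lVert s \rVert \le 1 \,\}.
```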
Intuitively, a cognitive system develops in time by interacting with its external world through several levels of representation of the reality it perceives. The information content of data coming from the outside is given by the costs that the internal representation requires during the assimilation and accommodation phases of this representation.
Representing objects, events, and states of affairs opens the possibility of discovering relations, making inferences, predictions, and decisions for fruitful interaction with the external world.
The essence of these abilities is the notion of mathematical function, going back to Euler’s definition in
Introductio in Analysin Infinitorum (1748), and formalized by Alonzo Church in the 1930s in terms of λ-expressions, that is, a map sending objects of a set to images of them, in such a way that operating on images gives an advantage with respect to operating directly on the objects. A map of a city is a simple way to understand it. On a map, we can choose the best way to connect two places by exploiting the quick overview of a reduced and synthetic representation of streets. When symbols or symbolic structures map objects, functions are named codes. Typically, a code is a function from strings over a finite alphabet of symbols,
codewords, into a set of encoded objects. Coding theory is a widely investigated subject, central in Information Theory [
7]. We do not enter into specific details about codes, but want to stress their critical role in cognition.
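As a minimal illustration of a code in this sense, the following sketch (the alphabet and codewords are hypothetical, chosen only for exposition) maps symbols to binary codewords and decodes a concatenation of codewords back into symbols:

```python
# Illustrative prefix code: symbols -> binary codewords (no codeword is a prefix of another)
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(symbols: str) -> str:
    """Concatenate the codewords of the input symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits: str) -> str:
    """Recover symbols by reading codewords left to right (the prefix property makes this unambiguous)."""
    inverse = {w: s for s, w in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

assert decode(encode("badcab")) == "badcab"
```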
Any cognitive system assumes states. Therefore, at any time, it is placed at a point in a multidimensional space. Some of these states are knowledge states, that is, projections in subspaces, with representational roles.
Coding, possibly articulated in parts and levels, is the main ingredient of cognitive dynamics. Considering a cognitive system only in terms of states and codes is an "extensional" approach, because it prescinds from the physical realization of states (molecular, biomolecular, cellular, circuital, …). In the following, we adopt this viewpoint and will show that, although it is very general and abstract, it allows us to derive interesting properties.
The main intuition of the following definition of a cognitive system is based on internal states, external states, and knowledge states. Internal and external states have to be disjoint to express the self-non-self distinction.
Definition 1.
A cognitive system Γ has five components: IS_t, ES_t, and KS_t are sets of states of an assumed event space, where t ranges in a set T denoting the lifetime of Γ (Γ is equipped with an internal clock, t ∈ T), and γ_t is an encoding function, which we will abbreviate with γ; IS_t and ES_t are non-empty sets of the internal and external states of Γ; KS_t is the set of knowledge states, where KS_t ⊆ IS_t. The self-non-self distinction holds: IS_t ∩ ES_t = ∅. γ is the knowledge encoding function γ : KS_t → IS_t ∪ ES_t such that, for any argument x, the non-idempotence condition holds: γ(x) ≠ x. ι(d) = μ(d) + η(d) is a measure of information, the epistemic information of an input datum d. It is the sum of two components: the μ component expresses the assimilation cost of d; the η component expresses the corresponding accommodation cost, which may require a reorganization of Γ to reach some consistency and adequacy features of its global structure. □
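The following sketch renders the five components of Definition 1 as a data structure; the identifiers and the way of packaging them are our own, and a finite instantiation is only a toy stand-in for the general event-state space:

```python
from dataclasses import dataclass
from typing import Callable, Hashable

State = Hashable
Datum = Hashable

@dataclass
class CognitiveSystem:
    internal: set                      # IS_t: internal states
    external: set                      # ES_t: external states, disjoint from the internal ones
    knowledge: set                     # KS_t: knowledge states, a subset of the internal states
    gamma: Callable[[State], State]    # knowledge encoding, with gamma(x) != x
    mu: Callable[[Datum], float]       # assimilation cost of a datum
    eta: Callable[[Datum], float]      # accommodation cost of a datum

    def epistemic_information(self, d: Datum) -> float:
        """iota(d) = mu(d) + eta(d): the epistemic information of an input datum."""
        return self.mu(d) + self.eta(d)

    def check(self) -> None:
        """Verify the structural requirements of Definition 1 on a finite instance."""
        assert self.internal.isdisjoint(self.external)            # self-non-self distinction
        assert self.knowledge <= self.internal                     # KS_t is included in IS_t
        assert all(self.gamma(k) != k for k in self.knowledge)     # non-idempotence
```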
A state is accessible only if it is a γ-image of a knowledge state, and this requirement holds even for knowledge states. In other words, accessibility requires, in any case, a true representational level, through a knowledge state that is different from the state it represents.
In a cognitive system, the set of knowledge states KS_t may increase and enrich over time; the costs of data acquisition for the system may change dramatically during its lifetime, depending on the construction of its knowledge. Being IS_t and ES_t disjoint, their counterimages are also disjoint; hence, no confusion arises between the knowledge encodings of internal and external states, in symbols: γ⁻¹(IS_t) ∩ γ⁻¹(ES_t) = ∅. External states are based on input data expressing the information coming to Γ from the outside. A part of the internal state is available to the outside and constitutes the output data of Γ.
Example 1 (Shannon’s Information Source)
. Let us consider a basic cognitive system that takes in input a sequence of symbols over a finite alphabet and assimilates them internally by a knowledge encoding γ that constructs a table of binary encodings of symbols according to a Huffman code, where more probable symbols receive shorter representations [7]. The external states are the input sequences to the system; the internal states are tables of pairs (symbol occurrence, binary representation); and the knowledge states are the binary codewords of input symbols with their occurrence multiplicity. □
If the assimilation cost is the length of the binary representation, and the accommodation cost is null, the epistemic information of symbol i results in ι(i) = −log₂ p_i, with p_i the frequency of i in the input sequence. This corresponds exactly to Shannon’s notion of information quantity.
By classical information theory, we know that the average information of the symbols in the sequence corresponds to Shannon’s entropy, and no code can exist with a shorter average information. This means that Shannon’s entropy is a lower bound on the average Shannon information quantity, and Huffman codes are optimal codes for this information measure. In other words, Shannon’s information quantity is a measure of a basic cognitive system, where knowledge encoding minimizes the average length of codewords (and no accommodation costs are considered).
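A minimal computational sketch of Example 1 (our own implementation choices, using the standard heap-based Huffman construction): it builds the code table and compares the average codeword length, that is, the average assimilation cost per symbol, with the Shannon entropy of the symbol frequencies:

```python
import heapq
from collections import Counter
from math import log2

def huffman_code(freq: dict) -> dict:
    """Build a Huffman code: more frequent symbols get shorter binary codewords."""
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()} | {s: "1" + w for s, w in c2.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

text = "abracadabra"
freq = Counter(text)
code = huffman_code(freq)
total = len(text)
avg_len = sum(freq[s] * len(code[s]) for s in freq) / total            # average assimilation cost
entropy = -sum((freq[s] / total) * log2(freq[s] / total) for s in freq)  # Shannon entropy (lower bound)
print(code, avg_len, entropy)
```

On this sample input, the average codeword length stays within one bit of the entropy, as Huffman coding guarantees.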
Shannon’s framework is simple yet powerful enough to develop numerous informational concepts, making it very useful in many contexts. However, it is too simple to express the complex interaction of cognitive character, where the knowledge of a cognitive agent and its history of development over time play crucial roles in evaluating the informational content of data.
Example 2 (Artificial Neural Network). An ANN is a cognitive system with the following components.
Internal States: the weights, biases, and activation states of all neurons at a given time.
External States: the input vectors provided to the network, representing observations from the environment.
Knowledge States: a subset of the internal states, the part of the system’s state that “encodes" past training data. In this case, the memory of knowledge is not simply a storage of copies, but a deeper structure of values that, in the context of network connections, synthesizes the experience, providing the competencies acquired by the ANN during its training history.
Knowledge Encoding: the function γ of the forward-pass computation of the network. It takes an input vector and maps it to a set of activated neurons, which constitutes its internal representation.
Assimilation Cost: the computational cost of the forward pass, or the cost of computing embedding vectors related to linguistic inputs in an LLM transformer model (in the case of a chatbot).
Accommodation Cost: the cost of the backpropagation algorithm and the subsequent weight updates. This is the computational effort required to adjust the knowledge states (weights) to minimize the error between the prediction and the new data point. A data point that is consistent with the network’s current knowledge will have a low error and, therefore, a low accommodation cost. A data point that is a "surprise" or an "outlier" will have a high error, triggering a large weight adjustment and thus incurring a high accommodation cost. □
ANN training is a continuous process of assimilation (forward-pass) and accommodation (backpropagation) that defines the information content of each training sample. The dual-cost model assimilation-accommodation captures the full complexity of learning as a continuous process, defining the information content of data.
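A minimal numerical sketch of Example 2, under simplifying choices of our own: a single sigmoid unit, the forward pass as assimilation, and the magnitude of the gradient-descent weight update as a proxy for the accommodation cost of a data point:

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(1, 3)), np.zeros(1)       # knowledge state: weights and bias

def assimilate(x):
    """Forward pass: encode the input into the network's internal representation."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))     # sigmoid activation

def accommodate(x, y, lr=0.1):
    """Backward pass: adjust weights toward the target; return the size of the update
    as a proxy for the accommodation cost (large for 'surprising' data points)."""
    global W, b
    pred = assimilate(x)
    err = pred - y                                 # prediction error
    grad_W = err * pred * (1 - pred) * x           # gradient of the squared error through the sigmoid
    grad_b = err * pred * (1 - pred)
    W, b = W - lr * grad_W, b - lr * grad_b
    return float(np.abs(lr * grad_W).sum() + np.abs(lr * grad_b).sum())

x, y = np.array([0.5, -1.0, 2.0]), 1.0
print("assimilation:", assimilate(x), "accommodation cost:", accommodate(x, y))
```

A target close to the current prediction yields a small update (low accommodation cost), while a surprising target yields a large one.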
3.2. Incompleteness of Epistemic Reflexivity
Reflexivity is a fundamental phenomenon of logic and computation. It occurs when there is a function
f that maps a proper subset of
A into the whole set
A (expansive reflexivity), or when a set
A is mapped into a proper subset (contractive reflexivity). If
A is a finite set, then this is impossible; therefore, reflexivity is strictly related to infinity and is an essential notion in the foundations of mathematics [
9,
25]. There are many forms of reflexivity. For example, recurrence is a case of reflexivity, where a new value of a function is defined in terms of previously defined values. Reflexive patterns constitute the logical schema of many paradoxes (the Liar paradox, Russell’s paradox, Richard’s paradox, …), from which fundamental logical theories stemmed [
39].
A
reflexion over a set
A is a function
f from a proper subset
B of
A to
A. The reflexion is
complete if it is surjective, that is, if all elements of
A are images of elements of
B; otherwise, it is
incomplete. A set
A is
reflexive, or Dedekind infinite, when a 1-to-1 function exists between a proper subset
B of
A and
A. Any infinite set is reflexive [
25], and no finite set can be reflexive. A cognitive system, even if it is physically finite in any instant of its life, is potentially infinite when its states are defined on values that range over infinite sets.
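A standard worked example, added here for concreteness: reflexions over the natural numbers.

```latex
\text{Let } A = \mathbb{N} \text{ and } B = \mathbb{N} \setminus \{0\} \text{ (a proper subset).}\\
f : B \to A,\; f(n) = n - 1, \text{ is a complete reflexion: every } m \in A \text{ is the image } f(m+1).\\
g : B \to A,\; g(n) = 2n, \text{ is an incomplete reflexion: odd numbers are never images.}\\
\text{Since } f \text{ is also 1-to-1, } \mathbb{N} \text{ is reflexive (Dedekind infinite).}
```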
According to the definition of a cognitive system Γ, we know that KS_t ⊆ IS_t. Now we show that the encoding γ does not completely cover the internal states, that is, the set of internal γ-images is strictly included in the internal states. There are internal states that are not encoded by any knowledge state: these states are inaccessible (to the knowledge of Γ). This result resembles typical results in mathematical logic, and this emphasizes its general relevance. The relationship with classical results of mathematical logic will be discussed below.
Theorem 1.
In any cognitive system Γ
, the knowledge encoding γ is an incomplete reflexion: there exist internal states that are not γ-images of knowledge states.
Proof. Let E = {k ∈ KS : γ(k) ∈ ES} and I = {k ∈ KS : γ(k) ∈ IS} (time parameter t is omitted, because it does not impact the following reasoning). Of course, the sets E and I are subsets of KS, and E ∪ I = KS, E ∩ I = ∅. Let us consider the biggest subset A of I such that γ(A) ⊆ A. The states in A are "autoreferential", because they are γ-images of states in A. Let U be the set of knowledge states that are neither in E nor in A. Then, KS = E ∪ A ∪ U, and the sets E, A, and U are pairwise disjoint.
If every state s of U were a γ-image of some state of U, then, by the definition of A, the inclusion U ⊆ A would hold, but this is impossible because A and U are disjoint sets. Therefore, if U ≠ ∅, some of its states are not γ-images of (other states of) U. These states can neither be γ-images of A nor γ-images of E, because the γ-images of A are in A and the γ-images of E are external states; therefore, they are inaccessible. If U = ∅, then all the internal states outside A ∪ E would be inaccessible because, again, these states can neither be γ-images of A nor γ-images of E (γ-images of A are in A and the γ-images of E are external states). If also there are no internal states outside A ∪ E, for the same reason, all the states in E are inaccessible. Therefore, in all possible cases, surely in IS there are internal states that are inaccessible.
□
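The construction used in the proof can be traced on a small finite example (our own toy instantiation; the theorem itself concerns systems whose states may range over infinite sets):

```python
# Toy cognitive system: internal, external, knowledge states and an encoding gamma
IS = {"i1", "i2", "i3", "i4"}
ES = {"e1", "e2"}
KS = {"i1", "i2", "i3"}                        # KS is a subset of IS
gamma = {"i1": "e1", "i2": "i3", "i3": "i2"}   # gamma(x) != x for all knowledge states

E = {k for k in KS if gamma[k] in ES}          # knowledge states encoding external states
I = {k for k in KS if gamma[k] in IS}          # knowledge states encoding internal states

A = set(I)                                     # largest A within I with gamma(A) contained in A
while any(gamma[x] not in A for x in A):
    A = {x for x in A if gamma[x] in A}

U = KS - E - A                                 # knowledge states neither in E nor in A
accessible = {gamma[k] for k in KS}            # states that are gamma-images of knowledge states
inaccessible_internal = IS - accessible

print(E, A, U)                                 # E = {'i1'}, A = {'i2', 'i3'}, U = set()
print(inaccessible_internal)                   # {'i1', 'i4'}: inaccessible internal states exist
```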
It is surprising that, from very general requirements on cognitive systems, the existence of inaccessible internal states follows. In other words, we can conclude that a cognitive system has to include an internal part of "ignorance" that is necessary for keeping the self-non-self distinction coherent.
The
incompleteness theorems of mathematical logic are at the origin of computer science. When mathematicians defined logical calculi of predicate logic that could deduce all the logical consequences of some given axioms [
6,
15], they also soon discovered that, in predicate logic, for axiomatic theories reaching some logical complexity, there exist propositions
p such that neither p nor ¬p can be deduced from the axioms (Gödel’s first incompleteness theorem [6]).
Kurt Gödel [12] invented the arithmetization of syntax for encoding propositions of Peano’s arithmetic by numbers, within the axiomatic system PA formulated in predicate logic. His celebrated theorem asserts that any coherent theory including Peano Arithmetic is logically incomplete because, under any interpretation of the theory, there are true propositions that cannot be deduced as theorems of the theory. In PA, for any proposition p there is a number ⌜p⌝ encoding p. Gödel determined a formula D(x) of PA such that D(⌜p⌝) is true in PA if and only if p is deducible in PA. This is a case of reflexivity that we could call autoreferentiality, where numbers encode propositions on numbers. The original proof by Gödel was based on one of the most interesting cases of reflexivity. It was given by constructing in PA a formula G expressing, via the D predicate, its own undeducibility in PA (a transformation of the Liar Paradox sentence, going back to Epimenides, VI century B.C.). Let ⊢ be a symbol expressing the deducibility and ⊬ the undeducibility. Let us assume that PA is coherent, that is, for no formula p of PA can both conditions ⊢ p and ⊢ ¬p hold.
If G were deducible in PA, then ⊢ G, that is, ¬D(⌜G⌝) would be true, so G would not be deducible, whence ⊬ G.
If ¬G were deducible, then ⊢ ¬G, that is, D(⌜G⌝) would be true, so it is not true that G is undeducible; therefore, ⊢ G, which implies the incoherence of PA. In conclusion: ⊬ G and ⊬ ¬G.
This means that both G and ¬G cannot be deducible, but one of them has to be surely true; hence, there exists a true formula of PA that is not deducible in PA.
The construction of G is very technical, so we prefer to present another proof that relates logical incompleteness to Turing’s notion of computability.
Theorem 2 (Gödel’s logical incompleteness). In Peano Arithmetic PA, there are true arithmetic propositions that cannot be deduced within PA.
Proof. Any recursively enumerable set A of numbers [28] can be represented in PA (by expressing the computation of the Turing machine generating it). This means that there exists a formula φ_A(x) such that φ_A(n) is deduced if and only if n ∈ A. Moreover, Alan Turing [41] defined (using a diagonal construction due to Cantor) a recursively enumerable set K of numbers that is not decidable, because no Turing machine exists that can always establish, in a finite number of steps, whether a given element belongs to it. Therefore, there exists a number a such that it is impossible to decide in a finite number of computation steps whether a ∈ K or a ∉ K. This means that both φ_K(a) and ¬φ_K(a) are not deducible in PA; otherwise, by simulating PA deductions with Turing machine computations (easily realizable with Gödel arithmetization), we could decide the membership of a in K. Of course, at least one of the two formulae above is surely true in PA. Therefore, there are true propositions of PA that cannot be deduced in PA. □
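The diagonal schema invoked in the proof can be illustrated on a finite table (a sketch of Cantor’s construction only; it does not reproduce the undecidability argument itself):

```python
# Cantor's diagonal schema on a finite table: build a 0/1 sequence that differs
# from every row of the table at the diagonal position.
table = [
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
diagonal = [1 - table[i][i] for i in range(len(table))]  # flip each diagonal entry
assert all(diagonal != row for row in table)             # differs from every enumerated row
print(diagonal)
```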
We now provide a proof of the theorem above using the definition of a cognitive system, which demonstrates the power of such a general definition. A theory is logically coherent if, for no formula p, both p and ¬p are deduced in the theory. An extension of a theory T is a theory T′ such that all the formulae deducible in T are also deducible in T′. The theory T′ is a proper extension if in T′ formulae can be deduced that are not deducible in T. A coherent theory is logically complete if all its true formulae are deducible in it, that is, if for any formula p of the theory either p or ¬p is deducible in the theory. A theory that is not logically complete is logically incomplete.
Theorem 3 (Logical incompleteness via Cognitive Systems). The true formulae of any logically coherent theory that is an extension of PA cannot coincide with the formulae deducible in the theory.
Proof. The proof follows from the possibility of defining PA as a cognitive system. True formulae of PA are its internal states, while false formulae of PA are its external states. Knowledge states of PA are the deduction formulae of PA, expressed by the deduction predicate D. The formula D(⌜p⌝) encodes the true formula p, while ¬D(⌜p⌝) encodes the false formula p.
Even if they are not relevant in the proof, we mention that, in PA as a cognitive system, the time could be that of the theory extension steps (see later on); the assimilation cost could be some measure of the complexity of deductions; and the accommodation cost could be that of reorganizing the theory (adding new axioms) to prove a true proposition that cannot be deduced.
From the theorem of epistemic incompleteness, we can deduce that knowledge states cannot encode all the internal states. In PA as a cognitive system, this implies that the true formulae cannot coincide with the deducible formulae of PA, because the internal states of PA do not coincide with those represented by the knowledge states of PA. □
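The correspondence used in this proof can be summarized as follows (our own tabulation, with the notation of Definition 1):

```latex
IS \leftrightarrow \{\text{true formulae of PA}\}, \qquad
ES \leftrightarrow \{\text{false formulae of PA}\},\\
KS \leftrightarrow \{\text{deduction formulae } D(\ulcorner p \urcorner)\}, \qquad
\gamma : D(\ulcorner p \urcorner) \mapsto p .
```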
Another proof of Gödel’s incompleteness follows from the deduction predicate D and the existence of extensions of PA. The following argument is interesting in cognitive terms, because it shows the positive role of incompleteness as a guarantee of enrichment.
Theorem 4 (Logical incompleteness via Theory Extensions). The logical incompleteness of PA follows from the fact that it can be coherently extended.
Proof. Let us assume that PA is logically complete; then its true formulae coincide with the deducible formulae of PA.
Now, let q be a formula that cannot be deduced from the Peano axioms; then ¬q is true, so it is deducible in PA. If we extend PA by adding q to it, the formula q is true in the new theory and deducible in it, but the theory becomes incoherent (since ¬q remains deducible). This means that, if PA were complete, it could not be extended to any coherent theory. On the other hand, we know that if PA is coherent, it can be extended to another coherent theory (including, for example, negative numbers). This implies again that the true and the deducible formulae of PA cannot coincide, that is, PA is not complete. □
Figure 1 can be applied to a formal theory, in accordance with its interpretation in terms of a cognitive system. The internal circle represents the deduction formulae, the white oval the true formulae, and the external border the false propositions.
Figure 1.
A diagram that illustrates the reflexivity of a cognitive system. The internal circle represents the set of knowledge states, the white oval the set of internal states, and the border around it the set of external states. The arrow is the encoding of knowledge states.
Figure 2.
The structure of internal states. The subset E contains the knowledge states encoding external states. The subset A contains the autoreferential states, and U the remaining knowledge states encoding internal states. Some of these subsets are not empty and contain inaccessible states.
3.3. Epistemic Levels
The given notion of a cognitive system is very general. In real cases of such systems, we must consider their internal structure, which is often articulated in a vast number of functional modules, organized at multiple levels [
24]. The knowledge states and their encoding are surely a crucial aspect of the cognitive architecture. It is natural to assume a notion of
knowledge level as the number of iterations of the encoding γ given in our definition. Namely, a knowledge state s encodes an internal state γ(s), but even this state can encode, at a higher level of representation, the state γ(γ(s)). Higher knowledge levels enhance the connection and elaboration of data processing. Moreover, it is also reasonable that γ is not a simple function, but rather the union of many encoding functions, which act depending on the parts and on the levels of the states to which they apply.
Encodings, as well as all the “objects" of knowledge, are functional modules, which can be encoded by real vectors that generalize embedding vectors, say feature vectors. In this way, they become data at further knowledge levels. Thus, cognition is intrinsically reflexive, in a way analogous to arithmetic and all fundamental mathematical theories.
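A minimal sketch of iterated knowledge levels (toy dimensions and random encoders of our own choosing): each level re-encodes the representation produced by the level below, so a level-k representation results from k applications of encoding functions:

```python
import numpy as np

rng = np.random.default_rng(1)
encoders = [rng.normal(size=(6, 8)), rng.normal(size=(4, 6)), rng.normal(size=(3, 4))]

def encode(W, v):
    """One encoding step: a linear map followed by a nonlinearity."""
    return np.tanh(W @ v)

x = rng.normal(size=8)            # a raw datum (level 0)
levels = [x]
for W in encoders:                # each level re-encodes the output of the level below
    levels.append(encode(W, levels[-1]))

print([v.shape for v in levels])  # (8,), (6,), (4,), (3,): progressively more abstract representations
```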
Transformer LLM models demonstrate [
4,
5] that comprehension is expressed through the values dynamically generated during data processing, which essentially consist of embedding vectors in multidimensional Hilbert spaces. The further knowledge levels encode data into other semantic spaces, adding levels of internal comprehension to more complex meanings. When modules are encoded by feature vectors
f.v., structures resembling formal theories,
f.v.t., realized by suitable functional modules, provide operations and relations, generating in turn other kinds of feature vectors,
f.v.t.v., which again encode, at a superior level, functions of lower levels. In this passage, some details are lost, but features appear that express new relationships.
Therefore, meanings of a cognitive system are not static values memorized in some locations, but rather processes of localization in abstract spaces that geometrize data and their relations in corresponding "visualizations" within those spaces. Based on the first encoding of embedding vectors, learning/training mechanisms acquire specific competencies by reducing the errors between computed and expected functions that realize such competencies. When competencies are acquired, a cognitive system reaches the first step in constructing its structures.
In the superior levels, cognitive systems elaborate a representation of the external world and define finalities and strategies based on self-knowledge. Any level may include many sublevels; however, the three main levels, illustrated in
Figure 3, can be denoted as: the
operative level of competencies; the
coordinative level, where the elements of the operative level and their relations are represented and integrated as different parts of a global competence; the
directive level, where the system elaborates tasks and strategies, possibly organized in a hierarchy of finalities. The highest level encodes a representation of the system itself, that is, a full consciousness. The intelligence of the system emerges as the ability to coordinate and integrate the functionalities of all cognitive levels, by finding the best representations and by adapting its behavior to the situations of its life. In the superior levels, a cognitive system needs to incorporate theories or similar structures.
The system of
Figure 3 consists of functional modules. Arrows are channels sending feature vectors, represented by full squares. From the outlined perspective, some feature vectors encode functional modules. In this way, a cognitive system can gain knowledge of its own functionalities.
The encoding mechanisms and knowledge levels are not prefixed according to an external project elaborated by a designer; rather, they emerge following an internal principle that propagates and generalizes the transformer approach toward a multilevel perspective. Implementations of this generalization in artificial systems remain an open problem for future research. However, the key point toward multilevel transformers is the possibility of recognizing modules as functional units and encoding them through suitable feature vectors.
Figure 4 shows a possible way to encode a functional module of an inferior level by an expression of weights and biases, which can be translated into a feature vector (the functional circuit of the module is shown at the bottom right, where circles denote a sigmoid function).
Thus, the encoding of functional modules upward propagates the basic transformer mechanism.
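A minimal sketch of this idea, with toy sizes of our own choosing: a functional module given by weights and a bias is flattened into a feature vector, which a superior level can then process like any other datum:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A functional module of the inferior level: weights, bias, and the circuit they define
W_mod, b_mod = rng.normal(size=(2, 3)), rng.normal(size=2)
def module(x):
    return sigmoid(W_mod @ x + b_mod)

# Encoding of the module as a feature vector: its parameters become data for the superior level
feature_vector = np.concatenate([W_mod.flatten(), b_mod])        # shape (8,)

W_sup = rng.normal(size=(4, feature_vector.size))                 # superior-level encoder
representation_of_module = np.tanh(W_sup @ feature_vector)        # the module, seen as a higher-level datum

print(module(np.ones(3)), representation_of_module)
```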
In the middle level, embedding vectors give meaning to words and discourses. Therefore, feature vectors exploit embedding vector semantics to describe functional modules and introduce reflexive knowledge about themselves.
Logical operations underlying the logic of natural language are functions [
23]; therefore, they are realized by functional modules. This means that chatbots, suitably extended, could, in principle, be able to “understand" the mechanisms on which their comprehension is based. In other words, the encoding of functional modules is a crucial step to increase the reflexivity of cognitive systems. Understanding how to realize the encoding of functional modules is a key research topic for both artificial intelligence and theoretical neurophysiology.
This scenario requires new levels of training. Training by example is too limited. Training by reasoning is the method that induces a cognitive system to organize its structures toward a multilevel transformer representation of its internal knowledge. It has to be based on natural language and developed through conversational activities. The semantic spaces that underlie the different knowledge levels add new dimensions according to a Chinese boxes mechanism where a single coordinate may refer to a point encapsulating the features of some hidden semantic space.