This approach will be the main inspiration for the following definition.
3.1. A General Definition
Let S be a state space, for example, a Hilbert space of features. When we consider time in the state space S, we pass to an event-state space. This is a measurable space (S, M, p), where M is a σ-algebra of subsets of S and p is a probability measure [11], and an event is a pair (A, t), where A ∈ M is a subset of states that may occur at time t (the notion of event occurrence may be further specified in many possible ways, according to parameters of localization/distribution in the considered state space).
A cognitive system is a dynamic structure on an event-state space. Intuitively, a cognitive system develops in time by interacting with its external world through several levels of representation of the reality it perceives. The information content of data coming from the outside is given by the costs that the internal representation requires during the assimilation and accommodation phases of this representation.
Representing objects, events, and states of affairs opens the possibility of discovering relations, making inferences, predictions, and decisions for fruitful interactions with the external world.
The essence of these abilities is the notion of mathematical function, going back to Euler's definition in Introductio in Analysin Infinitorum (1748) and formalized by Alonzo Church in the 1930s in terms of λ-expressions, that is, a map sending objects of a set to images of them, in such a way that operating on the images gives an advantage with respect to operating directly on the objects. A map of a city is a simple way to understand this idea. On a map, we can choose the best way to connect two places by exploiting the quick overview of a reduced and synthetic representation of streets. When symbols or symbolic structures map objects, the functions are called codes. Typically, a code is a function from strings over a finite alphabet of symbols, the codewords, into a set of encoded objects. Coding theory is a widely investigated subject, central in Information Theory [7]. We do not enter into specific details about codes, but want to stress their critical role in cognition.
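As a minimal illustration of this notion of code (an illustrative sketch with hypothetical codewords, not an example from the paper), the following Python fragment represents a toy code as a function from binary codewords to encoded objects, together with the decoding of a stream built from it.

```python
# Toy illustration (hypothetical codewords): a code as a function from
# codewords over the alphabet {0, 1} to a set of encoded objects.
code = {
    "0":   "sun",
    "10":  "rain",
    "110": "snow",
    "111": "fog",
}

def decode(stream: str) -> list[str]:
    """Decode a binary stream by greedily matching prefix-free codewords."""
    objects, buffer = [], ""
    for bit in stream:
        buffer += bit
        if buffer in code:          # a complete codeword has been read
            objects.append(code[buffer])
            buffer = ""
    if buffer:
        raise ValueError("stream ends with an incomplete codeword")
    return objects

print(decode("0110100"))  # ['sun', 'snow', 'rain', 'sun']
```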
At any time, a cognitive system is placed at a point (or a set of possible points) in a multidimensional space of states. Some of these states are knowledge states with representational roles.
Coding, possibly articulated in parts and levels, is the main ingredient of cognitive dynamics. Considering a cognitive system only in terms of states and codes is an “extensional” approach, because it abstracts from the physical realization of states (molecular, biomolecular, cellular, circuital, …). In the following, we adopt this viewpoint and will show that, although it is very general and abstract, it allows us to derive interesting properties.
The main intuition of the following definition of a cognitive system is based on internal states, external states, and knowledge states.
Definition 1.
A cognitive system Γ has seven components:
Γ = (E_t, I_t, K_t, γ_t, μ, η, T),
where E_t, I_t, K_t are non-empty sets of states of an assumed state space S, t ranges in a set T denoting the lifetime of Γ (Γ is equipped with an internal clock), and γ_t is an encoding function; μ and η are cost functions, of data encoding and of encoding updating in passing from γ_t to γ_{t+1}, respectively. The following conditions are required for Γ (the temporal parameter is omitted by intending that conditions hold for every time t):
E_t and I_t are non-empty disjoint sets of the external and internal states of Γ; K_t is the set of knowledge states, where K_t ⊆ I_t;
γ is the knowledge encoding function γ : K → E ∪ I, such that, for any argument x, the no-fixpoint condition holds: γ(x) ≠ x;
μ(d) + η(d) is a measure of information, the epistemic information of an input datum d, given as the sum of two components: the μ component expresses the assimilation cost of d, and the η component expresses the corresponding accommodation cost, which may require a reorganization of γ to reach some consistency and adequacy features of its global structure. □
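To fix ideas, the following Python sketch (a minimal illustration, not part of the definition) renders the seven components as a data structure at a fixed time t, under the reading adopted here that γ maps knowledge states to the states they encode; the concrete states and the constant cost functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Set

State = Hashable

@dataclass
class CognitiveSystem:
    """Minimal sketch of Definition 1 at a fixed time t (hypothetical rendering)."""
    external: Set[State]              # E_t
    internal: Set[State]              # I_t
    knowledge: Set[State]             # K_t, required to be a subset of I_t
    gamma: Dict[State, State]         # knowledge encoding: K_t -> E_t ∪ I_t
    mu: Callable[[State], float]      # assimilation cost of a datum
    eta: Callable[[State], float]     # accommodation cost of a datum

    def __post_init__(self):
        assert self.external and self.internal, "E_t and I_t must be non-empty"
        assert not (self.external & self.internal), "E_t and I_t must be disjoint"
        assert self.knowledge <= self.internal, "K_t must be a subset of I_t"
        assert all(self.gamma[q] != q for q in self.gamma), "no-fixpoint condition"

    def epistemic_information(self, d: State) -> float:
        """Epistemic information of a datum: assimilation plus accommodation cost."""
        return self.mu(d) + self.eta(d)

# Hypothetical toy instance: knowledge states q1, q2 encode states s1 and e1.
cs = CognitiveSystem(
    external={"e1"}, internal={"s1", "q1", "q2"}, knowledge={"q1", "q2"},
    gamma={"q1": "s1", "q2": "e1"},
    mu=lambda d: 1.0, eta=lambda d: 0.5,
)
print(cs.epistemic_information("e1"))  # 1.5
```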
An internal state is accessible only if it is a γ-image of a knowledge state (this requirement holds even for knowledge states). In other words, accessibility requires, in any case, a true representational level, through a knowledge state that is different from the state it represents.
In a cognitive system, γ may increase and enrich over time; the costs of data acquisition by the system may change dramatically during its lifetime, depending on the construction of its knowledge. Being E_t and I_t disjoint, also their counterimages are disjoint, hence no confusion arises between the knowledge encodings of internal and external states, in symbols:
γ⁻¹(E_t) ∩ γ⁻¹(I_t) = ∅.   (4)
External states are based on input data expressing the information coming to Γ from the outside. A part of the internal state is available to the outside, providing the output data of Γ.
Example 1
(Shannon’s Information Source).
Let us consider a basic cognitive system that takes as input a sequence of symbols over a finite alphabet and assimilates them internally by a knowledge encoding γ that constructs a table of binary encodings of symbols according to a Huffman code, where more probable symbols receive shorter representations [7]. The external states are the input sequences to the system; the internal states are tables of pairs (symbol occurrence, binary representation); and the knowledge states are the binary codewords of input symbols with their occurrence multiplicity. □
If the assimilation cost is the length of the binary representation, and the accommodation cost is null, the epistemic information of symbol i results in −log₂ p_i, with p_i the frequency of i in the input sequence. This corresponds exactly to Shannon's notion of information quantity.
By classical information theory, we know that the average information of the symbols in the sequence corresponds to Shannon's entropy, and no code can exist with a shorter average codeword length. This means that Shannon entropy is a lower bound on the average Shannon information quantity, and Huffman codes are optimal codes for this information measure. In other words, Shannon's information quantity is the information measure of a basic cognitive system, where knowledge encoding minimizes the average length of codewords (and no accommodation costs are considered).
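A small Python sketch (illustrative, using the standard Huffman construction rather than any procedure from the paper) makes this concrete: it builds a Huffman code for an input sequence and compares the codeword lengths, taken here as assimilation costs with null accommodation cost, with the quantities −log₂ p_i and with the entropy of the sequence.

```python
import heapq
from collections import Counter
from math import log2

def huffman_code(sequence):
    """Return a dict symbol -> binary codeword built from symbol frequencies."""
    freq = Counter(sequence)
    # Each heap item: (weight, tiebreaker, {symbol: codeword-so-far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, i2, merged))
    return heap[0][2]

sequence = "aaaabbbccd"
code = huffman_code(sequence)
freq = Counter(sequence)
n = len(sequence)
entropy = -sum((f / n) * log2(f / n) for f in freq.values())
for s in freq:
    p = freq[s] / n
    print(s, code[s], f"len={len(code[s])}", f"-log2(p)={-log2(p):.2f}")
print(f"entropy={entropy:.3f} bits; average code length="
      f"{sum(freq[s] * len(code[s]) for s in freq) / n:.3f} bits")
```

The printed average code length is never below the entropy, which is the lower bound mentioned above.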
Shannon’s framework is simple yet powerful enough to develop numerous informational concepts, making it very useful in many contexts. However, it is too simple to express the complex interaction of cognitive character, where the knowledge of a cognitive agent and its history of development over time play crucial roles in evaluating the informational content of data.
Example 2
(Artificial Neural Network). An ANN is a cognitive system with the following components.
Internal States: the weights, biases, and activation states of all neurons at a given time.
External States: the input vectors provided to the network, representing observations from the environment.
Knowledge States: a subset of the internal states, the part of the system's state that “encodes” past training data. In this case, the memory of knowledge is not simply a storage of copies, but a deeper structure of values that, in the context of the network connections, synthesizes experience, providing the competencies acquired by the ANN during its training.
Knowledge Encoding: the function γ given by the forward-pass computation of the network. It takes an input vector and maps it to a set of activated neurons, which constitute its internal representation.
Assimilation Cost: the computational cost of the forward pass, or the cost of computing the embedding vectors of linguistic inputs in an LLM transformer model (in the chatbot case).
Accommodation Cost: this is where the cognitive-system model is most expressive. It is the cost of the backpropagation algorithm and the subsequent weight updates: the computational effort required to adjust the knowledge states (weights) to minimize the error between the prediction and the new data point. A data point that is consistent with the network's current knowledge will have a low error and, therefore, a low accommodation cost. A data point that is a “surprise” or an “outlier” will have a high error, triggering a large weight adjustment and thus incurring a high accommodation cost. □
ANN training is a continuous process of assimilation (forward pass) and accommodation (backpropagation) that defines the information content of each training sample. The dual-cost assimilation-accommodation model captures the full complexity of learning as a continuous process and defines the information content of data.
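The following numpy sketch illustrates the two costs on a toy model, under assumptions the example does not fix: the assimilation cost of a sample is taken as the squared prediction error computed in the forward pass, and the accommodation cost as the size of the weight update produced by one gradient step; a "surprising" sample then yields a visibly larger accommodation cost.

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0          # a tiny linear model y = w·x + b
lr = 0.1

def assimilate_and_accommodate(x, y):
    """Forward pass (assimilation) and one gradient step (accommodation)."""
    global w, b
    pred = w @ x + b
    error = pred - y
    assimilation_cost = 0.5 * error ** 2          # squared prediction error
    grad_w, grad_b = error * x, error             # gradients of the squared error
    dw, db = -lr * grad_w, -lr * grad_b
    w, b = w + dw, b + db
    accommodation_cost = np.sqrt(np.sum(dw ** 2) + db ** 2)  # size of the update
    return assimilation_cost, accommodation_cost

# Train on consistent data y = 2*x1 - x2, then present an outlier.
for _ in range(200):
    x = rng.normal(size=2)
    assimilate_and_accommodate(x, 2 * x[0] - x[1])

x = rng.normal(size=2)
print("consistent sample:", assimilate_and_accommodate(x, 2 * x[0] - x[1]))
print("outlier sample:   ", assimilate_and_accommodate(x, 100.0))
```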
3.2. Epistemic Incompleteness
Reflexivity is a fundamental phenomenon of logic and computation. It occurs when there is a function f that maps a proper subset of A onto the whole set A (expansive reflexivity), or when a set A is injectively mapped into a proper subset of it (contractive reflexivity). If A is a finite set, then this is impossible; therefore, reflexivity is strictly related to infinity and is an essential notion in the foundations of mathematics [9,26]. There are many forms of reflexivity. For example, recurrence is a case of reflexivity, where a new value of a function is defined in terms of previously defined values. Reflexive patterns constitute the logical schema of many paradoxes (the Liar paradox, Russell's paradox, Richard's paradox, …) from which fundamental logical theories stemmed [40].
A reflexion over a set A is a function f from a proper subset B of A to A. The reflexion is complete if it is surjective, that is, if all elements of A are images of elements of B; otherwise, it is incomplete. A set A is reflexive, or Dedekind infinite, when a 1-to-1 function exists between a proper subset B of A and A. Any infinite set is reflexive [26], and no finite set can be reflexive. A cognitive system, even if it is physically finite in any instant of its life, is potentially infinite when its states are defined on values that range over infinite sets.
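A brute-force check (an illustrative sketch, not from the paper) confirms the finite case: for a small set A, no function from a proper subset B to A is surjective, so no complete expansive reflexion exists, whereas on the natural numbers the map n ↦ n // 2, restricted to the even numbers, is a complete reflexion.

```python
from itertools import product

A = {0, 1, 2, 3}
B = {0, 1, 2}                       # a proper subset of A

# Enumerate all functions f: B -> A and test surjectivity onto A.
complete_reflexions = [
    dict(zip(sorted(B), images))
    for images in product(sorted(A), repeat=len(B))
    if set(images) == A
]
print(len(complete_reflexions))     # 0: no finite set admits a complete reflexion

# On an infinite set this is possible: f(n) = n // 2 maps the even numbers
# (a proper subset of the naturals) onto all the naturals.
evens = range(0, 20, 2)
print(sorted({n // 2 for n in evens}) == list(range(10)))   # True
```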
According to the definition of a cognitive system Γ, we know that K ⊆ I. Now we show that the encoding does not completely cover the internal states, that is, the set of γ-images of knowledge states does not include all the internal states. There are internal states that are not encoded by any knowledge state: these states are inaccessible (to the knowledge of Γ). This result resembles typical results in mathematical logic, and this emphasizes its general relevance. The relationship with classical results of mathematical logic will be discussed below.
Theorem 1.
Epistemic Incompleteness (Inaccessibility in Cognitive Systems)
In any cognitive system Γ, the knowledge encoding γ is an incomplete reflexion: γ(K_t) ∩ I_t ⊊ I_t, that is, some internal states are not γ-images of knowledge states.
Proof. Let E = γ⁻¹(E_t) and I = γ⁻¹(I_t), the sets of knowledge states encoding external and internal states, respectively (the time parameter t is omitted, because it does not impact the following reasoning). Of course, the sets E and I are subsets of K_t, they are disjoint, and E ∪ I = K_t (see Equation (4)). Let us consider the following two alternatives.
1) I is γ-autoreferential, that is, for all x ∈ I there exists y ∈ I such that x = γ(y). In this case, all the internal states outside I (which form a non-empty set) are inaccessible, because they cannot be γ-images of knowledge states (the γ-images of E are external states).
2) If I is not γ-autoreferential, let A be the biggest proper subset of I that is γ-autoreferential. Then, the elements of the non-empty set U = I ∖ A are not γ-images of elements of U. Namely, an element u ∈ U cannot verify an equation u = γ(v) with v ∈ U, otherwise A would not be the biggest autoreferential subset of I. In conclusion, the elements of U are not accessible, because they cannot be γ-images of U, nor γ-images of A (which is autoreferential), nor γ-images of E (which are external states). In conclusion, in both cases, surely in I_t there are internal states that are inaccessible (see Figure 1). □
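The statement can be verified mechanically on small finite instances. The sketch below (hypothetical toy data, a direct enumeration rather than the case analysis of the proof) computes the γ-images of the knowledge states and lists the internal states that are not reached, i.e., the inaccessible ones.

```python
# Toy instance (hypothetical): internal states s1..s3, knowledge states q1..q3.
external  = {"e1", "e2"}
internal  = {"s1", "s2", "s3", "q1", "q2", "q3"}
knowledge = {"q1", "q2", "q3"}                 # K ⊆ I
gamma = {"q1": "e1", "q2": "q3", "q3": "q2"}   # no fixpoints

accessible_internal = {gamma[q] for q in knowledge} & internal
inaccessible = internal - accessible_internal
print("accessible:  ", sorted(accessible_internal))   # ['q2', 'q3']
print("inaccessible:", sorted(inaccessible))           # ['q1', 's1', 's2', 's3']
```

In this toy instance, the knowledge states encoding internal states, {q2, q3}, form a γ-autoreferential set, so the example falls under case 1) of the proof.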
It is surprising that the existence of inaccessible internal states follows from very general requirements on cognitive systems. In other words, we can conclude that a cognitive system has to include an internal part of “ignorance” that is necessary for keeping a coherent self-non-self distinction. We notice that, in the general setting of the previous definition of cognitive systems, the “knowledge” of a state does not coincide with its membership in K, but with the possibility of being the γ-image of a knowledge state representing it. Namely, a state q of K can express the knowledge of an internal state s, but the knowledge of q requires a further knowledge state (different from q) representing it. In other words, knowledge is a continuous process of representation that can be iterated at many representation levels.
The incompleteness theorems of mathematical logic are at the origin of computer science. Logical calculi over formulae define relations ⊢ holding between a set of formulae (the premisses) and a formula (the conclusion): we write Δ ⊢ φ to indicate that φ is deduced from Δ via some logical calculus. Logical consequence is an analogous relation, indicated by Δ ⊨ φ, meaning that in all interpretations where all formulae of Δ are true, φ is also true. In first-order logic, or predicate logic ([6,15]), mathematicians defined logical calculi that are complete (Gödel, 1930), in the sense that these calculi can deduce all the logically valid propositions (those valid in all interpretations) [6,12]. However, Gödel also soon discovered (1931) that, in predicate logic, for axiomatic theories reaching some logical complexity, there exist propositions p such that neither p nor ¬p can be deduced from the axioms (Gödel's first incompleteness theorem [12]). The theory PA of Peano's Axioms, expressed in first-order logic, presents such an incompleteness.
Kurt Gödel [12] invented the arithmetization of syntax for encoding propositions of Peano's arithmetic by numbers, within the axiomatic system PA formulated in predicate logic. His celebrated theorem asserts that any theory including Peano Arithmetic is logically incomplete because, under any interpretation of the theory, there are true propositions that cannot be deduced as theorems of the theory. In PA, for any proposition p there is a number ⌈p⌉ encoding p. Gödel determined a formula D(x) of PA such that D(⌈p⌉) is true in PA if and only if p is deducible in PA. This is a case of reflexivity that we could call autoreferentiality, where numbers encode propositions about numbers. The original proof by Gödel was based on one of the most interesting cases of reflexivity. It was given by constructing in PA a formula G expressing, via the D predicate, its own undeducibility in PA (a transformation of the Liar Paradox sentence, going back to Epimenides, VI century B.C.). Let ⊬ be a symbol expressing undeducibility. Let us assume that PA is coherent, that is, for no formula φ of PA can both ⊢ φ and ⊢ ¬φ hold.
If G were deducible in PA, then D(⌈G⌉), that is, ¬G, would be true, so G would not be true, and hence not deducible, whence ⊬ G.
If ¬G were deducible, then ¬G would be true, so it would not be true that G is undeducible; therefore ⊢ G, which, together with ⊢ ¬G, contradicts the coherence of PA, whence ⊬ ¬G. In conclusion: ⊬ G and ⊬ ¬G. This means that both G and ¬G are not deducible, but one of them is surely true; then there exists a true formula of PA that is not deducible in PA.
The construction of G is very technical, so we prefer to present another proof that relates the logical incompleteness to Turing's notion of computability.
Theorem 2
(Gödel's logical incompleteness, via Turing's undecidability).
In Peano Arithmetic PA, there are true arithmetic propositions that cannot be deduced within PA.
Proof. Any recursively enumerable set A of numbers [29] can be represented in PA (by expressing the computation of the Turing machine generating it). This means that there exists a formula φ_A(x) such that φ_A(n) is deduced if and only if n ∈ A. Moreover, Alan Turing [42] defined (using a diagonal construction due to Cantor) a recursively enumerable set K of numbers that is not decidable, because no Turing machine exists that can always establish, in a finite number of steps, whether a given element belongs to it. Therefore, there exists a number a such that it is impossible to decide in a finite number of computation steps whether a ∈ K or a ∉ K. This means that both φ_K(a) and ¬φ_K(a) are not deducible in PA, otherwise, by simulating PA deductions with Turing machine computations (using Gödel's arithmetization), we could decide the membership of a in K. Of course, at least one of the two formulae above is surely true in PA. Therefore, there are true propositions of PA that cannot be deduced in PA. □
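The diagonal construction mentioned in the proof can be illustrated on a finite table (a toy analogue, not the actual undecidable set K): given any finite list of sets A_0, …, A_{n−1}, the diagonal set D = {i : i ∉ A_i} differs from every A_i, which is the combinatorial core of Cantor's and Turing's arguments.

```python
# Toy analogue of the diagonal construction: finitely many "sets" A_i ⊆ {0..n-1}.
A = [
    {0, 1, 2},   # A_0
    {1, 3},      # A_1
    {0, 1},      # A_2
    set(),       # A_3
]
n = len(A)

# Diagonal set: i belongs to D exactly when i does NOT belong to A_i.
D = {i for i in range(n) if i not in A[i]}
print("D =", D)                                  # {2, 3}
print(all(D != A[i] for i in range(n)))          # True: D differs from every A_i
```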
We now provide a proof of the theorem above using the definition of a cognitive system, which demonstrates the power of such a general definition.
Theorem 3
(Logical incompleteness via Cognitive Systems). The logical consequences of Peano’s axioms PA do not coincide with the formulas deducible from PA.
Proof. The proof follows from the possibility of defining PA as a cognitive system. True formulae of PA (the logical consequences of the PA axioms) are its internal states, while false formulae of PA are its external states. Knowledge states of PA are the deduction formulae of PA, expressed by the deduction predicate D. The formula D(⌈p⌉) encodes the true formula p, while ¬D(⌈p⌉) encodes the false formula p.
From the theorem of epistemic incompleteness, we can deduce that knowledge states cannot encode all the internal states. In PA as a cognitive system, this implies that the true formulae cannot coincide with the deducible formulae of PA, because the internal states of PA do not coincide with those represented by knowledge states of PA.
We mention that, in PA as a cognitive system, the time could be that of the theory extension steps (see later on); the assimilation cost could be some measure of the complexity of deductions; and the accommodation cost could be that of reorganizing the theory (adding new axioms) to prove a true proposition that cannot be deduced.
Figure 2 can be applied to a formal theory, in accordance with its interpretation in terms of a cognitive system. The inner circle represents the deducible formulas, the white circle the true formulas, and the external border the false propositions.
3.3. Epistemic Levels
The given notion of a cognitive system is very general. In real cases of such systems, we must consider their internal structure, which is often articulated in a vast number of functional modules, organized at multiple levels [25]. The knowledge states and their encoding are surely a crucial aspect of the cognitive architecture. It is natural to assume a notion of knowledge level as the number of iterations of the encoding γ given in our definition. Namely, an internal state s is encoded by a knowledge state, but this knowledge state can in turn be encoded, at a higher level of representation, by a further knowledge state. Higher knowledge levels enhance the connection and elaboration of data processing. Moreover, it is also reasonable that γ is not a simple function, but the union of many encoding functions, which act depending on the parts and on the levels of the states to which they apply.
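Under the reading adopted above, in which a knowledge state q encodes the state γ(q), iterating the encoding amounts to following chains of counterimages. The short sketch below (hypothetical toy data) lists, for a given state, its first-, second-, and third-level encodings.

```python
# Hypothetical toy encoding: each knowledge state is mapped to the state it encodes.
gamma = {"q1": "s1", "q2": "q1", "q3": "q2", "q4": "s2"}

def encoders(state, gamma):
    """Knowledge states that encode `state` (level-1 representations)."""
    return [q for q, s in gamma.items() if s == state]

def encoders_at_level(state, gamma, level):
    """Knowledge states encoding `state` through a chain of `level` encodings."""
    states = [state]
    for _ in range(level):
        states = [q for s in states for q in encoders(s, gamma)]
    return states

print(encoders_at_level("s1", gamma, 1))   # ['q1']
print(encoders_at_level("s1", gamma, 2))   # ['q2']  (q2 encodes q1, which encodes s1)
print(encoders_at_level("s1", gamma, 3))   # ['q3']
```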
In ANN, the basic mechanism of knowledge encoding is based on the notion of embedding vectors. They are the core of the success of LLM neural networks, which realize chatbots by elaborating meanings in terms of numerical vectors.
Transformer LLM models demonstrate [4,5] that comprehension is expressed through the values dynamically generated during data processing, which essentially consist of embedding vectors in multidimensional Hilbert spaces.
The embedding vectors implicitly ensure adequacy to distributional criteria of the kind outlined before, inasmuch as they encode the way words combine with other words, coherently with the distributional profiles those words have in the corpus on which the neural network was trained to acquire its conversational competence.
Embedding vectors are the knowledge encoding of words, phrases, and concepts. They are the “objects” of knowledge, and are obtained through specific functional modules. In turn, these modules can also be encoded by real vectors analogous to embedding vectors. In this way, they become data at further knowledge levels. Thus, cognition is intrinsically reflexive, in a way analogous to arithmetic and all fundamental mathematical theories.
The further knowledge levels encode data into other semantic spaces, adding internal comprehension to more complex meanings.
However, meanings of a cognitive system are not static values memorized in some locations, but rather processes of localization in abstract spaces that geometrize data and their relations in corresponding “visualizations” within those spaces.
The three main levels of a cognitive system are illustrated in Figure 3: the operative level of competencies; the coordinative level, where the elements of the operative level and their relations are represented and integrated as different parts of a global competence; and the directive level, where the system elaborates tasks and strategies, possibly organized in a hierarchy of finalities. The highest level encodes a representation of itself, that is, a full consciousness. The intelligence of the system emerges as the ability to coordinate and integrate the functionalities of all cognitive levels, by finding the best representations and by directing its behavior toward the situations of its life. In the superior levels, a cognitive system needs to incorporate theories or similar structures.
The system of Figure 3 consists of functional modules. Arrows are channels sending feature vectors, represented by full squares. Some feature vectors encode functional modules. In this way, a cognitive system can gain knowledge of its own functionalities.
The encoding mechanisms and knowledge levels are not fixed in advance according to an external design elaborated by a designer; rather, they emerge following an internal principle that propagates and generalizes the transformer approach toward a multilevel perspective. Implementations of this generalization in artificial systems remain an open problem for future research. However, the key point toward multilevel transformers is the possibility of recognizing modules as functional units and encoding them through suitable feature vectors.
Figure 4 shows a possible way to encode a functional module of an inferior level by an expression of weights and biases, which can be translated into a feature vector.
In the middle level, embedding vectors give meaning to words and discourses. Therefore, feature vectors exploit embedding vector semantics to describe functional modules and introduce reflexive knowledge about themselves.
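As a rough illustration of this idea (an illustrative sketch, not a mechanism described in the paper), a functional module given by a layer's weights and biases can be flattened, together with a small descriptor of its shape, into a single feature vector that higher levels can treat as data.

```python
import numpy as np

def encode_module(weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Flatten a layer's parameters into a feature vector usable at a higher level.

    The leading entries describe the module's shape; the rest are the parameters.
    This is only an illustrative encoding, not the one used by transformer models.
    """
    shape_descriptor = np.array([weights.shape[0], weights.shape[1]], dtype=float)
    return np.concatenate([shape_descriptor, weights.ravel(), bias.ravel()])

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))       # a hypothetical 4x3 linear module
b = rng.normal(size=3)
v = encode_module(W, b)
print(v.shape)                    # (17,): 2 shape entries + 12 weights + 3 biases
```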
Logical operations underlying the logic of natural language are functions [24]; therefore, they are realized by functional modules. This means that chatbots, suitably extended, could, in principle, be able to “understand” the mechanisms on which their comprehension is based. In other words, the encoding of functional modules is a crucial step to increase the reflexivity of cognitive systems. Understanding how to realize the encoding of functional modules is a key research topic for both artificial intelligence and theoretical neurophysiology.
Figure 5 gives another view of a neural network by distinguishing the main activities and their connection to the cognitive component, where the knowledge representation is realized by specific modules.
This scenario requires new levels of training. Training by example is too limited. Training by reasoning is the method that induces a cognitive system to organize its structures toward a multilevel transformer representation of its internal knowledge. It has to be based on natural language and developed through conversational activities. The semantic spaces that underlie the different knowledge levels add new dimensions according to a Chinese boxes mechanism where a single coordinate may refer to a point encapsulating the features of some hidden semantic space.