2. Results
We consider binary strings containing the symbols 0 and 1, which are our basis AT objects [2], with given numbers of zeros and ones and a fixed length N. We consider strings to be messages transmitted through a communication channel between a source and a receiver, similarly to Claude Shannon's approach used in the derivation of information entropy [28], and consider the process of their formation within the AT framework.
Definition 1. A string assembly index is the smallest number of steps s required to assemble a binary string of length N, where each step joins two items that are either basic symbols or strings joined in previous steps. Therefore, the assembly index is a function of the string.
For example, the string
can be assembled in seven steps:
join 0 with 0 to form , adding to the pool,
join with 1 to form , adding to the pool,
...
join with 1 to form ,
five steps:
join 0 with 0 to form , adding to the pool,
join with 1 to form , adding to the pool,
join with pattern taken from the pool to form , adding to the pool,
join with 0 to form , adding to the pool,
join with 1 to form ,
or in as few as four steps:
join 0 with 1 to form , adding to the pool,
join with 0 to form , adding to the pool,
join with pattern taken from the pool to form , adding to the pool,
join with pattern taken from the pool to form .
Therefore, the string (1) has an assembly index of 4, which is the length of the shortest assembly pathway leading to its assembly; it cannot be assembled in a simpler way.
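Definition 1 can be checked by exhaustive search for short strings. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the function name `assembly_index` and the iterative-deepening strategy are hypothetical, and the search relies on the observation that every intermediate string on a shortest pathway must be a substring of the target. The run time grows quickly with N, so it is intended for short strings only.

```python
from itertools import product


def assembly_index(target: str) -> int:
    """Smallest number of joins needed to build `target` from the symbols 0 and 1."""
    n = len(target)
    if n <= 1:
        return 0  # basis symbols are not assembled
    # Only substrings of the target can appear on a shortest pathway.
    useful = {target[i:j] for i in range(n) for j in range(i + 2, n + 1)}

    def reachable(pool: frozenset, steps_left: int) -> bool:
        # Try every ordered pair of pool items as the next join.
        for left in pool:
            for right in pool:
                joined = left + right
                if joined == target:
                    return True
                if steps_left > 1 and joined in useful and joined not in pool:
                    if reachable(pool | {joined}, steps_left - 1):
                        return True
        return False

    steps = 1
    while not reachable(frozenset("01"), steps):
        steps += 1  # iterative deepening: allow one more join
    return steps


# Distribution of assembly indices over all 16 strings of length 4.
counts = {}
for bits in product("01", repeat=4):
    a = assembly_index("".join(bits))
    counts[a] = counts.get(a, 0) + 1
print(counts)  # → {2: 4, 3: 12}
```

For N = 4 the sketch finds four strings that need two steps and twelve that need three.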
Definition 2. A string is a balanced string if it has the same number of zeros and ones or, if N is odd, if these numbers differ by one.
Without loss of generality, we assume that if N is odd, the string contains one more zero than ones. However, our results are equivalently applicable if we assume the opposite (i.e., a larger number of ones for an odd N).
The number of balanced strings among all $2^N$ strings is
$$\binom{N}{\lfloor N/2 \rfloor}.$$
This is the OEIS sequence A001405, the maximal number of subsets of an N-set such that no one contains another, as asserted by Sperner's theorem. For large N it can be approximated using Stirling's approximation.
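As a numerical illustration of this count, the sketch below evaluates the central binomial coefficient and compares it with a Stirling-type estimate. The function names are hypothetical, and the specific asymptotic form $2^N/\sqrt{\pi N/2}$ is an assumption of this sketch (the standard large-N estimate of the central binomial coefficient), not a formula quoted from the text.

```python
from math import comb, pi, sqrt


def balanced_count(n: int) -> int:
    """Number of balanced strings of length n (OEIS A001405)."""
    return comb(n, n // 2)


def stirling_estimate(n: int) -> float:
    """Large-n estimate of the central binomial coefficient (assumed form)."""
    return 2**n / sqrt(pi * n / 2)


for n in (4, 8, 16, 32):
    exact = balanced_count(n)
    approx = stirling_estimate(n)
    # The relative error shrinks as n grows.
    print(n, exact, round(approx), f"{abs(approx - exact) / exact:.3%}")
```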

Theorem 1. A string having length N = 4 is the shortest string having more than one string assembly index (Definition 1).
Proof. The proof is trivial. For N = 1, the assembly index is 0, as all basis objects have a pathway assembly index of 0 [2] (they are not assembled). N = 2 provides four available strings with an assembly index of 1. N = 3 provides eight available strings with an assembly index of 2. Only N = 4 provides 16 strings that include four strings with an assembly index of 2 and twelve strings with an assembly index of 3; six of these 16 strings are balanced, as shown in Table 1 and Table 2.
For example, to assemble a string such as [0101] (Table 2), we need to assemble the string [01] and reuse it. Therefore, the set of different assembly indices has a single element for N < 4 and more than one element for N = 4. □

In the following, we derive the tight lower bound of the set of different string assembly indices 1.
Table 1.
Distribution of the assembly indices for N = 4, broken down by the number of ones.
| Assembly index | Number of strings | 0 ones | 1 one | 2 ones | 3 ones | 4 ones |
|---|---|---|---|---|---|---|
| 2 | 4 | 1 | | 2 | | 1 |
| 3 | 12 | | 4 | 4 | 4 | |
| Total | 16 | 1 | 4 | 6 | 4 | 1 |
Table 2.
Balanced strings for N = 4.
| k | String | Assembly index |
|---|---|---|
| 1 | (0 1)(0 1) | 2 |
| 2 | (1 0)(1 0) | 2 |
| 3 | 0 1 1 0 | 3 |
| 4 | 1 1 0 0 | 3 |
| 5 | 1 0 0 1 | 3 |
| 6 | 0 0 1 1 | 3 |
Table 1 and Tables A2–A9 (Appendix C) show the distributions of the assembly indices among the $2^N$ strings for each length N considered, taking into account the number of ones. The sums of each column form Pascal's triangle read by rows (OEIS sequence A007318).
Theorem 2 (Tight lower bound on the string assembly index). The smallest string assembly index as a function of N is given by the OEIS sequence A014701.
Proof. Strings having the smallest assembly index can be formed by joining two basic symbols, adding the pair to the pool, and joining the longest strings taken from the pool until N is reached.
If $N = 2^s$, then the smallest assembly index equals s. Therefore, we can use the following procedure


In other words, we recursively calculate the remainder of a division of N by the largest $2^s$ and increment the assembly index with every step s, noting that only at the first step was the increment equal to the largest s.
This procedure reflects the assembly of the string but gives the same result as the procedure (OEIS A014701) that halves N if it is even and subtracts one otherwise, counting the number of steps s to reach 1 starting from N. □
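The two procedures named in the proof can be sketched as follows. This is an illustrative reading of the prose above, not the original pseudocode; the function names are hypothetical, and both functions assume N ≥ 1.

```python
def lower_bound_by_powers(n: int) -> int:
    """Remainder procedure: peel off the largest power of two repeatedly."""
    s = n.bit_length() - 1          # largest s with 2**s <= n
    index = s                       # a block of length 2**s takes s doublings
    remainder = n - (1 << s)
    while remainder > 0:
        # Remove the largest power-of-two block still fitting the remainder.
        remainder -= 1 << (remainder.bit_length() - 1)
        index += 1                  # one join per additional block
    return index


def lower_bound_a014701(n: int) -> int:
    """OEIS A014701: steps to reach 1, halving if even, else subtracting one."""
    steps = 0
    while n > 1:
        n = n // 2 if n % 2 == 0 else n - 1
        steps += 1
    return steps


# Both procedures agree on every length checked here.
assert all(lower_bound_by_powers(n) == lower_bound_a014701(n) for n in range(1, 4097))
print([lower_bound_a014701(n) for n in range(1, 17)])
# → [0, 1, 2, 2, 3, 3, 4, 3, 4, 4, 5, 4, 5, 5, 6, 4]
```

The agreement follows because both counts equal the position of the highest set bit of N plus the number of remaining set bits.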
In the following, we conjecture the form of the upper bound of the set of different string assembly indices 1.
In general, of all strings
having a given assembly index, shown in
Table 1 and
Table A2–
Table A9, most are those having
. The only exceptions are
for
(
) and for
(
),
for
(
) and for
(
), and
for
(
).
Introducing the definition 2 of a balanced string allows us to reduce the search space of possible strings with maximal assembly indices to balanced strings only. With the exception of , of all strings having a maximum assembly index, most are balanced.
We can further restrict the search space to distinct strings.
Definition 3. A string is a distinct string if a ring formed with this string by joining its beginning with its end is unique among the rings formed from the other distinct strings .
There are at least two and at most N forms of a distinct string that differ in the position of the starting symbol. For example, for the N = 4 balanced strings shown in Table 2, the two strings with an assembly index of 2 correspond to each other if we change the starting symbol. Similarly, the four strings with an assembly index of 3 correspond to each other after a change in the position of the starting symbol. Thus, there are only two balanced distinct strings for N = 4.
The number of distinct strings among all strings is given by the OEIS sequence A000031. In general, as N grows, the number of distinct strings becomes much lower than the number of balanced strings.

By neglecting the notion of the beginning and end of a string, we focus on its length and content. In Yoda’s language,
"complete, no matter where it begins. A message is".
The numbers of the balanced, distinct, and balanced distinct strings are shown in Table 3 and Figure 1.
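For small N these three counts can be reproduced by brute force. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, a distinct string is represented by the lexicographically smallest of its rotations, and for odd N the balanced strings are taken to have one more zero than ones, as assumed above.

```python
from itertools import product


def canonical_rotation(s: str) -> str:
    """Lexicographically smallest rotation, used as the ring representative."""
    return min(s[i:] + s[:i] for i in range(len(s)))


def is_balanced(s: str) -> bool:
    """Balanced: floor(N/2) ones, so zeros get the extra symbol when N is odd."""
    return s.count("1") == len(s) // 2


def count_strings(n: int) -> tuple[int, int, int]:
    """Return (balanced, distinct, balanced distinct) counts for length n."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    balanced = [s for s in strings if is_balanced(s)]
    distinct = {canonical_rotation(s) for s in strings}
    balanced_distinct = {canonical_rotation(s) for s in balanced}
    return len(balanced), len(distinct), len(balanced_distinct)


for n in range(4, 12):
    print(n, count_strings(n))
```

For N = 4 the sketch returns six balanced, six distinct, and two balanced distinct strings.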
We note that, in general, the starting symbol is relevant for the assembly index. Thus, different forms of a distinct string may have different assembly indices. For example, for
balanced strings
and
, shown in
Table A12 have
. However, these strings are not distinct, since they correspond to each other and to the balanced strings
,
,
,
, and
with
. They all have the same triplet of adjoining ones.
Definition 4. The assembly index of a distinct string is the smallest assembly index among all forms of this string.
Thus, if different forms of a distinct string have different assembly indices, we assign the smallest assembly index to this string. In other words, we assume that the smallest number of steps
where
denotes a particular form of a distinct string
, is the string assembly index of this distinct string.

The distribution of the assembly indices of the balanced distinct strings Ek is shown in Table 4.
Table 4.
Distribution of assembly indices among balanced distinct strings for 4 ≤ N ≤ 11. Columns 2 to 8 give the number of strings having that assembly index.
| N | Number of strings | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 4 | 2 | 1 | 1 | | | | | |
| 5 | 2 | | 1 | 1 | | | | |
| 6 | 4 | | 1 | 2 | 1 | | | |
| 7 | 5 | | | 2 | 3 | | | |
| 8 | 10 | | 1 | 1 | 6 | 2 | | |
| 9 | 14 | | | 1 | 4 | 7 | 2 | |
| 10 | 26 | | | 1 | 6 | 9 | 10 | |
| 11 | 42 | | | | 2 | 14 | 20 | 6 |
If a string with the smallest assembly index is constructed from repeating patterns, then a string with the largest assembly index must be the most patternless. The largest string assembly index must be bounded from above and must be a monotonically nondecreasing function of N that can increase by at most one between N and N + 1.
Identifying the shortest pathway is known to be computationally challenging [3]. This problem has been proven to be at least as hard as NP-complete problems [39]. However, certain heuristic rules apply in our binary case. For example,
for we cannot avoid two doublets (e.g. ) within a distinct string and thus ,
for we cannot avoid two pairs of doublets (e.g. and ) within a distinct string and thus ,
for we cannot avoid three pairs of doublets (e.g. , , and ) within a distinct string and thus ,
for we cannot avoid two pairs of doublets and one doublet three times (e.g. , , and , and thus ,
etc.
Conjecture 1.
The problem of determining the assembly index of a given binary string is NP-complete [39], while the problem of creating a string that has the maximum assembly index for its length is NP-hard.
We found it much easier to determine an assembly index of a given binary string than to create a string so that it would have a maximum assembly index as a function of the length of the string. A proof of conjecture 1 would also be the proof of the following conjecture.
Conjecture 2. Every computable problem and every computable solution can be encoded as a finite binary string. Here, determining whether the assembly index of a given string has its known maximal value corresponds to checking the solution to a problem for correctness, whereas creating such a string corresponds to solving the problem. Thus, AT would solve the P versus NP problem in theoretical computer science.
Table 5 shows the exemplary balanced strings
having maximal assembly indices that we created (cf. also
Appendix B). To determine the assembly index
of the string
we look for the longest patterns that appear at least twice within the string, and we look for the largest number of these patterns. Here, we find that each of the two triplets
and
appear twice in
and are based on the doublets
and
also appearing in
. Thus, we start with the assembly pool
made in four steps and join the elements of the pool in the following seven steps to arrive at
. On the other hand, another form of this balanced distinct string
has
.
Conjecture 3 (Tight upper bound on a string assembly index) With exceptions for small N the largest string assembly index of a binary string as a function of N is given by a sequence formed by strings for , where denotes increasing by one, and 0 denotes maintaining it at the same level, and .
However, at this moment, we cannot state if this conjecture applies to distinct or non-distinct strings. The assembly indices for
are unique, whereas the assembly indices for
were discussed above and are calculated in
Appendix C for balanced and balanced distinct strings.
The conjectured sequence is shown in
Figure 2 and
Figure 3 starting with
(we note in passing that
is a dimension of the void, the empty set
∅, or (-1)-simplex). Subsequent terms are given by
, which is periodic for
and flattens at
, and
,
,
.
This sequence can be generated using the following procedure

We note the similarity of this bound to the Aufbau rule
3, the Janet sequence (OEIS A167268) and the monotonically non-decreasing Shannon entropy of chemical elements, including observable ones [
23]. Perhaps the exceptions in the sequence 3 vanish as
N increases.
The bounds 2 and 3 are shown in Table 3 and illustrated in Figure 2 and Figure 3. A binary string can be assembled in a number of steps bounded from below by the bound 2 and, as we conjecture for large N, bounded from above by the bound 3.
The Hamlet tragedy contains approximately 130,000 letters. Assigning five bits per letter (32 possibilities), the Hamlet tragedy can be encoded in a string having N = 650,000 bits (81.25 kB), yielding the total number of possible strings $2^N$
(including
), and their assembly indices are bounded by
The lower bound (
8) can be calculated directly using the procedures of Theorem 2. The upper bound (
8) can be estimated by finding the smallest
k that satisfies
and using the relation
of Conjecture 3.
We assume that the assembly index of the string encoding the actual Hamlet tragedy is close to the upper bound. Even though the probability of randomly typing the Hamlet tragedy, as considered in the infinite monkey theorem, is unfathomably small when constrained to the bounds of the physical universe [5], this tragedy was once created by William Shakespeare.
The SARS-CoV-2 genome sequence contains 29,903 bases. Assigning two bits per base, it can be encoded in a string of N = 59,806 bits having the assembly index bounded by

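The string lengths and the lower bounds for both examples can be recomputed directly. The sketch below is illustrative only: the function name `lower_bound` is hypothetical, and it uses the closed form of OEIS A014701 (position of the highest set bit plus the number of set bits minus one) in place of the step-by-step procedures of Theorem 2. The conjectured upper bounds are not evaluated here.

```python
def lower_bound(n_bits: int) -> int:
    """Closed form of OEIS A014701: floor(log2 N) + (number of ones in N) - 1."""
    return (n_bits.bit_length() - 1) + bin(n_bits).count("1") - 1


hamlet_bits = 130_000 * 5   # five bits per letter
genome_bits = 29_903 * 2    # two bits per base

print(hamlet_bits, hamlet_bits / 8 / 1000, lower_bound(hamlet_bits))
# → 650000 81.25 27
print(genome_bits, lower_bound(genome_bits))
# → 59806 24
```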
Figure 2.
Tight lower bound on the string assembly index 2 (red) and (red, dash-dot), conjectured upper bound on the string assembly index 3 (green), factual values of the string assembly index (blue) and the distinct string assembly index (cyan) and (green, dash-dot), for .
Figure 3.
Tight lower bound on the string assembly index 2 (red) and (red, dash-dot), conjectured upper bound on the string assembly index 3 (green) and (green, dash-dot), for .