1. Introduction
Assembly theory (AT), formulated in 2017, introduced the concept of an initial pool [1].
Definition 1. We call a set that contains b different basic symbols c, where b is a finite natural radix, the initial assembly pool.
The reader will find numerous results on AT published since 2017 in refs. [1,2,3,4,5,6,7,8,9].
In this short note, we extend the results of our previous study [9] to strings of any natural radix b. We consider the formation of strings of length N containing symbols from the initial assembly pool within the AT framework in consecutive assembly steps from basic symbols c and strings assembled in previous steps.
Definition 2. We call a set that contains basic symbols and strings assembled in previous steps the working assembly pool.
Using Definitions 1 and 2, the assembly index of a string is the minimal achievable difference between the cardinalities of the working and initial assembly pools leading to this string, since each assembly step increases the cardinality of the working assembly pool by one. Therefore, in contrast to the working assembly pool (Definition 2), the initial assembly pool (Definition 1) must not contain strings of basic symbols. To illustrate this, consider the following mapping between such a faulty initial assembly pool containing five basic symbols and three strings of these symbols and the initial assembly pool of radix
Now consider the string
assembled beginning with the initial assembly pool
and having the assembly index
only two steps above
. We can assemble the string
of length
in 7 steps with the initial assembly pool
and then, using the mapping (1), it will correspond to the string (2). However, as we shall show in the following section,
. In fact, the latter string (3) should be assembled as
with the assembly index
and with the initial assembly pool
, as
according to (1).
The following two theorems were already stated in our previous study [9] for b = 2. We restate them here for all b for clarity.
Theorem 1. A string of length N = 4 is the shortest string that allows for more than one assembly index for all b.
Proof.
provides available strings with unit assembly indices. provides available strings with assembly indices equal to two. Only provides strings that include b strings and strings with assembly indices equal to two, while the assembly index of the remaining strings is three. For example, to assemble the string , we need to assemble the string and reuse it from , while there is nothing available to reuse, in the case of the string . □
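Theorem 1 can be checked by brute force for small radices. The following is a minimal sketch and not part of the original study: `assembly_index` and `indices_for` are hypothetical helper names, and the search assumes that every intermediate string of a minimal assembly pathway is a substring of the target string.

```python
from functools import lru_cache
from itertools import product


def assembly_index(target: str) -> int:
    """Fewest joining steps needed to build `target` from its basic symbols."""
    n = len(target)
    if n < 2:
        return 0
    # Assumption: a minimal pathway only ever assembles substrings of the target.
    substrings = {target[i:j] for i in range(n) for j in range(i + 2, n + 1)}

    @lru_cache(maxsize=None)
    def reachable(pool: frozenset, steps_left: int) -> bool:
        if target in pool:
            return True
        if steps_left == 0:
            return False
        # One assembly step joins two elements of the working assembly pool.
        for left in pool:
            for right in pool:
                joined = left + right
                if joined in substrings and joined not in pool:
                    if reachable(pool | {joined}, steps_left - 1):
                        return True
        return False

    # Iterative deepening: the first depth that succeeds is the assembly index.
    steps = 1
    while not reachable(frozenset(target), steps):
        steps += 1
    return steps


def indices_for(b: int, n: int) -> set:
    """Set of assembly indices over all radix-b strings of length n (b <= 10)."""
    symbols = "0123456789"[:b]
    return {assembly_index("".join(w)) for w in product(symbols, repeat=n)}


for b in (2, 3):
    print(b, [sorted(indices_for(b, n)) for n in (2, 3, 4)])
```

Under these assumptions, for b = 2 and b = 3 the strings of length 2 and 3 each yield a single assembly index (1 and 2, respectively), while the strings of length 4 yield the two indices 2 and 3.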
Where the symbol value can be arbitrary, we write * assuming that it is the same within the string. If we allow for the possibility different from *, we write ★. Thus, , for example, is a placeholder for all b strings, while a placeholder for all strings.
Theorem 2. The smallest string assembly index as a function of N corresponds to the shortest addition chain for N (OEIS A003313) for all b.
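Proof. Strings for which $N = 2^s$ can be formed in subsequent steps s by joining the longest string assembled so far with itself until N is reached. Therefore, if $N = 2^s$, then the smallest assembly index is $s = \log_2 N$. Only $b^2$ strings have such an assembly index in this case, including b strings (5) built from a single basic symbol and $b(b-1)$ strings (6) built from a doublet of two different basic symbols, and the assembly pathway of each of the strings (5) and (6) is unique. At each assembly step, the length of the string doubles.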
An addition chain for N having the shortest length s (commonly denoted as $l(N)$) is defined as a sequence $1 = a_0 < a_1 < \dots < a_s = N$ of integers such that $a_j = a_k + a_l$, $k, l < j$, for $j = 1, \dots, s$. The first step in creating an addition chain for N is always $a_1 = 1 + 1 = 2$ and this corresponds to assembling a doublet of two identical or two different basic symbols from the initial assembly pool P. Thus, the lower bound for s of the addition chain for N, $s \ge \log_2 N$, is achieved for $N = 2^s$ by strings (5) and (6).
The second step in creating an addition chain can be or . Thus, finding the shortest addition chain for N corresponds to finding an assembly index of a string containing basic symbols and/or doublets and/or triplets containing these doublets for since due to Theorem 1 only they provide the same assembly indices . □
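The correspondence stated in Theorem 2 can be explored numerically. The sketch below is an illustration only, not the authors' code: `shortest_addition_chain_length` is a hypothetical helper that finds the length of a shortest addition chain by iterative deepening over strictly increasing chains.

```python
def shortest_addition_chain_length(n: int) -> int:
    """Length of a shortest addition chain 1 = a_0 < a_1 < ... < a_s = n."""
    if n == 1:
        return 0

    def extend(chain: list, steps_left: int) -> bool:
        last = chain[-1]
        if last == n:
            return True
        # Even repeated doubling cannot reach n in the remaining steps.
        if steps_left == 0 or last << steps_left < n:
            return False
        # Every next element is a sum of two (possibly equal) earlier elements.
        candidates = {x + y for x in chain for y in chain if last < x + y <= n}
        for nxt in sorted(candidates, reverse=True):
            if extend(chain + [nxt], steps_left - 1):
                return True
        return False

    # Iterative deepening: the first depth that succeeds is the shortest length.
    length = 1
    while not extend([1], length):
        length += 1
    return length


print([shortest_addition_chain_length(n) for n in range(1, 17)])
```

For N = 1, ..., 16 this reproduces the initial terms 0, 1, 2, 2, 3, 3, 4, 3, 4, 4, 5, 4, 5, 5, 5, 4 of OEIS A003313; the values at N = 2, 4, 8, 16 are the doubling steps used in the proof.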
2. Results
The seven-bit string is the longest bitstring that can have the maximum assembly index N − 1. There are four such bitstrings containing two clear triplets and the starting bit at the end or the ending bit at the start, that is
$$0001110, \quad 1110001, \quad 1000111, \quad 0111000, \tag{7}$$
and their lengths cannot be increased without a repetition of a doublet, which inevitably reduces the assembly index to at most N − 2.
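The binary case can be enumerated directly. The sketch below is an illustration under a stated assumption, not the authors' method: it assumes that a string of length N has the assembly index N − 1 exactly when no doublet occurs at two non-overlapping positions (so that nothing can be reused). The names `has_reusable_doublet` and `maximal_index_strings` are hypothetical.

```python
from itertools import product


def has_reusable_doublet(s: str) -> bool:
    """True if some doublet occurs at two non-overlapping positions of `s`."""
    first_seen = {}
    for i in range(len(s) - 1):
        doublet = s[i:i + 2]
        if doublet not in first_seen:
            first_seen[doublet] = i
        elif i - first_seen[doublet] >= 2:
            # Occurrences inside a clear triplet (distance 1) overlap and do not count.
            return True
    return False


def maximal_index_strings(b: int, n: int) -> list:
    """All radix-b strings of length n without a reusable doublet (b <= 10)."""
    symbols = "0123456789"[:b]
    words = ("".join(w) for w in product(symbols, repeat=n))
    return [w for w in words if not has_reusable_doublet(w)]


print(maximal_index_strings(2, 7))  # ['0001110', '0111000', '1000111', '1110001']
print(maximal_index_strings(2, 8))  # []
```

Under this assumption, exactly four bitstrings of length 7 survive and none of length 8 does.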
This observation and Theorem 2 motivated us to develop a general procedure to construct the longest possible string that has the assembly index N − 1, as a function of the radix b. We denote the length of this string by $N_{\max}(b)$.
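After several exploratory attempts (cf. Appendices A and B), we eventually reached a stable procedure. We start with an initial balanced string of length $3b$ containing b clear triplets ordered as
$$000\,111\,222\,\ldots\,(b-1)(b-1)(b-1). \tag{8}$$
The doublets that can be inserted into the initial string (8) can be arranged in a $b \times b$ matrix
$$\begin{bmatrix} \cancel{00} & 01 & 02 & \cdots & 0(b-1) \\ 10 & \cancel{11} & 12 & \cdots & 1(b-1) \\ 20 & 21 & \cancel{22} & \cdots & 2(b-1) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ (b-1)0 & (b-1)1 & (b-1)2 & \cdots & \cancel{(b-1)(b-1)} \end{bmatrix}, \tag{9}$$
where the crossed out entries on the diagonal cannot be reused, as they would create repetitions in this string. We further assume that we shall not insert doublets between the clear triplets of the string (8); hence, we can also cross out the entries on the first superdiagonal of the matrix (9).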
In the
step, we create a string containing doublets on the first subdiagonal of the matrix (9) starting with 10
and we append it to the string (8). With this step, we also eliminate the doublets on the second superdiagonal starting with the doublet 02, as well as the doublet
. In the
step, we create a string containing doublets on the third superdiagonal beginning with the doublet 03
and append it to the string created so far. With this step, we also remove the doublet
and the middle part of the second subdiagonal containing
. And so on.
We shall illustrate this process for
. The matrix
contains all the doublets that were used to create the string of length
For
we would obtain the string of length
for
we would obtain the string of length
for
we would obtain the string of length
for
we would obtain the string of length
and
leads to the following string of length
The final string is always terminated by 0.
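For comparison, strings of the same length can also be obtained with a different, swapped-in construction; the sketch below is not the authors' procedure. It walks an Eulerian circuit of the complete directed graph on the b symbols (Hierholzer's algorithm), so that every doublet of two different symbols occurs once, and expands the first visit of each symbol into a clear triplet. It relies on the assumption that a string of length N has the assembly index N − 1 when no doublet occurs at two non-overlapping positions. All function names are hypothetical.

```python
def eulerian_circuit(b: int) -> list:
    """Closed walk from symbol 0 using every ordered pair (i, j), i != j, once."""
    unused = {i: [j for j in range(b) if j != i] for i in range(b)}
    stack, circuit = [0], []
    while stack:  # Hierholzer's algorithm
        v = stack[-1]
        if unused[v]:
            stack.append(unused[v].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]


def candidate_string(b: int) -> list:
    """Expand the first visit of every symbol into a clear triplet."""
    seen, out = set(), []
    for v in eulerian_circuit(b):
        out.extend([v] if v in seen else [v, v, v])
        seen.add(v)
    return out


def has_reusable_doublet(s: list) -> bool:
    """True if some doublet occurs at two non-overlapping positions of `s`."""
    first_seen = {}
    for i in range(len(s) - 1):
        doublet = (s[i], s[i + 1])
        if doublet not in first_seen:
            first_seen[doublet] = i
        elif i - first_seen[doublet] >= 2:
            return True
    return False


for b in range(1, 6):
    s = candidate_string(b)
    print(b, len(s), has_reusable_doublet(s), "".join(map(str, s)))
```

The walk visits b(b − 1) + 1 symbols and the b triplet expansions add 2b more, so the resulting string has b² + b + 1 symbols and, because the circuit is closed, it starts and ends with 0.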
The strings of odd lengths generated by the general procedure outlined above are not only the longest, but also the most balanced. This leads to the following theorem.
Theorem 3. The longest length $N_{\max}(b)$ of a string composed of b different basic symbols that has the assembly index of $N_{\max}(b) - 1$ is given by
$$N_{\max}(b) = b^2 + b + 1 \tag{19}$$
(OEIS A353887) and this string is nearly balanced, that is
$$N_{\max}(b) = b\,N_c + 1, \tag{20}$$
where $N_c$ is the number of occurrences of each of all but one symbol within the string.
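Proof. The $N_{\max}(b)$ given by formula (19) is an odd number for all b. As shown in Table 1, it can be written as the sum $3b + (b-1)^2$, where the first element $3b$ is the length of the initial string (8) containing b clear triplets and $(b-1)^2$ is the number of entries in the doublet matrix (9) of the previous b. By definition, a string of length $N_{\max}(b)$ cannot have any repetitions; it can only contain doublets and clear triplets that do not contain these doublets. Therefore, to be the most patternless, this string must maximize the Shannon entropy, that is, it must be the most balanced. For a string of length N of the form (20), in which b − 1 symbols occur $N_c$ times each and one symbol occurs $N_c + 1$ times, the fractions in the Shannon entropy are
$$p_0 = \frac{N_c + 1}{N}, \qquad p_c = \frac{N_c}{N} \quad \text{for } c \neq 0$$
(where without loss of generality we assume that the symbol occurring $N_c + 1$ times within the string is 0) and the Shannon entropy is
$$H = -\frac{N_c + 1}{N} \log_2 \frac{N_c + 1}{N} - (b - 1)\,\frac{N_c}{N} \log_2 \frac{N_c}{N}.$$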
The strings given by the equation (19) are not the shortest possible ones. Strings satisfying the equation (20) and satisfying
are given by $N = b^2 + 1$ (OEIS A002522). However, they do not contain all the possible doublets and, furthermore, their entropies are smaller than the entropies of the strings given by the equation (19).
Now, assume a contrario that a string longer than
can be constructed, say of length
. But in this case, the corresponding
. The string of the length given by the formula (19) maximizes the Shannon entropy if it must additionally satisfy the relation (20). □
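The lengths and entropies listed in Table 1 can be recomputed as follows. This is a minimal sketch that assumes formula (19) reads N = 3b + (b − 1)² = b² + b + 1 and that, in line with relation (20), b − 1 symbols occur b + 1 times each while one symbol occurs b + 2 times; the function names are hypothetical.

```python
from math import log2


def max_length(b: int) -> int:
    """Assumed reading of formula (19): 3b + (b - 1)^2 = b^2 + b + 1."""
    return 3 * b + (b - 1) ** 2


def entropy(b: int) -> float:
    """Shannon entropy in bits of the nearly balanced string of length max_length(b)."""
    n = max_length(b)
    # b - 1 symbols occur b + 1 times each; one symbol occurs b + 2 times.
    counts = [b + 1] * (b - 1) + [b + 2]
    assert sum(counts) == n
    return sum(c / n * log2(n / c) for c in counts)


for b in range(1, 14):
    print(b, max_length(b), f"{entropy(b):.4f}")
```

For b = 2, 3, and 4 this gives the lengths 7, 13, and 21 with entropies 0.9852, 1.5766, and 1.9952, matching the corresponding entries of Table 1.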
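Although the case of b = 1 (only one symbol) is degenerate, the formula (19) yields the correct result; the string 000 is the longest string with the assembly index N − 1, as for b = 1 the assembly index is simply the length of the shortest addition chain for N (OEIS A003313).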
3. Conclusions
There is one string of length 3 for b = 1, four strings of length 7 for b = 2, and seventy-two strings of length 13 for b = 3 (cf. Appendix A). Their number for b > 3 requires further research.
Author Contributions
WB: First concept of a general procedure for constructing the string of length ; determining for ; third concept of a general procedure for constructing the string of length leading to theorem 3; noting that must be more balanced than ; numerous clarity corrections and improvements; PM: Second concept of a general procedure for constructing the string of length ; numerous clarity corrections and improvements; SŁ: The remaining part of the study.
Acknowledgments
The authors thank Andrzej Tomski for numerous clarity corrections and improvements and Mariola Bala for motivation. SŁ thanks his wife, Magdalena Bartocha, for her everlasting support.
Appendix A. Method A
In the first method of creating the longest patternless string that we developed, we started with the string of clear triplets (8), which we augmented with
doublets
to form the string
of length
The introduction of
doublets from the first row of the matrix (9) and the doublet 10 into the string (8) also introduces other doublets. For
the augmented string (A1) has the length
as the insertion of 0210 at the end of the string (8) introduces the doublet 21. Thus, by construction, doublet
(last row,
column) cannot be reused. For
only two doublets can be introduced without repetitions, leading to twelve unique strings of length
Finally, we have to multiply the cardinality of this set by $3! = 6$ to account for permutations. For example, the first string
, is equivalent to five strings
,
,
,
, and
. Hence, there are seventy-two different strings of length
. This method turned out to be valid for
only, as
and not all available doublets were used.
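The permutation factor can be illustrated with a short sketch. The base string used here is an assumption: it is the string of clear triplets (8) for b = 3 followed by 0210, as described above, and `relabelings` is a hypothetical helper.

```python
from itertools import permutations


def relabelings(s: str, symbols: str = "012") -> set:
    """All strings obtained from `s` by permuting the basic symbols."""
    out = set()
    for perm in permutations(symbols):
        table = str.maketrans(symbols, "".join(perm))
        out.add(s.translate(table))
    return out


base = "0001112220210"  # assumed: string (8) for b = 3 with 0210 appended
variants = relabelings(base)
print(len(variants))       # 6: the string itself and five equivalent strings
print(12 * len(variants))  # 72
```

Each of the twelve unique strings has 3! = 6 relabelings, which gives the count of seventy-two stated in this appendix.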
Appendix B. Method B
The second method we developed is an extension of the first one (Appendix A). We start with the augmented string (A1) and the doublet matrix
For
, there are
and
(in total
) doublets available, respectively, in the upper triangle (beginning with 13) and lower triangle (beginning with 21) of the doublet matrix (A4). We can insert the doublets from the upper triangle into the augmented string (A1) as follows
creating a string of length
This method also turned out to be valid for
only, as
even though the strings of length
contain all doublets of the matrix (9) without repetition. However, the strings (A5) created by this method are non-balanced and do not contain all available b clear triplets. For example, for
, the non-balanced string (A5) of length
that contains all possible doublets (but no clear triplet 222) is
This led us to the third method described in Section 2.
References
1. Marshall, S.M.; Murray, A.R.G.; Cronin, L. A probabilistic framework for identifying biosignatures using Pathway Complexity. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2017, 375, 20160342.
2. Imari Walker, S.; Cronin, L.; Drew, A.; Domagal-Goldman, S.; Fisher, T.; Line, M. Probabilistic Biosignature Frameworks. In Planetary Astrobiology; Meadows, V.; Arney, G.; Schmidt, B.; Des Marais, D.J., Eds.; University of Arizona Press, 2019; pp. 1–1.
3. Meadows, V.S.; Arney, G.N.; Schmidt, B.E.; Des Marais, D.J., Eds. Planetary astrobiology; University of Arizona space science series, The University of Arizona Press; Houston: Lunar and Planetary Institute: Tucson, 2020. OCLC: 1151198948.
4. Liu, Y.; Mathis, C.; Bajczyk, M.D.; Marshall, S.M.; Wilbraham, L.; Cronin, L. Exploring and mapping chemical space with molecular assembly trees. Science Advances 2021, 7, eabj2465.
5. Marshall, S.M.; Mathis, C.; Carrick, E.; Keenan, G.; Cooper, G.J.T.; Graham, H.; Craven, M.; Gromski, P.S.; Moore, D.G.; Walker, S.I.; Cronin, L. Identifying molecules as biosignatures with assembly theory and mass spectrometry. Nature Communications 2021, 12, 3033.
6. Marshall, S.M.; Moore, D.G.; Murray, A.R.G.; Walker, S.I.; Cronin, L. Formalising the Pathways to Life Using Assembly Spaces. Entropy 2022, 24, 884.
7. Sharma, A.; Czégel, D.; Lachmann, M.; Kempes, C.P.; Walker, S.I.; Cronin, L. Assembly theory explains and quantifies selection and evolution. Nature 2023, 622, 321–328.
8. Jirasek, M.; Sharma, A.; Bame, J.R.; Mehr, S.H.M.; Bell, N.; Marshall, S.M.; Mathis, C.; MacLeod, A.; Cooper, G.J.T.; Swart, M.; Mollfulleda, R.; Cronin, L. Investigating and Quantifying Molecular Complexity Using Assembly Theory and Spectroscopy. ACS Central Science 2024, 10, 1054–1064.
9. Łukaszyk, S.; Bieniawski, W. Assembly Theory of Binary Messages. Mathematics 2024, 12, 1600.
Table 1. The maximum length N of a string having the assembly index N − 1 and its Shannon entropy H, as a function of the radix b.

| b | N = 3b + (b − 1)² | H |
|---|---|---|
| 1 | 3 + 0 = 3 | 0 |
| 2 | 6 + 1 = 7 | 0.9852 |
| 3 | 9 + 4 = 13 | 1.5766 |
| 4 | 12 + 9 = 21 | 1.9952 |
| 5 | 15 + 16 = 31 | 2.3190 |
| 6 | 18 + 25 = 43 | 2.5831 |
| 7 | 21 + 36 = 57 | 2.8061 |
| 8 | 24 + 49 = 73 | 2.9991 |
| 9 | 27 + 64 = 91 | 3.1692 |
| 10 | 30 + 81 = 111 | 3.3214 |
| 11 | 33 + 100 = 133 | 3.4590 |
| 12 | 36 + 121 = 157 | 3.5846 |
| 13 | 39 + 144 = 183 | 3.7002 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).