Preprint
Short Note

This version is not peer-reviewed.

On the “Assembly Theory and its Relationship with Computational Complexity”

A peer-reviewed article of this preprint also exists.

Submitted: 23 December 2024

Posted: 25 December 2024


Abstract
The study provides an alternative definition of the assembly space as an acyclic, 2-in-regular digraph of strings equipped with an edge labeling map that preserves the commutativity of an assembly step but defines the order in which the strings are concatenated in this step. Remarkably, the uniqueness of each vertex is a sufficient criterion to establish whether an assembly step is allowed and to introduce the notion of an assembly pool: unit-length strings cannot be assembled from shorter strings and hence are inaccessible, forming the initial assembly pool, and strings already present in the assembly space cannot be assembled again, possibly using different pathways, as they would not be unique. What is allowed is the evolution of assembly pathways to make them shorter. We also comment on certain results of [10.48550/ARXIV.2406.12176], showing that it is the Assembly Steps Problem, not the Assembly Index Problem, that has been proved NP-complete in the referenced study.

1. Introduction

A recent study [1] shows that assembly theory offers a distinct approach and answers different questions than computational complexity theory, with its focus on minimum descriptions via compressibility. It also discusses fundamental differences in the ontological basis of assembly theory, and the assembly index as a physical observable, which distinguish it from theoretical approaches to formalizing life that are unmoored from measurement. Furthermore, the study [1] claims to contain a proof of the conjecture posed in [2] that the Assembly Index Problem is NP-complete. In general, it sets out to show that any instance of the Vertex Cover Problem, which is known to be NP-hard, can be reformulated as an instance of the Assembly Index Problem. Finally, it leaves open the question of whether the noncommutative concatenation version of string assembly is NP-complete; its authors see no way to reconcile the commutativity of the assembly step with the noncommutativity of string concatenation (cf. [1], Supplementary Material, footnote 101.).
Here, we show that such a reconciliation is, in fact, possible by providing an alternative definition of the assembly space. We also show that it is the Assembly Steps Problem, not the Assembly Index Problem, that has been shown in [1] to be NP-complete. Hence, determining whether a string can be assembled along a given path in a given number of steps is NP-complete. However, we do not know whether finding the minimal number of assembly steps leading to this string, that is, the assembly index of this string [3], is NP-complete.

2. An Alternative Definition of an Assembly Space

An assembly space $\Omega = (\Gamma, \phi)$ is defined in [1] (cf. Definition 8) as an acyclic directed graph $\Gamma = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges, together with an edge labeling map $\phi: E \ni e \mapsto v \in V$. $\Gamma$ contains a finite and non-empty set of vertices $\mathrm{Src}(\Gamma) = B_\Omega$ that form the basis of $\Omega$, each reachable only from itself. All remaining vertices of $\Gamma$ are reachable from a vertex in $B_\Omega$.
The edge labeling map $\phi$ cleverly defines the assembly step. Namely,
$$e = (x, z) \in E(\Omega) \wedge \phi(e) = y \in V(\Omega) \implies \exists\, e' = (y, z) \in E(\Omega) : \phi(e') = x, \tag{1}$$
allowing one to write $e = (x\,y, z)$, $e' = (y\,x, z)$, and
$$e = (x\,y, z) \in E(\Omega) \iff e' = (y\,x, z) \in E(\Omega). \tag{2}$$
However, the commutativity of the relation (2) cannot imply the commutativity of string concatenation. If all assembly objects in $V(\Omega)$ are strings, $(x\,y, xy) \in E(\Omega)$ does not imply $(y\,x, yx) \in E(\Omega)$, as might be expected for string concatenation; instead, it implies $(y\,x, xy) \in E(\Omega)$. Therefore, a string assembly space is endowed in [1] with the additional property (cf. [1], Definition 19 and Eq. (10))
$$(x\,y, z) \in E(\Omega) \implies z = [xy] \vee z = [yx], \tag{3}$$
and likewise for the second edge $(y\,x, z) \in E(\Omega)$.
However, the property (3) makes the terminating string $z$ unresolvable if $x \ne y$. Consequently, $z$ cannot be used in subsequent assembly operations. Here, we provide an alternative definition of the assembly space.
Definition 1 (Assembly Space). An assembly space $\Omega = (C, E, \phi)$ is an acyclic digraph of strings $C = \{C_k\}$, where all $b \in \mathbb{N}$ unit-length strings (basic symbols) are inaccessible source vertices and the remaining strings are 2-in-regular assembly-step vertices, $E$ is a set of edges, and $\phi: E \ni e \mapsto C_k \in C$ is an edge labeling map. An assembly step $s > 0$ consists of forming a new string $C_z$ from two, not necessarily different, strings $C_x$, $C_y$ among the $s - 1 + b$ available strings by concatenating them with each other, establishing the edges $e = (C_x, C_z)$ and $e' = (C_y, C_z)$, and assigning the strings $C_x$, $C_y$ to the edges $e$, $e'$ using $\phi$ as
$$\begin{aligned} C_z = C_x \circ C_y = \mathrm{strcat}(C_x, C_y) &\iff \phi(e) = C_y \wedge \phi(e') = C_x, \\ C_z = C_y \circ C_x = \mathrm{strcat}(C_y, C_x) &\iff \phi(e') = C_y \wedge \phi(e) = C_x, \end{aligned} \tag{4}$$
where $\circ$ denotes the string concatenation (strcat) operator.
The edge labeling map (4) can be defined only if $b > 1$, i.e., for more than one basic symbol, since in that case we can say that a given inaccessible symbol is the 1st one, another is the 2nd one, and so on; we can sort them. For a single symbol, the notion of a concatenation direction is meaningless. Contrary to the previous definition of the labeling map $\phi$ [1], the relation (4) preserves the commutativity of the assembly step but defines the order of concatenation of the strings since, in general, $C_x \ne C_y \implies C_x \circ C_y \ne C_y \circ C_x$ for different strings.
Definition 1 is consistent: all vertices are unique (as they should be in any standard graph) and all are strings. Since an assembly step always consists of joining exactly two parts [5], these can be thought of as the left and right fragments of the newly formed string [3]. Strings that can result from the concatenation of two shorter strings are 2-in-regular assembly-step vertices, while unit-length strings are inaccessible. Remarkably, the uniqueness of each vertex is a sufficient criterion to establish whether an assembly step is allowed (cf. [1], Definition 10) and to introduce the notion of an assembly pool: vertices (strings) present in the assembly space cannot be assembled again, possibly using different pathways, as they would not be unique; they can only be used in the assembly of other strings. What is allowed is the evolution of assembly pathways to make them shorter, as shown in Figure 1. This evolution seems to be stimulated by the trend to decrease the assembly depth [3,6].
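The following Python sketch illustrates how an order-aware labeling in the spirit of relation (4) might be realized together with the uniqueness criterion. It is only an illustration under stated assumptions: the class `AssemblySpace`, its methods, and the convention of storing the unordered pair sorted as $(C_x, C_y)$ are hypothetical and are not taken from [1] or from Definition 1 itself.

```python
class AssemblySpace:
    """Toy string assembly space with an order-aware edge labeling (a sketch)."""

    def __init__(self, basis):
        self.basis = set(basis)
        if len(self.basis) < 2 or any(len(sym) != 1 for sym in self.basis):
            raise ValueError("need at least two unit-length basic symbols")
        self.steps = []   # assembled strings, in order of assembly
        self.parts = {}   # C_z -> (C_x, C_y): sources of the in-edges e and e'
        self.phi = {}     # C_z -> (phi(e), phi(e')): labels of e and e'

    def pool(self):
        """Assembly pool: basic symbols plus every string assembled so far."""
        return self.basis | set(self.steps)

    def assemble(self, left, right):
        """Perform one assembly step forming the string left + right."""
        pool = self.pool()
        if left not in pool or right not in pool:
            raise ValueError("both parts must already be in the assembly pool")
        target = left + right
        if target in pool:
            raise ValueError(f"{target!r} is already a vertex and must stay unique")
        # Assumed convention: the unordered pair is stored sorted as (C_x, C_y).
        c_x, c_y = sorted((left, right))
        if target == c_x + c_y:
            labels = (c_y, c_x)   # phi(e) = C_y and phi(e') = C_x
        else:
            labels = (c_x, c_y)   # phi(e) = C_x and phi(e') = C_y
        self.steps.append(target)
        self.parts[target] = (c_x, c_y)
        self.phi[target] = labels
        return target

    def decode(self, target):
        """Recover the concatenation order of a vertex from its edge labels."""
        c_x, c_y = self.parts[target]
        phi_e, _ = self.phi[target]
        return c_x + c_y if phi_e == c_y else c_y + c_x
```

In this sketch, assembling `"01"` and `"10"` from the basis `{"0", "1"}` yields two vertices with the same pair of in-edge sources but different labels, so `decode` can tell them apart, while an attempt to assemble `"01"` a second time is rejected because the vertex would no longer be unique.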

3. The Assembly Steps Problem is NP-Complete

To show that any instance $(G, k)$ of the Vertex Cover Problem, which is known to be NP-hard, can be reformulated in polynomial time as an instance of the Assembly Index Problem, the following procedure is offered (cf. [1], Section 4.2). Here, $G = (V, E)$ is a graph, $V(G)$ is its set of vertices, $E(G)$ is its set of edges, and $k$ is the cardinality of a set of vertices that includes at least one vertex of every edge of $G$. For a given instance $(G, k)$ of the Vertex Cover Problem, where $\tau \le k \le |V(G)|$ and $\tau$ is the vertex cover number (the size of a minimum vertex cover), an instance $(\Omega, C_x, a^{(N,b)}(C_x))$ of the Assembly Index Problem is constructed, where $\Omega$ is a constructed assembly space and $C_x$ is the target string for which the assembly index $a^{(N,b)}(C_x)$ is to be determined. It is then claimed that a certificate for the Vertex Cover Problem
$$C = \{ v_l \in V(G) \mid C_l = [0 v_l 0] \in V(\Omega') \}, \tag{5}$$
containing a subset $\{v_l\}$ of vertices of $G$ that includes at least one vertex of every edge of $G$, can be used to produce a certificate $(\Omega', C_x, a^{(N,b)}(C_x))$ for the Assembly Index Problem and vice versa, where $\Omega' \subset \Omega$ is a rooted subspace (cf. [1], Definition 15) of the assembly space $\Omega$ containing only a proper subset $\{C_l = [0 v_l 0]\}$ of the strings of the form $C_k = [0 v_k 0]$, $v_k \in V(G)$. Hence, such an instance of the Assembly Index Problem would be logically equivalent to the instance of the Vertex Cover Problem from which it was constructed.
The construction of $C_x$ (cf. [1], Section 4.2) begins with defining the basis of the assembly space $\Omega$ (cf. Eqs. (17), (50)), i.e., the unit-length strings
$$B_\Omega = \{0, 1, 2, \dots, |V(G)|\}, \tag{6}$$
containing $|V(G)|$ symbols of the vertices $V(G)$ and a special symbol that we here call "0" (it is denoted "#" in [1]). Hence, $b = |B_\Omega| = |V(G)| + 1$. Then, a set of $3|V(G)|$ vertex strings
$$C_{k1} = [0 v_k], \quad C_{k2} = [v_k 0], \quad C_{k3} = [0 v_k 0], \quad v_k \in V(G) \tag{7}$$
is assembled (Eq. (18)). Subsequently, a set of $|E(G)|$ edge strings
$$C_j = [0 v_{s_j} 0 v_{t_j} 0], \quad e_j = (v_{s_j}, v_{t_j}) \in E(G) \tag{8}$$
is assembled (Eq. (19)). The last step of the construction of $C_x$ is a sequence of $2|V(G)|$ strings
$$\begin{aligned} S_0 &:= 0, \quad S_k := S_{k-1} \circ [0 v_k], \quad v_k \in V(G), \\ S_{|V(G)|+1} &:= S_{|V(G)|} \circ [v_1 0], \quad S_{|V(G)|+k} := S_{|V(G)|+k-1} \circ [v_k 0], \quad v_k \in V(G), \end{aligned} \tag{9a}$$
and $|E(G)|$ strings
$$S_{2|V(G)|+1} := S_{2|V(G)|} \circ [0 v_{s_1} 0 v_{t_1} 0], \quad S_{2|V(G)|+j} := S_{2|V(G)|+j-1} \circ [0 v_{s_j} 0 v_{t_j} 0], \quad e_j = (v_{s_j}, v_{t_j}) \in E(G), \tag{9b}$$
defined in [1] by Eqs. (20)-(25), where the target string $C_x := S_{2|V(G)|+|E(G)|}$ is defined as the last string of this sequence and
$$|V(\Omega)| = (|V(G)| + 1) + 3|V(G)| + |E(G)| + 2|V(G)| + |E(G)| = 6|V(G)| + 2|E(G)| + 1. \tag{10}$$
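The construction (6)-(10) can be traced with a short Python sketch. The function name `build_target` and the encoding of vertices as the single digits 1 to 9 (which limits the sketch to at most nine vertices) are assumptions made here for illustration only.

```python
def build_target(num_vertices, edges):
    """Return (C_x, |V(Omega)|) following (6)-(10) for vertices 1..num_vertices."""
    if not 1 <= num_vertices <= 9:
        raise ValueError("this sketch uses one digit per vertex (1..9)")
    vertices = [str(v) for v in range(1, num_vertices + 1)]

    basis = {"0", *vertices}                                    # (6)
    vertex_strings = set()
    for v in vertices:                                          # (7)
        vertex_strings |= {"0" + v, v + "0", "0" + v + "0"}
    edge_strings = [f"0{s}0{t}0" for s, t in edges]             # (8)

    sequence = []
    current = "0"                                               # S_0
    for v in vertices:                                          # (9a), first row
        current += "0" + v
        sequence.append(current)
    for v in vertices:                                          # (9a), second row
        current += v + "0"
        sequence.append(current)
    for edge_string in edge_strings:                            # (9b)
        current += edge_string
        sequence.append(current)

    space = basis | vertex_strings | set(edge_strings) | set(sequence)
    return current, len(space)                                  # C_x and (10)
```

For a graph with three vertices and the edges (1, 2) and (2, 3), `build_target(3, [(1, 2), (2, 3)])` returns a target string of length 23 and 23 vertices of $\Omega$, in agreement with $6|V(G)| + 2|E(G)| + 1$.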
Finally, it is claimed ([1], Section 4.2.3) that $(\Omega', C_x, n_{vcp}(C_x))$ is a certificate for the Assembly Index Problem if the set (5) is a vertex cover of $G$ of size $k$, i.e., if a certificate for the Vertex Cover Problem is given, wherein $n_{vcp}(C_x)$ is the assembly index of the string $C_x$ and
$$\begin{aligned} n_{vcp}(C_x) &= 4|V(G)| + 2|E(G)| + k, \\ |V(\Omega')| &= n_{vcp}(C_x) + |B_\Omega| = 5|V(G)| + 2|E(G)| + k + 1, \end{aligned} \tag{11}$$
which depends on $k$ and is minimal if $k = \tau$.
By construction, the basic symbols (6), the edge strings (8), and the sequence strings (9a) and (9b) contained in $\Omega$ must also be contained in $\Omega'$ (the certificate). However, the vertex strings (7) of the form $[0 v_k 0]$ are the exception, as each of the edge strings (8) can be assembled from the strings (7) in one of two mutually exclusive steps (cf. [1], Eqs. (53), (54))
$$[0 v_{s_j}] \circ [0 v_{t_j} 0] = [0 v_{s_j} 0 v_{t_j} 0] \quad \text{or} \quad [0 v_{s_j} 0] \circ [v_{t_j} 0] = [0 v_{s_j} 0 v_{t_j} 0], \tag{12}$$
leaving some of the strings $[0 v_{s_j} 0]$ or $[0 v_{t_j} 0]$ redundant. This can be seen by comparing the cardinalities of the spaces $\Omega$ (10) and $\Omega'$ (11), which, as expected, leads to $k < |V(G)|$. There are $3|V(G)|$ strings (7) in $\Omega$ and only $2|V(G)| + k$ of them in $\Omega'$.
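The comparison of cardinalities can be written out explicitly. Subtracting the count of relation (11) from that of relation (10) gives the number of redundant triplets, which is positive exactly when $k < |V(G)|$:

```latex
\begin{align*}
|V(\Omega)| - |V(\Omega')|
  &= \bigl(6|V(G)| + 2|E(G)| + 1\bigr) - \bigl(5|V(G)| + 2|E(G)| + k + 1\bigr) \\
  &= |V(G)| - k \\
  &= 3|V(G)| - \bigl(2|V(G)| + k\bigr) .
\end{align*}
```

For a graph with three vertices and a vertex cover of size $k = 1$, this difference equals 2, i.e., two triplets of the form $[0 v_k 0]$ are absent from the certificate.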
By construction (9a), (9b), the target string has the form
$$C_x = C_{vs} \circ C_{es} = [001020\ldots(|V(G)|-1)\,0\,|V(G)|\,1020\ldots(|V(G)|-1)\,0\,|V(G)|\,0] \circ C_{es}, \tag{13}$$
where $C_{vs}$ is the vertex-specific part of $C_x$, which depends solely on $|V(G)|$ and whose explicit form is given by the formula (9a), and $C_{es}$ is the edge-specific part of $C_x$, generated by the formula (9b), which depends on $|E(G)|$, on the edge vertex assignments, and on the order in which the edges of the graph $G$ are labeled. However, $|C_{es}| = 5|E(G)|$, as the length of each edge string (8) is five and there are $|E(G)|$ such strings in $C_x$ and $\Omega$. Therefore, the length of the target string is
$$N_x = |C_{vs}| + |C_{es}| = (4|V(G)| + 1) + 5|E(G)|. \tag{14}$$
Furthermore, by construction $C_{vs}$ contains two copies of the string $C_{lng} := [1020\ldots(|V(G)|-1)\,0\,|V(G)|]$ of length $N_{lng} = 2|V(G)| - 1$, which has the assembly index $a_{lng} = 2|V(G)| - 2$, as it does not contain any repetitions of substrings. We can take advantage of the fact that $m$ copies of an $n$-plet $C_n$ contained in a string decrease the assembly index of this string by at least $m(n-1) - a_{C_n}$ [3], where $a_{C_n}$ is the assembly index of this $n$-plet, to estimate the upper bound on the assembly index of $C_x$ reduced by the presence of these two copies of $C_{lng}$. Furthermore, excluding the degenerate cases of empty and disjoint graphs $G$, we can infer further information about $C_x$. Since any vertex $v_k \in V(G)$ is a part of some edge $e_j \in E(G)$, $C_x$ contains at least two repetitions of the doublets $[c_l 0]$ (or $[0 c_l]$), with $l = 1, 2, \dots, |V(G)| - 1$, as the string $C_{lng}$ also contains $|V(G)| - 1$ such doublets, and each repetition decreases the assembly index by one. Hence, the upper bound must be further decreased by $|V(G)| - 1$. Finally, each string $C_x$ contains $|E(G)| + 1$ repetitions of the doublet $[00]$ and hence the upper bound must be further decreased by $|E(G)|$. Therefore, the initial upper bound on the assembly index, which amounts to $N_x - 1$ [5] if $N_x \le b^2 + b + 1$ [3], decreases to
$$n_{dec}(C_x) := (N_x - 1) - \bigl[2(2|V(G)| - 2) - (2|V(G)| - 2)\bigr] - (|V(G)| - 1) - |E(G)| = |V(G)| + 4|E(G)| + 3, \tag{15}$$
which, in contrast to $n_{vcp}(C_x)$ (11), is independent of $k$.
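The simplification leading to the right-hand side of (15) can be followed term by term. Substituting the target length (14) and applying the three reductions in the order in which they are introduced above gives:

```latex
\begin{align*}
N_x - 1 &= 4|V(G)| + 5|E(G)| , \\
2\,(N_{lng} - 1) - a_{lng} &= 2\,(2|V(G)| - 2) - (2|V(G)| - 2) = 2|V(G)| - 2 , \\
n_{dec}(C_x) &= 4|V(G)| + 5|E(G)| - (2|V(G)| - 2) - (|V(G)| - 1) - |E(G)| \\
             &= |V(G)| + 4|E(G)| + 3 .
\end{align*}
```

For $|V(G)| = 3$ and $|E(G)| = 2$ this yields $3 + 8 + 3 = 14$ steps.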
We have examined a few simple graphs, shown in Figure 2, obtaining the results listed in Table 1.
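The closed-form quantities (11), (14), and (15) can be recomputed for the examined graphs with a few lines of Python. The function names are hypothetical; the graph parameters $|V(G)|$, $|E(G)|$, and $\tau$ are those listed in Table 1, and the assembly indices $a(C_x)$ themselves are not computed here.

```python
# (graph, |V(G)|, |E(G)|, tau) as listed in Table 1
GRAPHS = [
    ("one edge", 2, 1, 1),
    ("two edges", 3, 2, 1),
    ("three edges", 4, 3, 1),
    ("square", 4, 4, 2),
    ("EM rocket", 6, 7, 3),
    ("K5", 5, 10, 4),
]


def target_length(v, e):
    """Length N_x of the target string, relation (14)."""
    return (4 * v + 1) + 5 * e


def n_vcp(v, e, k):
    """Number of steps of the vertex cover pathway, relation (11)."""
    return 4 * v + 2 * e + k


def n_dec(v, e):
    """k-independent estimate of the number of steps, relation (15)."""
    return v + 4 * e + 3


def table_rows():
    """Return (graph, N_x, n_dec, n_vcp) for every graph, taking k = tau."""
    return [
        (name, target_length(v, e), n_dec(v, e), n_vcp(v, e, tau))
        for name, v, e, tau in GRAPHS
    ]
```

Calling `table_rows()` reproduces the $N_x$, $n_{dec}(C_x)$, and $n_{vcp}(C_x)$ columns of Table 1; for $K_5$ the estimate (15) gives 48, which exceeds $n_{vcp}(C_x) = 44$.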
As an example, consider the simple graph $G = (V, E)$ shown in Figure 2(b), which has two edges joined at one vertex. Hence, its vertex cover number is $\tau = 1$. In this case, by (10), $|V(\Omega)| = 23$, and the target string generated by the sequences (9a) and (9b) has the form
$$C_x = [00102031020300102002030]. \tag{16}$$
As the vertex cover of the graph $G$ is the vertex 2, the subspace $\Omega'$ (the certificate) is devoid of the triplets $[010]$ and $[030]$, since the edges $(1, 2)$ and $(2, 3)$ share the vertex 2 and the edge strings (8) can be assembled as $[01|020]$ and $[020|30]$. Therefore, the number of steps on the assembly pathway of $C_x$ defined by $\Omega'$, given by the relation (11), amounts to $n_{vcp}(C_x) = |\Omega| - |B_\Omega| - 2 = 17$, as shown in Figure 3(a), which also illustrates the assembly depth [6] of this string, $d(C_x) = 9$. The pathway comprises 7 steps (1-7) for the vertex strings (7), 2 steps (8, 9) for the edge strings (8), 6 steps (10-15) for the sequence strings (9a), and 2 steps (16, 17) for the sequence strings (9b). This number of steps corresponds to the vertex cover number $\tau = 1$, provided that the string (16) is assembled using the set of allowed assembly operations defined by Eqs. (38)-(45) of [1].
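The step count of the certificate pathway can be reproduced by generating the pathway from a graph and a vertex cover and replaying it step by step. This is a sketch under stated assumptions: the function names, the single-digit vertex encoding, and the particular order of the steps are chosen here for illustration and need not match the ordering used in Figure 3(a).

```python
def certificate_pathway(num_vertices, edges, cover):
    """Return (steps, target) of the vertex cover pathway as (left, right) pairs."""
    if not 1 <= num_vertices <= 9:
        raise ValueError("this sketch uses one digit per vertex (1..9)")
    vertices = [str(v) for v in range(1, num_vertices + 1)]
    edges = [(str(s), str(t)) for s, t in edges]
    cover = {str(v) for v in cover}
    if any(s not in cover and t not in cover for s, t in edges):
        raise ValueError("the given set is not a vertex cover")

    steps = []
    for v in vertices:                      # doublets [0v] and [v0] of (7)
        steps += [("0", v), (v, "0")]
    for v in vertices:                      # triplets [0v0] of (7), cover only
        if v in cover:
            steps.append(("0" + v, "0"))
    for s, t in edges:                      # edge strings (8), one option of (12)
        if t in cover:
            steps.append(("0" + s, "0" + t + "0"))
        else:
            steps.append(("0" + s + "0", t + "0"))
    current = "0"                           # S_0
    for part in ["0" + v for v in vertices] + [v + "0" for v in vertices]:
        steps.append((current, part))       # sequence strings (9a)
        current += part
    for s, t in edges:                      # sequence strings (9b)
        part = f"0{s}0{t}0"
        steps.append((current, part))
        current += part
    return steps, current


def replay(basis, steps):
    """Replay a pathway, checking pool membership and vertex uniqueness."""
    pool = set(basis)
    last = None
    for left, right in steps:
        if left not in pool or right not in pool:
            raise ValueError(f"{left!r} or {right!r} is not in the assembly pool")
        last = left + right
        if last in pool:
            raise ValueError(f"{last!r} would be assembled twice")
        pool.add(last)
    return last
```

For the two-edge graph with the cover `{2}`, `certificate_pathway(3, [(1, 2), (2, 3)], {2})` produces 17 steps, and `replay("0123", steps)` returns the target string (16).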
However, imposing such a set of allowed assembly steps deviates from the principles of assembly theory, which assume the possibility of assembling any object from any two objects in the assembly pool. Even if we assume that only some steps are allowed and some are not, owing to peculiarities of the assembled data structures, this is certainly not the case for strings, which are considered in [1] in the proof of Lemma 3. All strings are possible and mathematically well defined [7]. What could be the reason for allowing the assembly of a string $[ABC]$ and disallowing the assembly of a string $[BCA]$ from a set of basic symbols $\{A, B, C\}$? The evolution of information became possible as soon as a first bit, not a first particle or object, became accessible [2,3,8,9].
Therefore, once any two strings in the assembly pool may be joined, the assembly index of the string (16) is $a(C_x) = 13$. One of the shortest pathways of the string (16) is shown in Figure 3(c), with $d(C_x) = 10$. The quadruplet $[1020]$, present in two independent copies, is assembled in step 5, and the 5-plet $C_{lng} = [10203]$, present in two copies, is assembled in step 6. Furthermore, $C_x$ contains two independent copies of $[00]$, $[10]$, and $[20]$. A slightly longer pathway, with the number of steps given by (15), is shown in Figure 3(b).
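That 13 steps suffice for the string (16) can be checked with an explicit pathway. The pathway below was constructed for this illustration and is not necessarily the one drawn in Figure 3(c); the function names are hypothetical. Replaying it only confirms the upper bound $a(C_x) \le 13$ and does not by itself prove that no shorter pathway exists.

```python
TARGET = "00102031020300102002030"   # the string (16)


def thirteen_step_pathway():
    """Build one 13-step pathway of TARGET as a list of (left, right) pairs."""
    steps = []

    def join(left, right):
        steps.append((left, right))
        return left + right

    d00 = join("0", "0")          # [00], reused twice
    d10 = join("1", "0")          # [10], reused inside [1020]
    d20 = join("2", "0")          # [20], reused twice
    quad = join(d10, d20)         # [1020], reused twice
    c_lng = join(quad, "3")       # [10203], reused twice
    s = join(d00, c_lng)          # [0010203]
    s = join(s, c_lng)            # [001020310203]
    tail = join(d00, quad)        # [001020]
    s = join(s, tail)
    s = join(s, "0")
    s = join(s, d20)
    d30 = join("3", "0")          # [30]
    s = join(s, d30)
    return steps, s


def replay(basis, steps):
    """Replay a pathway, checking pool membership and vertex uniqueness."""
    pool = set(basis)
    last = None
    for left, right in steps:
        if left not in pool or right not in pool:
            raise ValueError(f"{left!r} or {right!r} is not in the assembly pool")
        last = left + right
        if last in pool:
            raise ValueError(f"{last!r} would be assembled twice")
        pool.add(last)
    return last
```

Here `replay("0123", thirteen_step_pathway()[0])` returns `TARGET` after exactly 13 assembly steps, four fewer than the 17 steps of the vertex cover pathway.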

4. Conclusions

The study [1] shows that the NP-hard Vertex Cover Problem, that is, the problem of determining whether a graph has a set of $k$ vertices that contains at least one vertex of each edge of this graph, can be reformulated as the Assembly Steps Problem, that is, the problem of determining whether a given string can be assembled in a given number of steps according to the principles of assembly theory, and hence the latter problem is also NP-hard. Furthermore, the study [1] shows that a proposed solution (certificate) to the Assembly Steps Problem can be checked for correctness in polynomial time, so the Assembly Steps Problem is NP-complete.
However, based on a few simple graphs, we have shown here that the construction of an assembly space from a graph (a certificate), proposed to establish the correspondence between the Minimum Vertex Cover Problem and the Assembly Steps Problem, does not reflect the shortest pathway leading to the target string and hence does not correspond to the Assembly Index Problem.
In this study, we have also answered in the affirmative the question posed in [1]: using the alternative definition of the assembly space of strings (Definition 1) together with the procedure of [1] for constructing an instance of the Assembly Steps Problem that corresponds to an instance of the Vertex Cover Problem shows that the noncommutative concatenation version of the Assembly Steps Problem is also NP-complete.
Unfortunately, we still do not know whether the Assembly Index Problem is NP-complete.

Funding

This research received no external funding.

Acknowledgments

The author thanks Mariola Bala, Wawrzyniec Bieniawski, Piotr Masierak for motivation, critical discussions and numerous clarity, reasoning, and grammar corrections, his wife, Magdalena Bartocha, for her everlasting support, and his partner and friend, Renata Sobajda, for her prayers.

References

  1. Kempes, C.P.; Lachmann, M.; Iannaccone, A.; Fricke, G.M.; Chowdhury, M.R.; Walker, S.I.; Cronin, L. Assembly Theory and its Relationship with Computational Complexity. arXiv 2024, arXiv:2406.12176.
  2. Łukaszyk, S.; Bieniawski, W. Assembly Theory of Binary Messages. Mathematics 2024, 12, 1600.
  3. Bieniawski, W.; Masierak, P.; Tomski, A.; Łukaszyk, S. On the Certain Salient Regularities of Strings of Assembly Theory. 2024.
  4. Marshall, S.M.; Moore, D.G.; Murray, A.R.G.; Walker, S.I.; Cronin, L. Formalising the Pathways to Life Using Assembly Spaces. Entropy 2022, 24, 884.
  5. Marshall, S.M.; Murray, A.R.G.; Cronin, L. A probabilistic framework for identifying biosignatures using Pathway Complexity. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2017, 375, 20160342.
  6. Pagel, S.; Sharma, A.; Cronin, L. Mapping Evolution of Molecules Across Biochemistry with Assembly Theory. arXiv 2024, arXiv:2409.05993.
  7. Sharma, A.; Czégel, D.; Lachmann, M.; Kempes, C.P.; Walker, S.I.; Cronin, L. Assembly theory explains and quantifies selection and evolution. Nature 2023, 622, 321–328.
  8. Łukaszyk, S. Black Hole Horizons as Patternless Binary Messages and Markers of Dimensionality. In Future Relativity, Gravitation, Cosmology; Chapter 15; Nova Science Publishers, 2023; pp. 317–374.
  9. Łukaszyk, S. Life as the Explanation of the Measurement Problem. Journal of Physics: Conference Series 2024, 2701, 012124.
1
Here, all references to [1] relate to the Supplementary Material of [1].
Figure 1. The evolution of assembly pathways leading to the string $[0101]$ to make them shorter and minimize the assembly depth, also illustrating the definition of the edge labeling map $\phi$ for $C_x \ne C_y$: $\phi(e) = C_y \iff C_z = C_x \circ C_y$ (a), $\phi(e') = C_y \iff C_z = C_y \circ C_x$ ("blue arrow goes first").
Figure 2. Simple graphs we examined: one edge (a), two edges (b), three edges (c), square (d), "EM rocket" (e), a complete graph $K_5$ (f). Red circles indicate a minimum vertex cover.
Figure 3. Three assembly spaces of the same string $C_x = [00102031020300102002030]$: the pathway that produces a vertex cover certificate ($n_{vcp}(C_x) = 17$ steps) (a), the pathway that takes into account the general distribution of substrings in all strings $C_x$ ($n_{dec}(C_x) = 14$ steps) (b), and a shortest, assembly index pathway ($a(C_x) = 13$ steps) (c).
Table 1. Assembly indices $a(C_x)$ of the target strings $C_x$ constructed in [1] for the Vertex Cover Problem graphs $G$; minimum and maximum assembly indices $a_{min}(N_x)$, $a_{max}(N_x, b)$ for $N_x$; and numbers of steps leading to $C_x$ used in [1] ($n_{vcp}(C_x)$) and derived here ($n_{dec}(C_x)$), for the examined graphs.

| Graph | $\lvert V(G)\rvert$ | $\lvert E(G)\rvert$ | $\tau$ | $N_x$ | $a_{min}(N_x)$ | $a(C_x)$ | $n_{dec}(C_x)$ | $n_{vcp}(C_x)$ | $a_{max}(N_x, b)$ |
|---|---|---|---|---|---|---|---|---|---|
| one edge | 2 | 1 | 1 | 14 | 5 | 8 | 9 | 11 | 12 |
| two edges | 3 | 2 | 1 | 23 | 7 | 13 | 14 | 17 | 21 |
| three edges | 4 | 3 | 1 | 32 | 5 | 18 | 19 | 23 | 30 |
| square | 4 | 4 | 2 | 37 | 7 | 20 | 23 | 26 | 33 |
| "EM rocket" | 6 | 7 | 3 | 60 | 7 | 32 | 37 | 41 | 58 |
| $K_5$ | 5 | 10 | 4 | 71 | 9 | 40 | 48 | 44 | < 56 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.