Preprint
Short Note

This version is not peer-reviewed.

On the "Assembly Theory and its Relationship with Computational Complexity"

A peer-reviewed article of this preprint also exists.

Submitted: 17 December 2024

Posted: 18 December 2024

Abstract
The study provides an alternative definition of the assembly space, as an acyclic, 2-in-regular digraph with vertices associated with strings, wherein some vertices (initial assembly pool) are inaccessible, and with the edge labeling map preserving the commutativity of the assembly step in terms of the labels of assembled vertices, but defining the order of concatenation of strings associated with those vertices in a string associated with the vertex being the endpoint of both incoming edges starting at these vertices. We also comment on certain results of [10.48550/ARXIV.2406.12176], showing that it is the Assembly Steps Problem, not the Assembly Index Problem, that has been proved in the referenced study to be NP-complete.

1. Introduction

A recent study [1] shows that assembly theory offers a distinct approach and answers different questions than computational complexity theory, with its focus on minimum descriptions via compressibility. It also discusses fundamental differences in the ontological basis of assembly theory and the assembly index as a physical observable, which distinguish it from theoretical approaches to formalizing life that are unmoored from measurement. Furthermore, the study [1] claims to contain the proof of the conjecture posed in [2] that the Assembly Index Problem is NP-Complete. Specifically, it argues that any instance of the Vertex Cover Problem, which is known to be NP-Hard, can be reformulated as an instance of the Assembly Index Problem. Finally, it leaves open the question of whether the non-commutative concatenation version of string assembly is NP-Complete; its authors see no way to reconcile the commutativity of the assembly step with the non-commutativity of string concatenation (cf. [1], Supplementary Material, footnote 101.).
Here, we show that such a reconciliation is, in fact, possible by providing an alternative definition of the assembly space. We also show that it is the Assembly Steps Problem, not the Assembly Index Problem, which has been shown in [1] to be NP-complete. Hence, finding if a string can be assembled along a given path in a given number of steps is NP-complete. But we do not know if finding the minimal difference between the cardinalities of the final and initial assembly pools that lead to this string, that is, the assembly index of this string [3], is NP-complete.

2. An Alternative Definition of an Assembly Space

An assembly space $\Omega = (\Gamma, \phi)$ is defined in [1] (cf. Definition 8) as an acyclic directed graph $\Gamma = (V, E)$, where $V(\Omega)$ is the set of vertices and $E(\Omega)$ is the set of edges, together with an edge labeling map $\phi(e): e \in E(\Omega) \to v \in V(\Omega)$. $\Gamma$ contains a finite and non-empty set of vertices $\mathrm{Src}(\Gamma) = B_\Omega$ that form the basis of $\Omega$, each reachable only from itself. All remaining vertices of $\Gamma$ are reachable from a vertex in $B_\Omega$. The edge labeling map $\phi$ cleverly defines the assembly step. Namely
$$e = (x, z) \in E(\Omega) \,\wedge\, \phi(e) = y \in V(\Omega) \implies \exists\, e' = (y, z) \in E(\Omega) : \phi(e') = x, \tag{1}$$
which allows one to write $e = (x_y, z)$, $e' = (y_x, z)$, and
$$e = (x_y, z) \in E(\Omega) \iff e' = (y_x, z) \in E(\Omega). \tag{2}$$
However, the commutativity of the relation (2) cannot imply the commutativity of the string concatenation. If all assembly objects in $V(\Omega)$ are strings, $(x_y, xy) \in E(\Omega)$ does not imply $(y_x, yx) \in E(\Omega)$, as might be expected for string concatenation, but it actually implies also $(y_x, xy) \in E(\Omega)$. Therefore, a string assembly space is endowed in [1] with the additional property (cf. [1], Definition 18 and Eq. (10))
$$(x_y, z) \in E(\Omega) \implies z = [xy] \,\vee\, z = [yx], \tag{3}$$
and likewise for the edge $(y_x, z) \in E(\Omega)$.
However, this conclusion and, thus, the inevitability of the property (3) it entails make the string associated with the terminating vertex $z$ unresolvable if $x \neq y$. Here, we provide an alternative definition of the assembly space, noting that the vertices $x$, $y$, or $z$ of the assembly space are not the same as the strings $C_x$, $C_y$, or $C_z$ they are associated with.
Definition 1 (Assembly Space). An assembly space $\Omega = (\Gamma, \phi)$ is an acyclic, 2-in-regular digraph $\Gamma = (V, E)$, where $V$ is a set of vertices $v_k$ associated with strings $C_k$, with $b$ inaccessible vertices associated with $b \in \mathbb{N}$ different basic symbols $c$ of unit length, and the edge labeling map defined as
$$e = (x, z) \in E(\Gamma) \,\wedge\, \phi(e) = \begin{cases} y \implies \exists!\, e' = (y, z) \in E(\Gamma) : \phi(e') = -x \,\wedge\, C_z = C_x C_y = \mathrm{strcat}(C_x, C_y), \\ -y \implies \exists!\, e' = (y, z) \in E(\Gamma) : \phi(e') = x \,\wedge\, C_z = C_y C_x = \mathrm{strcat}(C_y, C_x), \end{cases} \tag{4}$$
which defines the concatenation order of the strings $C_x$, $C_y$ associated respectively with the vertices $x$, $y$ in the string $C_z$ associated with the vertex $z$ being the endpoint of both edges $e$ and $e'$.
The set of $b$ inaccessible vertices forms the initial assembly pool $P_0(b) := \{0, 1, \dots, b-1\}$ [3] and is equivalent to the basis $B_\Omega$ (cf. [1], Eq. (7)). We note that the definition of the edge labeling map (4) is possible only if $b > 1$, i.e., the initial assembly pool contains more than one distinct basic symbol, as in that case we can say that a given symbol is the 1st one in the initial assembly pool, another is the 2nd one, and so on; we can sort them. For a single symbol, the notion of a concatenation direction is pointless. The relation (4) preserves the commutativity of the assembly step in terms of the vertex labels, that is, $(x_y, z) \in E(\Omega) \iff (y_x, z) \in E(\Omega)$, but defines the order of concatenation of the strings associated with those labels, since for different strings $C_x \neq C_y$ in general $C_x C_y \neq C_y C_x$. This is shown in Figure 1.
In an acyclic, 2-in-regular digraph, each accessible vertex has exactly two incoming edges corresponding to the assembly step. This definition extends the previous definition of the assembly space as a quiver [4]; the notion of self-reachability of vertices of an assembly space is misleading as it suggests some self-assembly. Hence, basis objects are inaccessible, not self-reachable (cf. [1], Definition 7).
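As a rough illustration of Definition 1, the following Python sketch stores an assembly space as a map from each assembled vertex to its two parent vertices, kept in concatenation order. The class name `AssemblySpace`, its method names, and the choice to encode the concatenation order as the position in a `(left, right)` pair are assumptions made for this example only; they are not taken from [1] or [3].

```python
from dataclasses import dataclass, field


@dataclass
class AssemblySpace:
    """Hypothetical container for a string assembly space (illustration only)."""
    basis: dict                                   # inaccessible vertex -> unit-length symbol
    parents: dict = field(default_factory=dict)   # assembled vertex -> (left vertex, right vertex)

    def add_step(self, z, left, right):
        """Add vertex z with exactly two incoming edges: (left, z) and (right, z)."""
        known = set(self.basis) | set(self.parents)
        if z in known:
            raise ValueError(f"vertex {z!r} already exists")
        if left not in known or right not in known:
            raise ValueError("both parents must already be in the space")
        # Parents always exist before z is added, so the digraph stays acyclic.
        self.parents[z] = (left, right)

    def string(self, v):
        """Return the string C_v associated with vertex v."""
        if v in self.basis:
            return self.basis[v]
        left, right = self.parents[v]
        return self.string(left) + self.string(right)

    def in_degree(self, v):
        """0 for inaccessible (basis) vertices, 2 for every assembled vertex."""
        return 0 if v in self.basis else 2


# The same two parents give two different vertices, one per concatenation order.
space = AssemblySpace(basis={"x": "0", "y": "1"})
space.add_step("z1", "x", "y")   # C_z1 = C_x C_y
space.add_step("z2", "y", "x")   # C_z2 = C_y C_x
print(space.string("z1"), space.string("z2"))
```

In this sketch both `z1` and `z2` have the incoming edges from `x` and `y`, so the assembly step is symmetric in the vertices, while the stored order fixes which string comes first.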

3. The Assembly Steps Problem is NP-Complete

In [1] (cf. Section 4.2), the following procedure is offered to show that any instance $(G, k)$ of the Vertex Cover Problem, which is known to be NP-Hard, can be reformulated in polynomial time as an instance of the Assembly Index Problem. Here, $G = (V, E)$ is a graph, $V(G)$ is the set of vertices, $E(G)$ is the set of edges, and $k$ is the cardinality of a set of vertices that includes at least one vertex of every edge of $G$. For a given instance $(G, k)$ of the Vertex Cover Problem, where $\tau \le k \le |V(G)|$ and $\tau$ is the vertex cover number (the size of a minimum vertex cover), an instance $(\Omega, C_x, a^{(N,b)}(C_x))$ of the Assembly Index Problem is constructed, where $\Omega = (\Gamma, \phi)$ is a constructed assembly space, and $C_x$ is the target string for which the assembly index $a^{(N,b)}(C_x)$ is to be determined. It is then claimed that a certificate for the Vertex Cover Problem
$$C = \{ v_l \in V(G) \mid C_l = [0 v_l 0] \in V(\Omega') \} \tag{5}$$
containing a subset $\{v_l\}$ of vertices of $G$ that includes at least one vertex of every edge of $G$ can be used to produce a certificate $(\Omega', C_x, a^{(N,b)}(C_x))$ for the Assembly Index Problem and vice versa, where $\Omega' \subset \Omega$ is a rooted subspace (cf. [1] Definition 15) of the assembly space $\Omega$ containing only a proper subset $\{C_l = [0 v_l 0]\}$ of the strings of the form $C_k = [0 v_k 0]$, $v_k \in V(G)$. Hence, such an instance of the Assembly Index Problem would be logically equivalent to an instance of the Vertex Cover Problem from which it was constructed.
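For readers less familiar with the Vertex Cover Problem, the sketch below shows what checking a certificate and finding the vertex cover number $\tau$ amount to. The function names are hypothetical, and the exhaustive search is only a convenient way to obtain $\tau$ for very small graphs; it is not the construction of [1].

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover` (certificate check)."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def vertex_cover_number(vertices, edges):
    """Size of a minimum vertex cover, by exhaustive search (exponential time)."""
    vertices = list(vertices)
    for k in range(len(vertices) + 1):
        # The full vertex set always covers, so the loop returns at the latest for k = |V|.
        if any(is_vertex_cover(edges, c) for c in combinations(vertices, k)):
            return k


# Two edges sharing vertex 2: the single vertex 2 is a minimum vertex cover.
path_edges = [(1, 2), (2, 3)]
print(is_vertex_cover(path_edges, {2}), vertex_cover_number([1, 2, 3], path_edges))
```

The certificate check runs in time linear in the number of edges, whereas the search for $\tau$ in this sketch enumerates subsets of vertices.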
The construction of $C_x$ (cf. [1], Section 4.2) begins with forming the basis of the assembly space $\Omega$ (cf. Eqs. (17), (50))
$$B_\Omega = \{0, 1, 2, \dots, |V(G)|\}, \tag{6}$$
containing $|V(G)|$ symbols of vertices $V(G)$, and a special symbol that here we call 0 (it is defined as # in [1]). Hence, $b = |B_\Omega| = |V(G)| + 1$. Then, a set of $3|V(G)|$ strings
$$C_{k_1} = [0 v_k], \quad C_{k_2} = [v_k 0], \quad C_{k_3} = [0 v_k 0], \quad v_k \in V(G) \tag{7}$$
is assembled (Eq. (18)). Subsequently, a set of $|E(G)|$ strings
$$C_j = [0 v_{s_j} 0 v_{t_j} 0], \quad e_j = (v_{s_j}, v_{t_j}) \in E(G) \tag{8}$$
is assembled (Eq. (19)). The last step of the construction of $C_x$ is a sequence of $2|V(G)|$ strings
$$S_0 := 0, \quad S_k := S_{k-1}[0 v_k], \ v_k \in V(G), \quad S_{|V(G)|+1} := S_{|V(G)|}[v_1 0], \quad S_{|V(G)|+k} := S_{|V(G)|+k-1}[v_k 0], \ v_k \in V(G), \tag{9a}$$
and $|E(G)|$ strings
$$S_{2|V(G)|+1} := S_{2|V(G)|}[0 v_{s_1} 0 v_{t_1} 0], \quad S_{2|V(G)|+j} := S_{2|V(G)|+j-1}[0 v_{s_j} 0 v_{t_j} 0], \quad e_j = (v_{s_j}, v_{t_j}) \in E(G), \tag{9b}$$
defined in [1] by Eqs. (20)-(25), where the target string $C_x := S_{2|V(G)|+|E(G)|}$ is defined as the last string of this sequence and
$$|V(\Omega)| = |V(G)| + 1 + 3|V(G)| + |E(G)| + 2|V(G)| + |E(G)| = 6|V(G)| + 2|E(G)| + 1. \tag{10}$$
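The sequence construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the vertices of $G$ are labeled $1, \dots, |V(G)|$, each new string is read as the previous one with a block appended, and the function names are hypothetical. Symbols are kept as tuples of integers so that graphs with more than nine vertices remain unambiguous.

```python
def build_target_string(n_vertices, edges):
    """Return (C_x, [S_0, S_1, ...]) following the sequences (9a) and (9b)."""
    vertices = range(1, n_vertices + 1)
    s = (0,)                          # S_0: the special symbol 0
    sequence = [s]
    for v in vertices:                # append [0 v_k]
        s = s + (0, v)
        sequence.append(s)
    for v in vertices:                # append [v_k 0]
        s = s + (v, 0)
        sequence.append(s)
    for vs, vt in edges:              # append the edge string [0 v_s 0 v_t 0]
        s = s + (0, vs, 0, vt, 0)
        sequence.append(s)
    return s, sequence


def space_size(n_vertices, n_edges):
    """|V(Omega)| as counted in (10)."""
    return 6 * n_vertices + 2 * n_edges + 1


def as_text(symbols):
    """Join symbols into a digit string (readable only for single-digit labels)."""
    return "".join(str(c) for c in symbols)


c_x, seq = build_target_string(3, [(1, 2), (2, 3)])
print(as_text(c_x), len(c_x), space_size(3, 2))
```

For the graph with two edges sharing vertex 2, this reproduces the 23-symbol target string discussed later in the text and the value $|V(\Omega)| = 23$.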
Finally ([1], Section 4.2.3), it is claimed that $(\Omega', C_x, n_{\mathrm{vcp}}(C_x))$ is a certificate for the Assembly Index Problem if the set (5) is a vertex cover of $G$ with size $k$, i.e., a certificate for the Vertex Cover Problem is given, wherein $n_{\mathrm{vcp}}(C_x)$ is the assembly index of string $C_x$ and
$$n_{\mathrm{vcp}}(C_x) = 4|V(G)| + 2|E(G)| + k \implies |V(\Omega')| = n_{\mathrm{vcp}}(C_x) + |B_\Omega| = 5|V(G)| + 2|E(G)| + k + 1, \tag{11}$$
which depends on k and is minimal if k = τ .
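As a quick check of the counts (10) and (11), consider the smallest graph of Table 1, a single edge, with $|V(G)| = 2$, $|E(G)| = 1$, and $k = \tau = 1$. The worked arithmetic below is our own illustration and uses only the formulas stated above.

```latex
\begin{align*}
|V(\Omega)| &= 6 \cdot 2 + 2 \cdot 1 + 1 = 15, \\
n_{\mathrm{vcp}}(C_x) &= 4 \cdot 2 + 2 \cdot 1 + 1 = 11, \\
|V(\Omega')| &= n_{\mathrm{vcp}}(C_x) + |B_\Omega| = 11 + 3 = 14, \\
|V(\Omega)| - |V(\Omega')| &= |V(G)| - k = 2 - 1 = 1 .
\end{align*}
```

The value $n_{\mathrm{vcp}}(C_x) = 11$ agrees with the corresponding entry of Table 1, and the last line shows that the two spaces differ by exactly the $|V(G)| - k$ vertex strings that are absent from the certificate.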
By construction, the basic symbols (6), the edge strings (8), and the sequence strings (9a) and (9b) contained in $\Omega$ must also be contained in $\Omega'$ (the certificate). However, the vertex strings (7) of the form $[0 v_k 0]$ are the exception, as each of the edge strings (8) can be assembled from strings (7) in one of the two mutually exclusive steps (cf. [1] Eqs. (53), (54))
$$[0 v_{s_j}][0 v_{t_j} 0] = [0 v_{s_j} 0 v_{t_j} 0] \quad \text{or} \quad [0 v_{s_j} 0][v_{t_j} 0] = [0 v_{s_j} 0 v_{t_j} 0], \tag{12}$$
leaving some of the strings $[0 v_{s_j} 0]$ or $[0 v_{t_j} 0]$ redundant. This can be seen by comparing the cardinalities of the spaces $\Omega$ (10) and $\Omega'$ (11), which, as expected, leads to $k < |V(G)|$.
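The choice between the two steps can be made mechanically once a vertex cover is known. The following sketch, with a hypothetical function name and single-digit vertex labels assumed for readability, picks for each edge the step whose triplet belongs to a cover vertex and collects the triplets that are actually needed.

```python
def edge_string_steps(edges, cover):
    """For each edge, choose one of the two steps in (12) using a vertex cover.

    Returns the chosen (left, right) pairs and the set of triplets [0 v 0] used.
    """
    cover = set(cover)
    steps, triplets = [], set()
    for vs, vt in edges:
        if vs in cover:
            left, right = f"0{vs}0", f"{vt}0"    # [0 v_s 0][v_t 0]
            triplets.add(left)
        elif vt in cover:
            left, right = f"0{vs}", f"0{vt}0"    # [0 v_s][0 v_t 0]
            triplets.add(right)
        else:
            raise ValueError(f"edge {(vs, vt)} is not covered")
        assert left + right == f"0{vs}0{vt}0"    # both choices give the same edge string
        steps.append((left, right))
    return steps, triplets


steps, triplets = edge_string_steps([(1, 2), (2, 3)], cover={2})
print(steps, triplets)
```

For the two-edge graph with the cover consisting of vertex 2, only the triplet built on vertex 2 is needed, so the triplets built on vertices 1 and 3 are redundant.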
By construction (9a), (9b), the target string has the form
$$C_x = C_{\mathrm{vs}} C_{\mathrm{es}} = [001020 \dots (|V(G)|-1)0|V(G)|1020 \dots (|V(G)|-1)0|V(G)|0]\, C_{\mathrm{es}}, \tag{13}$$
where $C_{\mathrm{vs}}$ is the vertex-specific part of $C_x$, depending solely on $|V(G)|$, with its explicit form given by the formula (9a), and $C_{\mathrm{es}}$ is the edge-specific part of $C_x$, generated by the formula (9b) and depending on $|E(G)|$, the edge vertex assignments, and the order of labeling of the edges of graph $G$. However, $|C_{\mathrm{es}}| = 5|E(G)|$, as the length of each edge string (8) is five and there are $|E(G)|$ such strings in $C_x$ and $\Omega$. Therefore, the length of the target string is
$$N_x = |C_{\mathrm{vs}}| + |C_{\mathrm{es}}| = (4|V(G)| + 1) + 5|E(G)|. \tag{14}$$
Furthermore, by construction $C_{\mathrm{vs}}$ contains two copies of the string $C_{\mathrm{lng}} := [1020 \dots (|V(G)|-1)0|V(G)|]$ of length $N_{\mathrm{lng}} = 2|V(G)| - 1$ having the assembly index equal to $a_{\mathrm{lng}} = 2|V(G)| - 2$, as it does not contain any repetitions of substrings. We can take advantage of the fact that each $m$ copies of an $n$-plet $C_n$ contained in a string decrease the assembly index of this string at least by $m(n-1) - a_{C_n}$ [3], where $a_{C_n}$ is the assembly index of this $n$-plet, to estimate the upper bound for the assembly index of $C_x$ reduced by the presence of these two copies of $C_{\mathrm{lng}}$. Furthermore, excluding the degenerate cases of empty and disjoint graphs $G$, we can infer more information about $C_x$. That is, since any vertex $v_k \in V(G)$ is a part of some edge $e_j \in E(G)$, $C_x$ contains at least two repetitions of doublets $[c_l 0]$ (or $[0 c_l]$), with $l = 1, 2, \dots, |V(G)| - 1$, as the string $C_{\mathrm{lng}}$ also contains $|V(G)| - 1$ such doublets, and each repetition decreases the assembly index by one. Hence, the upper bound must be further decreased by $|V(G)| - 1$. Finally, each string $C_x$ contains $|E(G)| + 1$ repetitions of a doublet $[00]$ and, hence, the upper bound must be further decreased by $|E(G)|$. Therefore, the initial upper bound on the assembly index that amounts to $N_x - 1$ [5] if $N_x \le b^2 + b + 1$ [3] decreases to
$$n_{\mathrm{dec}}(C_x) \le (N_x - 1) - \left[ 2(2|V(G)| - 2) - (2|V(G)| - 2) \right] - (|V(G)| - 1) - |E(G)| = |V(G)| + 4|E(G)| + 3, \tag{15}$$
which, in contrast to $n_{\mathrm{vcp}}(C_x)$ (11), is independent of $k$.
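The closed forms (11), (14), and (15) are easy to evaluate numerically. The sketch below, with hypothetical function names, writes the bound (15) term by term and evaluates all three quantities for the $(|V(G)|, |E(G)|, \tau)$ triples listed in Table 1.

```python
def target_length(nv, ne):
    """N_x from (14)."""
    return (4 * nv + 1) + 5 * ne


def n_vcp(nv, ne, k):
    """Number of steps of the certificate pathway from (11)."""
    return 4 * nv + 2 * ne + k


def n_dec(nv, ne):
    """Right-hand side of (15), written term by term."""
    nx = target_length(nv, ne)
    a_lng = 2 * nv - 2                       # assembly index of C_lng
    two_copies = 2 * (2 * nv - 2) - a_lng    # m(n-1) - a_{C_n} with m = 2, n = 2|V(G)| - 1
    return (nx - 1) - two_copies - (nv - 1) - ne


# (|V(G)|, |E(G)|, tau) for the graphs of Table 1
graphs = {
    "one edge": (2, 1, 1),
    "two edges": (3, 2, 1),
    "three edges": (4, 3, 1),
    "square": (4, 4, 2),
    "EM rocket": (6, 7, 3),
    "K5": (5, 10, 4),
}
for name, (nv, ne, tau) in graphs.items():
    print(name, target_length(nv, ne), n_dec(nv, ne), n_vcp(nv, ne, tau))
```

The printed values can be compared with the $N_x$, $n_{\mathrm{dec}}(C_x)$, and $n_{\mathrm{vcp}}(C_x)$ columns of Table 1.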
We have examined a few simple graphs, shown in Figure 2, obtaining the results listed in Table 1.
As an example, consider the simple graph $G = (V, E)$ shown in Figure 2(b), having two edges connected at one vertex. Hence, its vertex cover number is $\tau = 1$. In this case, $|V(\Omega)| = 23$ and the target string generated by sequences (9a) and (9b) has the form
$$C_x = [00102031020300102002030]. \tag{16}$$
As the vertex cover of the graph $G$ is the vertex 2, the subspace $\Omega'$ (the certificate) is devoid of the triplets $[010]$ and $[030]$, since the edges $(1, 2)$ and $(2, 3)$ share the vertex 2, and the edge strings (8) can be assembled as $[01|020]$ and $[020|30]$. Therefore, the number of steps on the assembly pathway of $C_x$ defined by $\Omega'$, given by the relation (11), amounts to $n_{\mathrm{vcp}}(C_x) = |\Omega| - |B_\Omega| - 2 = 17$, as shown in Figure 3(a), which also illustrates the assembly depth [6] ($d(C_x) = 9$) of this string: 7 steps (5-11) for vertex strings (7), 2 steps (12, 13) for edge strings (8), 6 steps (14-19) for sequence strings (9a), and 2 steps (20, 21) for sequence strings (9b). This number of steps corresponds to the vertex cover number $\tau = 1$, provided that the string (16) is assembled using only the set of allowed assembly operations defined by Eqs. (38)-(45) of [1].
However, imposing such a set of allowed assembly steps deviates from the principles of assembly theory that assume the possibility of assembling any object from any two objects in the assembly pool. Even if we assume that only some steps are allowed and some are not due to peculiarities of the assembled data structures, this is certainly not the case for strings, considered in [1] in the proof of Lemma 3. All strings are possible and mathematically well defined [7]. The evolution of information became possible as soon as a first bit, not a first particle or object, became accessible [2,3,8,9].
Therefore, the assembly index of the string (16) is $a(C_x) = 13$. One of the shortest pathways of the string (16) is shown in Figure 3(c), with $d(C_x) = 10$. A quadruplet $[1020]$ present in two independent copies is assembled in step 5, and a 5-plet $C_{\mathrm{lng}} = [10203]$ present in two copies is assembled in step 6. Furthermore, $C_x$ contains two independent copies of $[00]$, $[10]$, and $[20]$. A slightly longer pathway, of the length given by (15), is shown in Figure 3(b).
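The two step counts for the string (16) can be replayed with a small pathway checker. In the sketch below, the helper names and both step lists are our own illustration: the first list follows the vertex, edge, and sequence strings of the certificate pathway, and the second is a hand-built 13-step pathway that reuses $[10203]$ and $[1020]$; neither is claimed to coincide step for step with Figure 3. The checker only confirms that each pathway reaches the target; it does not prove that 13 steps are minimal.

```python
def count_steps(basis, steps, target):
    """Replay a pathway; every step joins two strings already in the assembly pool."""
    pool = set(basis)
    for left, right in steps:
        if left not in pool or right not in pool:
            raise ValueError(f"{left!r} or {right!r} is not in the assembly pool")
        pool.add(left + right)
    if target not in pool:
        raise ValueError("pathway does not reach the target")
    return len(steps)


def chain(parts):
    """Steps that join `parts` from left to right, one part per step."""
    steps, acc = [], parts[0]
    for part in parts[1:]:
        steps.append((acc, part))
        acc += part
    return steps


TARGET = "00102031020300102002030"
BASIS = {"0", "1", "2", "3"}

# Certificate-style pathway: 7 vertex strings, 2 edge strings, 8 sequence strings.
vcp_steps = [
    ("0", "1"), ("0", "2"), ("0", "3"), ("1", "0"), ("2", "0"), ("3", "0"), ("02", "0"),
    ("01", "020"), ("020", "30"),
] + chain(["0", "01", "02", "03", "10", "20", "30", "01020", "02030"])

# Hand-built shorter pathway: 5 building blocks, then 8 joins.
short_steps = [
    ("1", "0"), ("2", "0"), ("0", "0"), ("10", "20"), ("1020", "3"),
] + chain(["00", "10203", "10203", "00", "1020", "0", "20", "3", "0"])

print(count_steps(BASIS, vcp_steps, TARGET), count_steps(BASIS, short_steps, TARGET))
```

Unlike the certificate pathway, the shorter pathway joins any two strings available in the pool, which is exactly the freedom that the restricted set of assembly operations in [1] rules out.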

4. Conclusions

The study [1] shows that any instance of the NP-hard Vertex Cover Problem, that is, the problem of determining whether a graph has a set of at most $k$ vertices that contains at least one vertex of each edge of this graph, can be reformulated in polynomial time as an instance of the Assembly Steps Problem, that is, the problem of determining if a given string can be assembled in a given number of steps according to the principles of assembly theory. Hence, the latter problem is also NP-hard. Furthermore, the study [1] shows that a proposed solution (certificate) to the Assembly Steps Problem can be checked for correctness in polynomial time, so the Assembly Steps Problem is NP-complete.
However, here, based on a few simple graphs, we have shown that the assembly space constructed from a graph in [1] (a certificate) to establish the correspondence between the Minimum Vertex Cover Problem and the Assembly Steps Problem does not reflect a shortest pathway leading to the target string and hence does not correspond to the Assembly Index Problem.
In this study, we have also answered in the affirmative the question posed in [1]: the alternative definition of the assembly space (Definition 1), used together with the procedure of [1] for constructing an instance of the Assembly Steps Problem that corresponds to the Vertex Cover Problem, shows that the non-commutative concatenation version of the Assembly Steps Problem is also NP-Complete.
Unfortunately, we still do not know if the Assembly Index Problem is NP-Complete.

Funding

This research received no external funding.

Acknowledgments

The author thanks Mariola Bala, Wawrzyniec Bieniawski, Piotr Masierak for motivation, critical discussions and numerous clarity, reasoning, and grammar corrections, his wife, Magdalena Bartocha, for her everlasting support, and his partner and friend, Renata Sobajda, for her prayers.

References

  1. Kempes, C.P.; Lachmann, M.; Iannaccone, A.; Fricke, G.M.; Chowdhury, M.R.; Walker, S.I.; Cronin, L. Assembly Theory and its Relationship with Computational Complexity. arXiv 2024, arXiv:2406.12176.
  2. Łukaszyk, S.; Bieniawski, W. Assembly Theory of Binary Messages. Mathematics 2024, 12, 1600.
  3. Bieniawski, W.; Masierak, P.; Tomski, A.; Łukaszyk, S. On the Certain Salient Regularities of Strings of Assembly Theory, 2024.
  4. Marshall, S.M.; Moore, D.G.; Murray, A.R.G.; Walker, S.I.; Cronin, L. Formalising the Pathways to Life Using Assembly Spaces. Entropy 2022, 24, 884.
  5. Marshall, S.M.; Murray, A.R.G.; Cronin, L. A probabilistic framework for identifying biosignatures using Pathway Complexity. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2017, 375, 20160342.
  6. Pagel, S.; Sharma, A.; Cronin, L. Mapping Evolution of Molecules Across Biochemistry with Assembly Theory. arXiv 2024, arXiv:2409.05993.
  7. Sharma, A.; Czégel, D.; Lachmann, M.; Kempes, C.P.; Walker, S.I.; Cronin, L. Assembly theory explains and quantifies selection and evolution. Nature 2023, 622, 321–328.
  8. Łukaszyk, S. Black Hole Horizons as Patternless Binary Messages and Markers of Dimensionality. In Future Relativity, Gravitation, Cosmology; Chapter 15; Nova Science Publishers: Hauppauge, NY, USA, 2023; pp. 317–374.
  9. Łukaszyk, S. Life as the Explanation of the Measurement Problem. Journal of Physics: Conference Series 2024, 2701, 012124.
1
Here, all references to [1] relate to the Supplementary Material of [1].
Figure 1. Alternative definition of the edge labeling map $\phi$ for $C_x \neq C_y$: $\phi(e) = y \implies C_z = C_x C_y$ (a), $\phi(e) = -y \implies C_z = C_y C_x$ (b).
Figure 2. Simple graphs we examined: one edge (a), two edges (b), three edges (c), square (d), "EM rocket" (e), a complete graph $K_5$ (f). Red circles indicate a minimum vertex cover.
Figure 3. Three assembly spaces of the same string $C_x = [00102031020300102002030]$: the pathway to produce a vertex cover certificate ($n_{\mathrm{vcp}}(C_x) = 21 - 4 = 17$ steps) (a), the pathway taking into account the general distributions of substrings in all strings $C_x$ ($n_{\mathrm{dec}}(C_x) = 18 - 4 = 14$ steps) (b), a shortest, assembly index pathway ($a(C_x) = 17 - 4 = 13$ steps) (c).
Table 1. Assembly indices $a(C_x)$ of target strings $C_x$ constructed [1] for the Vertex Cover Problem graphs $G$; minimum and maximum assembly indices $a_{\min}(N_x)$, $a_{\max}(N_x, b)$ for $N_x$; numbers of steps leading to $C_x$ used in [1] ($n_{\mathrm{vcp}}(C_x)$) and derived here ($n_{\mathrm{dec}}(C_x)$), for examined graphs.

| Graph | $\lvert V(G) \rvert$ | $\lvert E(G) \rvert$ | $\tau$ | $C_x$ | $N_x$ | $a_{\min}(N_x)$ | $a(C_x)$ | $n_{\mathrm{dec}}(C_x)$ | $n_{\mathrm{vcp}}(C_x)$ | $a_{\max}(N_x, b)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| one edge | 2 | 1 | 1 | [00102102001020] | 14 | 5 | 8 | 9 | 11 | 12 |
| two edges | 3 | 2 | 1 | [00102031020300102002030] | 23 | 7 | 13 | 14 | 17 | 21 |
| three edges | 4 | 3 | 1 | [001…0410…4001040…03040] | 32 | 5 | 18 | 19 | 23 | 30 |
| square | 4 | 4 | 2 | [001…0410…4001020…04030] | 37 | 7 | 20 | 23 | 26 | 33 |
| "EM rocket" | 6 | 7 | 3 | [001…0610…6001020…04060] | 60 | 7 | 32 | 37 | 41 | 58 |
| $K_5$ | 5 | 10 | 4 | [001…0510…5001020…04050] | 71 | 9 | 40 | 48 | 44 | < 56 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.