2. An Alternative Definition of an Assembly Space
An assembly space
is defined in [
1] (cf. Definition 8) as an acyclic directed graph
, where
V is the set of vertices and
E is the set of edges together with an edge labeling map
.
contains a finite and non-empty set of vertices
that form the basis of
, each reachable only from itself. All remaining vertices of
are reachable from a vertex in
.
The edge labeling map
cleverly defines the assembly step. Namely
which allows one to write
,
, and
However, the commutativity of the relation (
2) cannot imply the commutativity of the string concatenation. If all assembly
objects in
are strings
does not imply
, as might be expected for string concatenation, but actually also implies
. Therefore, a string assembly space is endowed in [
1] with the additional property (cf. [
1], Definition 19 and Eq. (10))
and likewise for the 2nd edge
.
However, the property (
3) makes the terminating string
z unresolvable if
. Consequently,
z cannot be used in subsequent assembly operations. Here, we provide an alternative definition of the assembly space.
Definition 1 (Assembly Space).
An assembly space is an acyclic digraph of strings , where all unit-length strings (basic symbol(s)) are inaccessible source vertices and the remaining strings are 2-in-regular assembly-step vertices, E is a set of edges, and is an edge labeling map, wherein an assembly step consists of forming a new string from two not necessarily different strings , by concatenating them with each other, establishing edges and , and assigning strings , to edges , e using ϕ as
where ∘ denotes the string concatenation (strcat) operator.
The definition of the edge labeling map (4) is possible only if
, i.e., for more than one basic symbol, as in that case we can say that a given inaccessible symbol is the
one, another is the
one, and so on; that is, the symbols can be ordered. With a single symbol, the notion of a concatenation direction is meaningless. Contrary to the previous definition of the labeling map
[1], the relation (4) preserves the commutativity of the assembly step but defines the order of concatenation of the strings since, in general, for different strings
.
Definition 1 is consistent: all vertices are unique (as they should be in any standard graph), and all are strings. Since an assembly step always consists of joining exactly two parts [5], these can be thought of as the left and right fragments of the newly formed string [3]. Strings that can result from the concatenation of two shorter strings are 2-in-regular assembly-step vertices, while unit-length strings are inaccessible. Remarkably, the uniqueness of each vertex is sufficient to establish whether an assembly step is allowed (cf. [1], Definition 10) and to introduce the notion of an assembly pool: vertices (strings) already present in the assembly space cannot be assembled again, possibly using different pathways, as they would no longer be unique; they can only be used in the assembly of other strings. What is allowed is the evolution of assembly pathways to make them shorter, as shown in Figure 1. This evolution seems to be stimulated by the trend to decrease the assembly depth [3, 6].
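The uniqueness rule and the left/right reading of an assembly step can be illustrated with a minimal Python sketch. It is not the formal construction of Definition 1: the class name `StringAssemblySpace`, its methods, and the choice to record each step as an ordered pair `(left, right)` instead of an edge labeling map are assumptions made here for illustration only.

```python
class StringAssemblySpace:
    """Toy assembly space of strings (illustrative, hypothetical names)."""

    def __init__(self, basis):
        if not basis or any(len(s) != 1 for s in basis):
            raise ValueError("the basis must be a non-empty set of unit-length strings")
        self.basis = set(basis)
        # parents[z] = (left, right): the ordered pair concatenated to form z.
        # This stands in for the two labeled edges entering z (an assumption).
        self.parents = {}

    def pool(self):
        """All strings currently available for assembly."""
        return self.basis | set(self.parents)

    def step(self, left, right):
        """Perform one assembly step z = left + right and return z."""
        available = self.pool()
        if left not in available or right not in available:
            raise KeyError("both fragments must already be in the assembly pool")
        z = left + right
        if z in available:
            # Vertices are unique: a string already present cannot be assembled again.
            raise ValueError(f"{z!r} is already in the assembly space")
        self.parents[z] = (left, right)
        return z

    def in_edges(self, z):
        """Sources of the two edges entering z (empty tuple for basic symbols)."""
        return self.parents.get(z, ())


space = StringAssemblySpace({"0", "1"})
s01 = space.step("0", "1")      # left fragment "0", right fragment "1"
s0101 = space.step(s01, s01)    # two not necessarily different strings
```

In this sketch every assembled string has exactly two incoming fragments, basic symbols have none, and a second call to `space.step("0", "1")` is rejected because the string is already in the pool.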
3. The Assembly Steps Problem is NP-Complete
In order to show that any instance of Vertex Cover Problem
, where
is a graph,
is the set of vertices and
is the set of edges and
k is the cardinality of a set of vertices that includes at least one vertex of every edge of
G, which is known to be NP-Hard, can be reformulated in polynomial time as an instance of the Assembly Index Problem, the following procedure is offered (cf. [
1], Section 4.2). For a given instance of the Vertex Cover Problem
, where
, and
is the vertex cover number (the size of a minimum vertex cover), an instance of the Assembly Index Problem
is constructed, where
is a constructed assembly space, and
is the target string for which the assembly index
is to be determined. It is then claimed that a certificate for the Vertex Cover Problem
containing a subset
of vertices of
G that includes at least one vertex of every edge of
G can be used to produce a certificate
for the Assembly Index Problem and vice versa, where
is a rooted subspace (cf. [
1] Definition 15) of the assembly space
containing only a proper subset
of the strings of the form
. Hence, such an instance of the Assembly Index Problem would be logically equivalent to an instance of the Vertex Cover Problem from which it was constructed.
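For reference, the Vertex Cover side of the reduction can be checked directly on small graphs. The following Python sketch is a brute-force illustration only; the function names and the edge-list representation are assumptions made here and do not come from [1]. The sample graph, two edges sharing one vertex, is assumed to be labeled with the edges (1, 2) and (2, 3).

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def minimum_vertex_cover(vertices, edges):
    """Smallest vertex cover found by exhaustive search (exponential time)."""
    ordered = sorted(vertices)
    for k in range(len(ordered) + 1):
        for subset in combinations(ordered, k):
            if is_vertex_cover(edges, subset):
                return set(subset)
    # Not reached: the full vertex set always covers every edge.


# Assumed labeling of a graph with two edges connected at one vertex.
path_vertices = {1, 2, 3}
path_edges = [(1, 2), (2, 3)]
cover = minimum_vertex_cover(path_vertices, path_edges)
```

For this graph the search returns the single shared vertex, so the vertex cover number is 1. The exhaustive loop is practical only for very small graphs, which is all that is needed to cross-check hand-worked examples.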
The construction of
(cf. [
1], Section 4.2) begins with defining the basis of the assembly space
(cf. Eqs. (17), (50)), i.e., the unit-length strings
containing
symbols of vertices
, and a special symbol that here we call "0" (it is defined as "#" in [
1]). Hence,
. Then, a set of
vertex strings
is assembled (Eq. (18)). Subsequently, a set of
edge strings
is assembled (Eq. (19)). The last step of the construction of
is a sequence of
strings
and
strings
defined in [
1] by Eqs. (20)-(25), where the target string
is defined as the last string of this sequence and
Finally ([
1], Section 4.2.3) it is claimed that given
is a certificate for the Assembly Index Problem if the set (
5) is a vertex cover of
G with size
k, i.e. a certificate for the Vertex Cover Problem is given, wherein
is the assembly index of string
and
which depends on
k and is minimal if
.
By construction, the basic symbols (
6), the edge strings (
8), and the sequence strings (
9a) and (
9b) contained in
must also be contained in
(certificate). However, the vertex strings (
7) of the form
are the exception, as each of the edge strings (
8) can be assembled from strings (
7) in one of the two mutually exclusive steps (cf. [
1] Eqs. (53), (54))
leaving some of the strings
or
redundant. It can be seen by comparing the cardinalities of the spaces
(
10) and
(
11), which, as expected, leads to
. There are
strings (
7) in
and only
strings in
.
By construction (
9a), (
9b), the target string has the form
where the
is a vertex-specific part of
depending solely on
and its explicit form is given by the formula (
9a), and the
is an edge-specific part of
, generated by the formula (
9b), and depending both on
, edge vertex assignments and the order of labeling of the edges of graph
G. However,
, as the length of each edge string (
8) is five and there are
such strings in
and
. Therefore, the length of the target string is
Furthermore, by construction
contains two copies of the string
of length
having the assembly index equal to
as it does not contain any repetitions of substrings. We can take advantage of the fact that each
m copies of an
n-plet
contained in a string decrease the assembly index of this string by at least
[
3], where
is the assembly index of this
n-plet, to estimate the upper bound for the assembly index of
reduced by the presence of these two copies of
. Furthermore, excluding the degenerate cases of empty and disjoint graphs
G, we can further infer some information about
. That is, since any vertex
is a part of some edge
,
contains at least two repetitions of doublets
(or
), with
as the string
also contains
such doublets, and each repetition decreases the assembly index by one. Hence, the upper bound must be further decreased by
. Finally, each string
contains
repetitions of a doublet
and, hence, the upper bound must be further decreased by
. Therefore, the initial upper bound on the assembly index that amounts to
[
5] if
[
3] decreases to
which, in contrast to
(
11), is independent of
k.
We have examined a few simple graphs, shown in
Figure 2, obtaining the results listed in
Table 1.
As an example, consider the trivial graph
, shown in
Figure 2(b) having two edges connected at one vertex. Hence, its vertex cover number is
. In this case, (
10)
and the target string generated by sequences (
9a) and (
9b) has the form
As the vertex cover of the graph
G is the vertex 2, the subspace
(the certificate) is devoid of triplets
and
, since the edges
and
share the vertex 2, and the edge strings (
8) could be assembled as
and
. Therefore, the number of steps on the assembly pathway of
defined by
, given by the relation (
11), amounts to
, as shown in
Figure 3(a) (also illustrating the assembly depth [
6] (
) of this string: 7 steps (1-7) for vertex strings (
7), 2 steps (8, 9) for edge strings (
8), 6 steps (10-15) for sequence strings (
9a), and 2 steps (16, 17) for sequence strings (
9b)) which corresponds to the vertex cover number
, but only if the string (16) is assembled using the set of allowed assembly operations defined by Eqs. (38)-(45) of [
1].
However, imposing such a set of allowed assembly steps departs from the principles of assembly theory, which assume that any object can be assembled from any two objects in the assembly pool. Even if only some steps were allowed and others forbidden because of peculiarities of the assembled data structures, this is certainly not the case for strings, which are considered in [1] in the proof of Lemma 3. All strings are possible and mathematically well defined [7]. What could be the reason for allowing the assembly of a string
and disallowing the assembly of a string
from a set of basic symbols
? The evolution of information became possible as soon as a first bit, not a first particle or object, became accessible [2, 3, 8, 9].
Therefore, the assembly index of the string (
16) is
. One of the shortest pathways of the string (
16) is shown in
Figure 3(c) with
. A quadruplet
present in two independent copies is assembled in step 5, and a 5-plet
present in two copies is assembled in step 6. Furthermore,
contains two independent copies of
,
, and
. A slightly longer pathway leading to the string of length (
15) is shown in
Figure 3(b).
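When any two strings in the assembly pool may be joined, the assembly index of a short string can be cross-checked by exhaustive search. The sketch below is a minimal illustration, not the procedure used to obtain the figures or the table: the function name `assembly_index` and the iterative-deepening strategy are choices made here, and the running time grows exponentially with the string length.

```python
def assembly_index(target):
    """Minimum number of pairwise concatenations needed to build `target`
    from its unit-length symbols, with every intermediate string reusable."""
    if len(target) <= 1:
        return 0
    # Only substrings of the target can contribute to a pathway ending in it.
    substrings = {target[i:j]
                  for i in range(len(target))
                  for j in range(i + 1, len(target) + 1)}
    basis = frozenset(target)  # the distinct symbols occurring in the target

    def reachable(pool, steps_left):
        """Can `target` be reached from `pool` in at most `steps_left` steps?"""
        if target in pool:
            return True
        if steps_left == 0:
            return False
        # Join any two (not necessarily different) strings from the pool.
        candidates = {x + y for x in pool for y in pool} & substrings
        for z in candidates - pool:
            if reachable(pool | {z}, steps_left - 1):
                return True
        return False

    # Iterative deepening: the first depth that succeeds is the minimum.
    # Appending one symbol at a time always works in len(target) - 1 steps.
    for depth in range(1, len(target)):
        if reachable(basis, depth):
            return depth
    return len(target) - 1


index_0101 = assembly_index("0101")      # "01", then "01" + "01"
index_abcabc = assembly_index("abcabc")  # "ab", "abc", then "abc" + "abc"
```

The two sample strings are hypothetical and chosen only to show how a repeated substring shortens the pathway: the 4-symbol string needs 2 steps instead of 3, and the 6-symbol string needs 3 steps instead of 5.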