Submitted:
27 January 2026
Posted:
28 January 2026
Abstract
Keywords:
1. Introduction
2. Background: Canonical String-Based Assembly Theory
- the set of objects is taken to be the set of all non-empty substrings of w;
- R is the set of binary join rules of the form (x, y) → xy, for all objects x and y such that xy is again an object (that is, a substring of w).
- (a) each object in the sequence is either a single letter in Σ or is obtained from two earlier objects by a rule in R, i.e. it is the concatenation of two objects that occur earlier in the sequence;
- (b) at least one occurrence of w appears among the objects of the sequence.
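Conditions (a) and (b) can be checked mechanically. The following is a minimal sketch, assuming a plan is given as a plain list of strings and that its cost is counted as the number of join steps; the function names `is_canonical_plan` and `join_count` are illustrative and not part of the formal definition.

```python
from typing import List


def is_canonical_plan(w: str, plan: List[str]) -> bool:
    """Check conditions (a) and (b) for a candidate plan for the target w."""
    seen: List[str] = []
    for obj in plan:
        # Every object must be a non-empty substring of w (no "trash" objects).
        if not obj or obj not in w:
            return False
        # (a) A longer object must be the join of two earlier objects.
        #     The same earlier object may be used on both sides.
        if len(obj) > 1 and not any(obj == a + b for a in seen for b in seen):
            return False
        seen.append(obj)
    # (b) The target itself must appear in the plan.
    return w in seen


def join_count(plan: List[str]) -> int:
    """Assumed cost convention: one unit per join step, letters are free."""
    return sum(1 for obj in plan if len(obj) > 1)
```

For the target `abab`, the plan `["a", "b", "ab", "abab"]` is accepted and uses two joins, because the second join reuses the object `ab` twice.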
3. Templated Assembly Spaces
- F is neither all zeros nor all ones;
- T is obtained by replacing each block byand setting .
- The set of objects: every object is either a substring of w (containing no wildcards) or a block-compressed template for w (containing at least one wildcard and at least one literal segment).
- The set of concatenation rules (x, y) → xy, for all objects x and y such that xy is again an object.
- The set of template instantiation rules. For a template T, consider the set of positions of wildcards in T. For every non-empty subset S of these positions and every object u such that the result is again an object, we have a rule that maps the pair (T, u) to the string obtained from T by replacing each wildcard at a position in S by u and leaving all other positions unchanged. In particular, if S = {p} is a singleton, we obtain a single-star substitution at position p, and if S contains every wildcard position of T we obtain a fully parallel substitution that replaces all wildcards in T by u simultaneously. As before, we explicitly require the corresponding membership condition; this is automatically satisfied whenever u is a substring of w, but in general we impose it as part of the no-trash condition.
- (a) each object in the sequence is either a monomer (a single letter in Σ or the wildcard *) or is obtained by applying a concatenation rule or a template instantiation rule to earlier objects;
- (b) at least one occurrence of w appears among the objects of the sequence.
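The instantiation rule can be illustrated with a short sketch. It assumes that templates are ordinary strings in which the character `*` is the wildcard, that wildcard positions are string indices, and that a template belongs to the object set when it matches the whole of w with every wildcard standing for a non-empty block and no two wildcards adjacent. The helper names and the `is_object` test are assumptions made for illustration, not the formal definitions.

```python
import re
from typing import Iterable, Set

WILDCARD = "*"


def wildcard_positions(template: str) -> Set[int]:
    """Indices of all wildcards in the template."""
    return {i for i, ch in enumerate(template) if ch == WILDCARD}


def instantiate(template: str, sites: Iterable[int], filler: str) -> str:
    """Replace the wildcards at the chosen sites by the filler, keep the rest."""
    chosen = set(sites)
    if not chosen or not chosen <= wildcard_positions(template):
        raise ValueError("sites must be a non-empty set of wildcard positions")
    return "".join(filler if i in chosen else ch for i, ch in enumerate(template))


def is_object(w: str, x: str) -> bool:
    """Assumed no-trash test: x is a substring of w or a template matching w."""
    if not x:
        return False
    if WILDCARD not in x:
        return x in w
    if set(x) == {WILDCARD} or WILDCARD * 2 in x:
        return False  # needs a literal segment and block-compressed wildcards
    pattern = "".join(".+" if ch == WILDCARD else re.escape(ch) for ch in x)
    return re.fullmatch(pattern, w, flags=re.DOTALL) is not None
```

With `T = "a*b*"`, choosing both sites and the filler `x` gives `axbx` in one step, while choosing only site 1 gives the intermediate template `axb*`, which under the assumed test is still an object for the target `axbx`.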
3.1. Illustrative Example. Selective Template Mining and When Wildcards Help
- Single-use templates. If a template T is assembled and instantiated only once, then introducing it typically does not help: since T is not a primitive object, one must pay (via concatenation rules) for constructing its skeleton, and then pay an additional unit-cost instantiation step via a template instantiation rule. In particular, when T has a single wildcard position, an instantiation step provides no parallelism across multiple sites; any advantage over purely concatenative assembly is therefore not automatic and depends on how the surrounding context is reused elsewhere in the plan.
- Amortisation by reuse. Templates can become beneficial once the same assembled skeleton is instantiated multiple times within an assembly plan (possibly with different fillers). In that case, the one-time cost of constructing T can be amortised across several instantiations. This effect can already occur for templates with a single wildcard position, but it requires repeated use of the same contextual “frame” represented by T.
- Parallel filling of several sites. Even for a fixed target w, a single instantiation rule may fill several wildcard positions at once (choose S containing more than one wildcard position). Thus, templates with at least two wildcard positions enable a genuinely new mechanism: one instantiation operation can realise multiple occurrences of a motif simultaneously, which is the main source of potential step savings relative to purely concatenative assembly.
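The difference between single-site and parallel filling can be made concrete by counting instantiation steps. The sketch below assumes string templates with `*` as the wildcard and a wildcard-free filler; it ignores the no-trash check on intermediate objects and only counts rule applications. Both function names are hypothetical.

```python
from typing import Tuple

WILDCARD = "*"


def fill_one_by_one(template: str, filler: str) -> Tuple[str, int]:
    """Fill wildcards left to right, one singleton substitution per step."""
    if WILDCARD in filler:
        raise ValueError("this sketch assumes a wildcard-free filler")
    current, steps = template, 0
    while WILDCARD in current:
        current = current.replace(WILDCARD, filler, 1)
        steps += 1
    return current, steps


def fill_in_parallel(template: str, filler: str) -> Tuple[str, int]:
    """Fill every wildcard with one fully parallel substitution."""
    if WILDCARD in filler:
        raise ValueError("this sketch assumes a wildcard-free filler")
    steps = 1 if WILDCARD in template else 0
    return template.replace(WILDCARD, filler), steps
```

For the skeleton `*c*c*` and the filler `ab`, both routes produce `abcabcab`, but the first needs three instantiation steps and the second needs one.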
3.2. A Worked Example: Separating ASI and TAI
- Canonical assembly plan (showing ).
- Templated assembly plan and .
3.3. Additional Worked Example I: Separating ASI and TAI
- Canonical assembly plan (showing ).
- Templated assembly plan and .
3.4. Additional Worked Example II: Separating ASI and TAI
- Canonical assembly plan (showing ).
- Templated assembly plan and .
4. Relation to Grammars and Pattern-Based Formalisms
4.1. A Greedy Macro-Grammar Heuristic for Approximating TAI
- Candidate templates and filler families.
- A simple gain score.
- A sketch of a greedy heuristic.
- Mine candidate skeletons T together with the admissible fillers u of each, as found in the current working representation.
- For each T, choose a filler family U (e.g. all admissible fillers, or the best-scoring subset) and compute the gain of the pair (T, U).
- Select a skeleton T with a filler family U of maximum positive gain.
- Commit to constructing T once (using concatenation rules), constructing each filler u in U once, and generating each chosen occurrence of the instantiated template via a single template instantiation rule (fully parallel substitution, with S the full set of wildcard positions of T).
- “Compress” the working representation of w by treating each generated occurrence of each instantiated template as an available object, and continue searching for further profitable candidates on the remaining structure.
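The scoring and selection steps above can be sketched as follows. The gain used here is an illustrative stand-in and not the score defined in the text: it assumes that every string is otherwise built letter by letter (length minus one joins), that the skeleton and each filler are built once, and that each filler costs one fully parallel instantiation. The names `Candidate`, `naive_joins`, `gain` and `select_best` are hypothetical, and candidate mining and the compression step are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WILDCARD = "*"


@dataclass(frozen=True)
class Candidate:
    skeleton: str             # template T, with '*' marking wildcard sites
    fillers: Tuple[str, ...]  # chosen filler family U


def naive_joins(s: str) -> int:
    """Assumed baseline: joins needed to build s one monomer at a time."""
    return max(len(s) - 1, 0)


def gain(c: Candidate) -> int:
    """Assumed gain: baseline joins minus the cost of the templated route."""
    baseline = sum(naive_joins(c.skeleton.replace(WILDCARD, u)) for u in c.fillers)
    templated = (
        naive_joins(c.skeleton)                   # build T once
        + sum(naive_joins(u) for u in c.fillers)  # build each filler once
        + len(c.fillers)                          # one instantiation per filler
    )
    return baseline - templated


def select_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return a candidate of maximum positive gain, or None if none is positive."""
    best = max(candidates, key=gain, default=None)
    return best if best is not None and gain(best) > 0 else None
```

Under these assumptions the skeleton `a*b*c` with fillers `xy` and `zw` has gain 4 (12 baseline joins against 8 templated steps), whereas the single-use candidate `a*c` with filler `x` has gain -1 and is rejected, in line with the remark on single-use templates in Section 3.1.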
5. Computational Complexity
- Instance: a string w and a non-negative integer k (given in unary or binary).
- Question: does there exist a templated assembly plan for w of total cost at most k in the templated assembly space of w?
- On hardness.
6. Discussion
- Chemistry and molecular design: Many molecules can be described as relatively rigid scaffolds decorated by repeated functional groups: aromatic cores bearing several identical substituents, oligomers with the same side chain attached at multiple positions, or dendritic architectures built from a small set of branching units. In a string-based representation of molecules (for example a linearised encoding of a core with substituents), such patterns correspond to templates where wildcard positions mark sites of functionalisation. Canonical assembly indices quantify reuse of repeated fragments, but they do not distinguish between independent repetitions and repetitions coordinated by a shared backbone. In contrast, templated assembly makes this distinction explicit: a common scaffold with wildcard sites is built once, and then instantiated with chosen functional groups. Molecules or families of molecules with a large gap between ASI and TAI would then correspond to structures where much of the apparent combinatorial complexity comes from templated substitution patterns rather than from genuinely unrelated submotifs, suggesting applications in quantifying scaffold–decoration modularity in medicinal chemistry, combinatorial library design, and polymer or supramolecular engineering.
- Sequence analysis and genomics: Many genomic regions exhibit modular architectures, with recurring motifs (e.g. transcription factor binding sites, protein domains) embedded in diverse local contexts. Canonical assembly indices already provide a measure of hierarchical reuse, but templated indices could more directly reflect shared scaffolds with multiple occurrences of the same motif. Differential behaviour of ASI and TAI across genomic regions might highlight functionally relevant templated structures.
- Biosignatures and origins of life: Assembly Theory has been proposed as a framework for defining universal biosignatures based on complexity measures derived from putative assembly spaces [1,2,3]. The templated assembly index offers a complementary axis: a high ASI together with a large gap between ASI and TAI might signal systems that exploit templated copying of modules in ways characteristic of evolved biological organisation, for example gene families under common regulatory architectures.
References
- Marshall, S.M.; Murray, A.R.G.; Cronin, L. A probabilistic framework for identifying biosignatures using Pathway Complexity. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2017, 375, 20160342.
- Marshall, S.M.; Moore, D.G.; Murray, A.R.G.; Walker, S.I.; Cronin, L. Formalising the Pathways to Life Using Assembly Spaces. Entropy 2022, 24, 884.
- Sharma, A.; Czégel, D.; Lachmann, M.; Kempes, C.P.; Walker, S.I.; Cronin, L. Assembly theory explains and quantifies selection and evolution. Nature 2023, 622, 321–328.
- Łukaszyk, S.; Bieniawski, W. Assembly Theory of Binary Messages. Mathematics 2024, 12, 1600.
- Masierak, P. Computational Complexity of Determining the Assembly Index. IPI Letters 2026, 9–12.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).