Why a Constant Number of Vertebrae? Digital Control of Segmental Identity during Vertebrate Development

It is not understood how the numbers and identities of vertebrae are controlled during mammalian development. The remarkable robustness and conservation of segmental numbers may suggest the digital nature of the underlying process. The study proposes a mechanism that allows cells to obtain and store the segmental information in digital form, and to produce a pattern of chromatin accessibility that in turn regulates Hox gene expression specific to the metameric segment. The model requires that a regulatory element be present such that the number of occurrences of the motif between two consecutive Hox genes equals the number of segments under the control of the anterior gene. This is true for the recently discovered hydroxyl radical cleavage 3bp‐periodic (HRC3) motif, associated with histone modifications and developmental genes. The finding not only allows the correct prediction of the numbers of segments using only sequence information, but also resolves the 40‐year‐old enigma of the function of temporal and spatial collinearity of Hox genes. The logic of the mechanism is illustrated in the attached animated video. How different aspects of the proposed mechanism can be tested experimentally is also discussed.


Numbers in Metazoan Body Plans
DOI: 10.1002/bies.201900133

Across the animal kingdom, strong conservation of numerical aspects of the body plan is observed. [1] The numbers of digits, teeth, segments, or even the total number of cells are often invariant within species. In vertebrates, including humans, the numbers of metameric segments in each region of the spine (cervical, thoracic, lumbar, sacral, coccygeal), as well as the total number of vertebrae, are highly conserved not only within a species, but also often for larger phyla; [2] for example, almost all mammalian species have exactly seven cervical vertebrae, [3] turtles have eight cervical and ten thoracic vertebrae, etc. It is, however, not understood how these numbers are encoded in the genome, or which mechanism is responsible for reliably generating the correct numbers of vertebrae and their respective identities. Such a mechanism of counting segments would have to be very robust. In humans, the only relatively common variation is the L5 sacralization; apart from that, the numbers of segments within each region vary (usually on one side) only in certain rare cases of congenital vertebral malformations, for example, in hemivertebrae or cervical ribs. Studies in model animals show that the total number of somites (segment precursors) is not affected by reducing the size [4] or axial length [5] of the embryo, nor by environmental factors, cell size or ploidy, nor by interventions such as notochord ablation. [6] While the prevailing view in the field is that the numerical aspects of segmental identity are determined by the cells sensing a gradient in concentration of a morphogen or by precise, segment-specific timing of expression, [7] there is no satisfactory explanation of how the cells could perform an absolute measurement of morphogen concentration or of time with the high precision required for unfailingly determining their location along the anteroposterior axis.
Likewise, it has been proposed that the total number of somites is controlled by a finely tuned ratio between the somite cycle and the total duration of somitogenesis until the presomitic mesoderm (PSM) is exhausted; [8] however, later experiments contradict this notion, [6] and no explanation is offered for the high precision and reproducibility of the process.
collinearity: a conserved arrangement on chromosomes that is the same as their order of activation along the body axis. The regulation is very precise, for example, the regions of activity of Hox genes are tightly confined to specific rhombomeres [11] or to segments of the vertebrate anteroposterior body axis. [12] The vertebrate Hox genes are synchronized: the spatiotemporal expression domains of paralogs from the A, B, C, and D clusters are generally identical along the anteroposterior axis, [13] with only minor differences between the clusters (e.g., in members of the Hox4 group of paralogs).
Despite 40 years of active research, the mechanisms of Hox gene regulation have remained elusive. Evolutionary conservation suggests that the coupling between collinearity on the chromosomal scale and collinearity along the body axis is significant, but neither its function nor the underlying mechanism has been explained. [14,15] Hox genes tend to be inhibited by more posterior ones, but this process (termed "posterior prevalence") appears not to be universal, and it is likely secondary to an unknown primordial mechanism of regulation. [16] More importantly, posterior prevalence can only account for the posterior boundary of the expression domain of a Hox gene. The anterior boundaries appear to be functionally more relevant for patterning the vertebral column, but the mechanism of establishment of the anterior boundaries of Hox expression domains has remained unknown. Recent evidence suggests that chromatin structure [17] and histone demethylation, especially removal of the H3K27me3 histone mark, play important roles in activation of Hox genes; [18,19] in addition, numerous enhancers have been identified, especially for the posterior Hox genes. [20-22] Nonetheless, the mechanism precisely directing chromatin modifications to specific loci at the right time and place has remained mysterious. [21] It has been argued that the ancestral patterning strategy is based on progressive opening of chromatin in the Hox clusters in the 3′-5′ direction, [18,23] possibly mediated by a timer mechanism. [15] It is, however, not known how the speed of this progression would be controlled, and how the mechanism would be initiated or terminated in a segment-specific manner. Also, the enhancer landscape evolves fast within Hox clusters [24] and is unlikely to be at the base of a primordial process that is responsible for the Hox gene collinearity and is shared by divergent groups of bilaterian animals.
Despite significant effort, neither the CTCF-binding site nor long non-coding RNAs have been confirmed to play a primary role in regulation of chromatin in the Hox clusters. [22,25] Recently, we discovered the HRC3 signature, a conserved regulatory element associated with most Hox genes and with other developmental transcription factors. [26] The element is approximately 180 bp long, and is defined by a pattern in the DNA structure rather than its sequence. Specifically, the signature is defined by a significantly high peak in the periodogram [27] of the hydroxyl radical cleavage (HRC) pattern of the DNA [28] at period P = 3 bp, measured over a 180 bp interval (this mathematical definition is only an approximation of the sensitivity of the biophysical process underlying the function of the element, and some false positives or false negatives may be possible). In most Hox genes, an HRC3 motif coincides with the homeobox, a sequence encoding the DNA binding domain of these genes; these motifs are also found in multiple intergenic regions within the Hox clusters and in other regions of the genome. Analysis of chromatin immunoprecipitation (ChIP-seq) experiments shows that the HRC3 loci show a genome-wide association with binding of histone-modifying enzymes, suggesting that the element may be involved in directing chromatin modifications to the specific loci, and thus play a role in regulating Hox gene expression.
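The periodogram criterion described above can be illustrated with a short calculation. The following is a minimal sketch, assuming that a per-base cleavage intensity profile is already available as a list of numbers (a hypothetical input; the HRC prediction itself and the significance threshold of ref. [26] are not reproduced here). The function name `hrc3_score` and the normalization by the signal variance are illustrative choices, not the published definition.

```python
import cmath
import math

WINDOW = 180  # bp, window length quoted in the text
PERIOD = 3    # bp, period of interest


def hrc3_score(cleavage, period=PERIOD):
    """Periodogram power at frequency 1/period, relative to the mean power.

    `cleavage` is a hypothetical per-base cleavage intensity profile of
    length WINDOW. The score is the periodogram value at the chosen period
    divided by the variance of the profile (which equals the average
    periodogram value over all Fourier frequencies).
    """
    if len(cleavage) != WINDOW:
        raise ValueError(f"expected a window of {WINDOW} values")
    n = len(cleavage)
    mean = sum(cleavage) / n
    centred = [x - mean for x in cleavage]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        return 0.0  # flat profile: no periodic component at all
    # Single term of the discrete Fourier transform at frequency 1/period.
    dft = sum(c * cmath.exp(-2j * math.pi * k / period)
              for k, c in enumerate(centred))
    return (abs(dft) ** 2 / n) / variance


# Toy profiles (not real data): one repeats every 3 bp, one every 4 bp.
three_periodic = [1.0, 0.0, 0.0] * 60
four_periodic = [1.0, 0.0, 0.0, 0.0] * 45
print(hrc3_score(three_periodic), hrc3_score(four_periodic))
```

A profile that repeats every 3 bp concentrates its power at the tested frequency and scores high, whereas a profile with a different period scores close to zero; a real analysis would additionally need a significance threshold, which is left out of this sketch.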
The anterior boundaries of Hox expression appear to depend on the somitic number rather than on the original axial positioning. [29] Therefore, it is feasible to speculate that the underlying regulatory mechanism is based on digital counting rather than on sensing the level of a morphogen. In this article, I postulate such a direct link that relates the somitic number with the chromatin state. I further describe the most intriguing property of the HRC3 motif: the number of occurrences of the motif between two consecutive Hox genes corresponds to the number of segments under the control of the anterior gene. I further outline the logic of a possible mechanism of segment counting and explain the role of the HRC3 regulatory element in the process.

Analog and Digital Processes
There are two main types of control systems: analog and digital. The outputs of analog systems are continuous functions of the input parameters; such systems are fast, simple, and intuitive, but they are limited in their precision and stability, and they are susceptible to external interference. It is also very difficult to design analog systems capable of executing complex logic. On the other hand, digital or discrete systems are capable of executing complicated tasks, switching between distinct states or modes of operation, and can be stable even in multiple-input multiple-output designs. [30] Most known dynamical systems in biology are analog, [31] with the notable exceptions of nucleotide sequence replication, transcription, and translation, which are fully discrete. The process of developmental patterning, including segmental identity, has many properties of a system requiring digital control, such as the high complexity and the remarkable stability and reproducibility of the topology of the body plan that is the product of the system.

Is Control of Segmental Identity Analog or Digital?
Here, I propose that the high robustness and high complexity of the mechanism of segmental identity suggest that a fundamentally digital mechanism plays an important role in the process.
Cells within a developing vertebrate body all have the same genome, but they need to perform logical tasks, such as "if I am in segment 19, I should produce a floating rib." To do this, the cells need to first acquire the segmental information, and then use this information to turn on the expression of segment-specific transcription factors (such as Hox genes) that will execute the task. The differences between analog and digital control of segmental identity are outlined in Table 1.

Table 1. Models of control of segmental identity.

Analog
• Activate a Hox gene if concentration of a morphogen crosses a specified threshold.
• Conceptually simple but requires highly precise chemistry to be stable.
• Sharp boundaries of expression domains are difficult to achieve.

Digital
• Acquire segment number information from environment in a digital form (e.g., N = 19).
• Accumulate and store the digital information.
• Use the information to produce the correct pattern of gene expression.
• Complex, multistage logic, but the system is inherently stable and robust.
• Sharp boundaries are natural for a system built with discrete on/off switches.

The analog model requires only one, purely chemical step, but the process would have to be able to very precisely distinguish between the absolute concentrations of a morphogen (such as RA or FGF) characteristic of the neighboring segments. With 30-50 segments, the segment-to-segment differences in concentration may be of the order of 2-3%; in other words, a 2-3% error in sensing the concentration will result in the cell executing the developmental program of the wrong segment, potentially leading to a developmental malformation. The required precision may be unreachable for multiple reasons, including cell-to-cell variation resulting from Poissonian noise (error level of 4-5% assuming ≈500 morphogen molecules per cell [32] ). Even if a cell could count all the morphogen molecules within its volume, it might be unable to recognize to which segment it should belong. On the contrary, equipped with a digital counter, a cell would be able to express the genes according to its segment number (an integer such as "13" or "22"), rather than to its position along the body axis (e.g., "≈31% from the head, ≈69% from the tail, ±5%").
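The arithmetic behind the noise estimate for the analog model can be checked with a short calculation. The sketch below assumes pure Poisson counting noise (relative error 1/sqrt(N) for a mean of N molecules) and, as a simplifying assumption not stated in the text, a gradient whose level changes by an equal share of its full range at each segment; all function names are illustrative.

```python
import math


def poisson_relative_noise(n_molecules):
    """Relative counting error (std / mean) of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(n_molecules)


def step_per_segment(n_segments):
    """Assumed segment-to-segment step: an equal share of the full range per segment."""
    return 1.0 / n_segments


def molecules_for_noise(target_noise):
    """Mean molecule count at which the Poisson noise falls to target_noise."""
    return round(1.0 / target_noise ** 2)


print(f"noise at 500 molecules: {poisson_relative_noise(500):.1%}")
for n in (30, 50):
    print(f"{n} segments: step of {step_per_segment(n):.1%} per segment")
print("molecules needed for 2% noise:", molecules_for_noise(0.02))
```

Under these assumptions, the counting noise at about 500 molecules per cell is larger than the segment-to-segment step, and roughly five times as many molecules would be needed just to bring the noise down to the size of a 2% step.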
Also, in analog systems, the outputs are continuous functions of inputs, so in a system driven by a gradient of a morphogen, one would expect to find a gradient of Hox gene activity, as opposed to the sharp limits of expression domains that are observed across animal phyla. [33] A digital system, built from bistable elements that explain the sharp boundaries of expression domains, will be more resistant to noise, and generally more robust than any analog mechanism, but an implementation of complex logic is required for such a system to work. Digital systems are less intuitive than analog ones and have not been widely discussed in the context of regulation in development. In Sections 2.2-2.4 below, I will argue that all the components of a digital segment-counting system are present in animal cells, specifically in the cells of the PSM, the embryonic tissue where vertebrate segmentation occurs.

The Segmental Number Is Accessible to Cells in the PSM
In vertebrates, the segmentation is established during somitogenesis, when precursors of segments, called somites, are produced in the embryo's paraxial mesoderm. Somites are clusters of cells with specific patterns of gene expression that are formed by waves of expression of Wnt/Notch and FGF transcripts traveling along the anteroposterior body axis. [34-37] The waves of expression travel from the tail toward the embryo's head. Somites are created sequentially, starting from the anterior end of the PSM, at the locations where the wave stops just before reaching the last previously formed somite. In other words, once a somite has been formed, the waves of expression can no longer travel through its location. Consequently, the number of waves of Wnt/Notch/FGF expression that have traveled through a location (or the number of times the genes were upregulated) is specific to each somite, and corresponds to the somite's identifying number (counted from the head). A schematic representation of the time-dependent expression of somite genes in different locations along the PSM is shown in Figure 1. As a consequence of the unique geometry and mechanics of the somitogenesis waves (waves travel back-to-front, but stop in locations front-to-back), during the process of somite formation, a cell has direct access to the information of its segmental number, in a digital form. To know precisely in which segment it is located, a cell only needs to count the number of times the genes in the Wnt or Notch pathways are upregulated.
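The geometric argument above (waves travel back-to-front but stop front-to-back) can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions: the axis is a fixed row of prospective somite positions numbered from the head, each wave starts at the tail and stops at the position just posterior to the last formed somite, and axis elongation is ignored. The function name `count_waves` is illustrative.

```python
def count_waves(n_somites):
    """Count how many expression waves pass through each prospective somite.

    Positions are numbered 1..n_somites from the head. Wave number w starts
    at the tail (position n_somites), travels toward the head, and stops at
    position w, where somite w then forms. Returns the per-position counts,
    ordered from head to tail.
    """
    counts = [0] * (n_somites + 1)  # index 0 is unused
    for wave in range(1, n_somites + 1):
        # The wave sweeps every position from the tail down to its stopping point.
        for position in range(n_somites, wave - 1, -1):
            counts[position] += 1
    return counts[1:]


print(count_waves(5))
```

In this idealized picture, the number of waves experienced at each position equals the somite's number counted from the head, which is the information the proposed counter would need.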
It is important to note that no other process is known that could provide this information to a cell, either before the onset of somitogenesis or after its completion; outside of somitogenesis, a cell has no known means to "look outside" and count the segments between its location and the anterior end of the body. Moreover, tissue transplant experiments, for example, a study by Narayanan and Hamburger, [38] suggest that segments do indeed acquire their identity during somitogenesis. The role of somitogenesis in coupling between the temporal and spatial patterning has been recently hypothesized by Durston et al.; [39] however, no specific mechanism has been presented; also see work by Durston et al. [40] It should also be noted that the general principle underlying somitogenesis, along with the pathways involved and the overall spatiotemporal structure, is both robust and well conserved, [29,34,35,37,41] which makes the somite clock a good candidate for the background mechanism underlying the segmental counter.

Counting the Notch Peaks: The Chromatin Abacus
I have argued above that a cell in the PSM may acquire the information of segmental identity by counting the peaks of expression of Wnt or Notch genes. It remains to be answered which molecular mechanism may serve as the peak counter within the cell. The most natural implementation of a digital counting device is the abacus; in its simplest form, a line of ordered beads that may be in one of two discrete states ("left" or "right"). Here, I propose that a counter using the simple abacus principle may be at work at the level of the chromatin organization.
The postulated mechanism requires that the chromosomal region involved in segment counting contains predefined "checkpoints," that is, specific loci that may serve as boundaries of active chromatin. The boundary progressing to the next checkpoint would then be equivalent to shifting one abacus bead from right to left, or incrementing the segment counter (performing the N := N + 1 arithmetic operation). Because it operates on discrete checkpoints, the counter is truly digital, that is, it operates on integers rather than on continuous variables. A schematic view of such a mechanism is shown in Figure 2 and in the top panel of Figure 3. The biophysical details of the mechanism are yet to be elucidated; the underlying principle may be based on the H3K27 demethylation machinery opening the chromatin while progressing along the chromosome, and stalling on barrier checkpoints. The barriers can be crossed only during one of the phases of the somite cycle, possibly during high activity of the Notch pathway (see bottom panel of Figure 2). Such a stop-and-release mechanism can ensure that exactly one checkpoint is crossed at each peak of Notch expression. In Section 2.5, I will present evidence that the newly discovered HRC3 motifs are likely acting as the postulated chromosomal checkpoints.
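The stop-and-release logic described above can be written down as a small state machine. The sketch below is purely illustrative: the class name `ChromatinCounter`, the checkpoint coordinates, and the representation of Notch activity as a sequence of high/low samples are all assumptions made for the example, not features of the biological system established in the text.

```python
class ChromatinCounter:
    """Toy model of a chromatin boundary that advances one checkpoint per Notch peak."""

    def __init__(self, checkpoints):
        # Hypothetical checkpoint coordinates, ordered along the chromosome.
        self.checkpoints = sorted(checkpoints)
        self.count = 0                  # number of checkpoints crossed so far (N)
        self._notch_was_high = False

    def step(self, notch_high):
        """Process one sample of Notch activity (True = high, False = low)."""
        rising_edge = notch_high and not self._notch_was_high
        if rising_edge and self.count < len(self.checkpoints):
            self.count += 1             # N := N + 1, exactly once per peak
        self._notch_was_high = notch_high

    @property
    def boundary(self):
        """Coordinate of the current boundary of open chromatin (0 before any peak)."""
        return self.checkpoints[self.count - 1] if self.count else 0


counter = ChromatinCounter([100, 250, 400])
for sample in [False, True, True, False, True, False]:
    counter.step(sample)
print(counter.count, counter.boundary)
```

Advancing only on the low-to-high transition is one simple way to guarantee that a long peak is counted once; the text leaves open which phase of the somite cycle actually releases the barrier.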

The Chromatin Counter Produces Segment-Specific Expression of Hox Genes
I have argued that a segment counter can exist that acquires and remembers the segmental identity of each cell in the PSM by incrementing the counter state (shifting the boundary of active chromatin) each time the somite clock genes are upregulated. In this section, I will show that the state of such a counter can be "read" and translated into an expression pattern that is specific to every segment of the body. The basic idea of such a process is outlined in the bottom panel of Figure 3. Since every subsequent segment has a unique, characteristic profile of open (permissive) chromatin in the counter region, any genes located in this region will be available in specific segments, and blocked in others.
Note that the genes regulated by the counter should be organized in a cluster, and that their order of expression in time as well as along the body would directly correspond to their position in the cluster. The collinearity of genes regulated by the digital segmental counter is exactly the same as the collinearity of the Hox genes that has remained a mystery since the genes were discovered. Therefore, I postulate that the "counting chromatin" in vertebrate animals coincides with the clusters of Hox genes, and Hox genes are activated in a segment-specific manner by the counter progressively removing the H3K27me3 marks along the Hox cluster. A boundary of open chromatin in Hox clusters that is progressively repositioned with time during somitogenesis has been observed previously; [42] however, the resolution of the presented data is insufficient to determine where precisely the boundary is localized at a given time and in a given position along the embryo's axis. If my hypothesis is true, then one would expect to find the checkpoint loci within the Hox clusters, and their numbers of occurrences would correspond to the number of segments under the control of the specific Hox genes. Specifically, the number of checkpoints between two neighboring Hox genes will be equal to the number of segments where the 3′ gene is active and the 5′ gene remains silenced (see schematic in bottom panel of Figure 3).
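The relation between checkpoint counts and expression boundaries stated above can be made concrete with a toy cluster. This is a minimal sketch assuming a cluster represented as an ordered list, from the 3′ end to the 5′ end, of checkpoint motifs and gene names; the layout and the names `geneA`, `geneB`, `geneC` are hypothetical, and only anterior boundaries are modeled (posterior limits of expression are ignored).

```python
CHECKPOINT = "HRC3"


def anterior_boundaries(layout):
    """Map each gene to the segment number at which it first becomes accessible.

    `layout` lists the elements of a cluster from the 3' end to the 5' end.
    A gene becomes accessible once every checkpoint on its 3' side has been
    crossed, i.e. in the segment whose number equals that checkpoint count.
    """
    boundaries = {}
    checkpoints_seen = 0
    for element in layout:
        if element == CHECKPOINT:
            checkpoints_seen += 1
        else:
            boundaries[element] = checkpoints_seen
    return boundaries


def active_genes(layout, segment):
    """Genes whose anterior boundary lies at or anterior to the given segment."""
    return [gene for gene, first in anterior_boundaries(layout).items()
            if first <= segment]


# Hypothetical cluster: one checkpoint before geneA, two more before geneB, one more before geneC.
toy_cluster = ["HRC3", "geneA", "HRC3", "HRC3", "geneB", "HRC3", "geneC"]
print(anterior_boundaries(toy_cluster))
print(active_genes(toy_cluster, 2))
```

In this toy layout, two checkpoints separate geneA from geneB, and there are exactly two segments (1 and 2) in which geneA is accessible while geneB is still closed, which is the counting relation the hypothesis predicts for real Hox clusters.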

The HRC3 Motif Functions as the Chromosomal Checkpoint
I have outlined a mechanism of establishing segment-specific gene expression patterns; the postulated process would require checkpoint loci in multiple positions within the Hox clusters; the checkpoints acting as boundaries of active chromatin fixed between peaks of Notch activity. The abovementioned properties of the checkpoint match the characteristics of the recently discovered HRC3 signature.
Additional evidence supporting the hypothesis comes from a search for sequences with the HRC3 signature in all locations along the Hox clusters, both coding and non-coding. The finding (based on applying the definition by Fongang et al. [26] to every position in the Hox clusters) has revealed that several dozen HRC3 motifs are found in each human and mouse Hox cluster (Figure 4). The numbers of HRC3 motifs (red) between neighboring genes are generally conserved between clusters and agree with the numbers of segments under the control of each gene, or more precisely to numbers of segments between anterior boundaries of the expression domains of these genes. For example, seven HRC3s are present between Hox4 and Hox6 in clusters A, B, and C, which may correspond to the seven cervical vertebrae.
Moreover, the hypothesis agrees with the gene expression profiles observed in the PSM. For example, Hox10 is activated in the PSM during the 21st oscillation of the somite clock (see bottom panel of Figure 4 plotted based on the data from Dequeant et al. [34] ), which agrees with its expression boundary after somitogenesis (at L1 or segment 21, see ref. [43]), and with 21 HRC3 motifs between the 3′ end of the HoxA cluster and the HoxA10 transcript.
Finally, the stop-and-release model of progressive chromatin modifications explains the observed enrichment of histone-modifying enzymes at the checkpoint motifs, where they are parked during low Notch activity. [26] These findings strongly support the central hypothesis of this article. They have permitted the development of a model of epigenetic regulation during embryonic development in which the HRC3 motif acts as an "address" to which the histone modifications that mark transcriptionally active chromatin are directed. Within the Hox clusters, chromatin is sequentially opened over an interval from the current boundary to the next HRC3 motif upon stimulation of the Wnt-Notch-FGF pathways. [37] This model correctly predicts the numbers of segments in the vertebrate body and explains collinearity of Hox genes and other developmental transcription factors.

Can the Counter Account for the Conservation of Segmental Numbers?
I have shown how a single Hox cluster can implement a somite/segment counter in a vertebrate animal. It appears that a simple mutation, such as deletion or duplication of a counting motif, would result in a change in the number of vertebrae, something that is not typically observed in nature.
However, most deuterostome genomes, including those of mammals, contain not one but four (land vertebrates) or seven (teleost fish) clusters of Hox genes. In an animal with multiple Hox clusters, if a mutation occurs that deactivates an existing HRC3 motif, or introduces an additional one through duplication, the change will affect only one cluster of Hox genes. As a result, the expression patterns of Hox genes from different clusters would lose their synchrony. For example, if a motif was lost in Hox cluster A, cells in segment 20 would express genes dependent on HoxA according to the program for segment 19, but genes dependent on HoxB, HoxC, and HoxD as in segment 20 (see schematic in Figure 5). Such loss of synchronization would likely lead to detrimental or lethal developmental defects, putting such a mutant at a strong evolutionary disadvantage.
A rescuing mutation in each remaining Hox cluster would be required to fully restore the synchronization of Hox expression domains. This may explain why the segmental numbers are generally very well preserved in the animal kingdom, or at least among vertebrates: a simultaneous quadruple mutation changing the number of segments without significant detrimental effects is highly unlikely. The explanation can also account for the fact that the conservation is much stronger in the anterior part of the body: a mutation changing the number of checkpoint motifs would affect gene expression in every segment posterior to such a mutation. In the archetypal example of segmental number conservation (the seven cervical vertebrae in mammals), only a few exceptions are known (sloths, sirenians), and these animals show significant changes affecting their posterior body plan and limbs/digits. Such pleiotropic effects are also observed in humans suffering from a developmental defect. [44] For example, a cervical rib [45] is often associated with a missing 12th rib [46] and with L5 sacralization, [47] which can be explained by partial (heterozygotic) loss of function of one of the checkpoint motifs. Strong associations between the malformations of the different organ systems and vertebral patterns have been observed. In an anatomical study of miscarried fetuses, only 20.6% of 1062 deceased human fetuses had a normal vertebral pattern. [48] The expression pattern of Hox genes is important not only for establishing the skeletal features, but also for other organs. Indeed, skeletal developmental abnormalities are often associated with cancer, [46] which is consistent with the pleiotropic effects of a shift in Hox gene expression domains, and the fact that tumors of different types have the characteristics of aberrant cellular development. [49] Note that in many experiments involving viable artificial Hox mutants, the function of the Hox gene is lost, but the HRC3 motif associated with the gene remains intact, and the experiment results in a homeotic transformation without a homeotic shift, leaving the number of segments unaffected. [50]

[Figure caption fragment: … [34] The spatial expression boundary of these genes lies at segment 21, there are 21 HRC3 motifs in the respective cluster on the 3′ side of each of the genes, and the genes are activated after 21 waves of somitogenesis have passed the sampled region of the embryo.]

Possible Tests of the Counter Hypothesis
The model of the somite counter presented in this article leads to numerous predictions and can be tested experimentally in a number of ways. First, the progression of the boundary of the active chromatin to the next HRC3 motif should be measurable by differential single-cell ChIP-seq of PSM samples at different somitic numbers. Second, deleting or duplicating HRC3 motifs in non-coding locations is expected to lead to homeotic shifts and predictable developmental malformations; the effects should be observable by multichannel fluorescent in-situ hybridization (FISH) to detect changes in anterior boundaries of Hox gene expression domains in embryos after completion of somitogenesis. Finally, a quadruple homozygotic deletion of a non-coding checkpoint motif in every Hox cluster is expected to produce a fully viable animal, but with a reduced number of segments. While such a mutant would require multiple generations of breeding and genotyping of conditionally induced deletions, it would result in a colony of mice with six cervical vertebrae (if the deleted non-coding motifs are 3′ of Hox6), most likely unable to produce viable offspring with wild-type animals: technically, this experiment would result in a new species with an artificially designed body plan.

Conclusions
In this article, I present a new hypothesis of an integrated primordial mechanism that explains several aspects of development, conservation of synteny, genome organization, and evolution. The hypothesis is especially promising in that it is based on one mechanism accounting for all these aspects at the same time; such parsimony is often a signature of successful theories. [51] The outline of the proposed mechanism can be summarized in the following steps:
1) Acquiring segmental identity. The identity of the segment is directly available to any cell in a vertebrate's PSM. The numerical segmental identity (e.g., "segment number 19") is determined by the number of somitogenesis waves that have traveled through the cell. Each somite wave is associated with a strong peak of Notch activity.
2) Remembering segmental identity by counting somites. The segment identity can be digitally encoded and stored in the state of chromatin if a chromatin mark progresses to the subsequent checkpoint (the next HRC3 motif) along the chromosome each time the Notch pathway peaks.
3) Translating segmental identity to the body plan. This process progresses along the Hox cluster, and it produces a pattern of chromatin state specific to every segment; these patterns are in turn directly translated into the activation of specific Hox genes in the specific segments.
The postulated mechanism can work only if the Hox genes are collinear within their clusters; my hypothesis is the first to explain the evolutionary pressure on maintaining the collinearity of Hox genes that has remained a mystery since the Hox clusters were discovered 40 years ago. [9] The distribution of the HRC3 motifs in the Hox clusters directly translates to the numbers of repeats of structures in the body plan. Specifically, from the genomic sequence alone, we can now directly predict the numbers of cervical and thoracic vertebrae in a vertebrate's body. This suggests that this aspect of development is indeed digital, or computable. It is then likely that a similar digital mechanism of directing chromatin modifications to the specific loci may explain other aspects of the body plan and of development in general, supporting the notion of a computable embryo. [1,52] Specifically, coupling a biomolecular oscillator to developmental patterning and cell fate may be responsible not only for metameric segmentation, but also for other aspects of morphogenesis, such as patterning of limbs, neurogenesis, [53] or myogenesis. [54] The general concept may also be relevant to segmentation in invertebrates, for example, short-germ-band insects, where segments are also produced sequentially, and transcriptional oscillations have been observed [55] (long-germ-band insects appear to be evolutionarily optimized for speedy development; possibly a variant of the process is at play that starts the chromatin counter from different origins in different parts of the developing embryo), while the order of the Hox genes on the chromosome is conserved because of coupling with counters in other regions of the genome.
The association of HRC3 motifs with developmental transcription factors, also outside the Hox clusters and also in invertebrates, [26] suggests that a version of such a "computable embryo" theory might also apply to other aspects of evolutionary developmental biology, and may lead to the solution of other open questions, such as the origin of larval forms, the nature of the protostomal-deuterostomal ancestry, and more.