Why do centromeres evolve so fast: BIR replication, hypermutation, transposition, and molecular-drive

Analysis of the centromeres from many different species clearly indicates that they evolve unusually rapidly (Henikoff et al. 2001). Some proteins within the kinetochore that binds this DNA also evolve rapidly (Drinnenberg et al. 2016). The most generally accepted hypothesis for this fast evolution is that intragenomic conflict due to centromere drive (a form of female meiotic drive) causes perpetual antagonistic coevolution (Henikoff et al. 2001; Burt and Trivers 2006; Malik and Henikoff 2009; Rosin and Mellone 2017). In the model developed here, Break-Induced Repair (BIR) of collapsed DNA replication forks and its downstream consequences (especially hypermutation, transposition and molecular-drive) are the major causative features for the exceptionally rapid evolution at centromeres. To illustrate the extent and form of rapid centromere evolution, I compare human-chimp nucleotide sequence divergence at centromeres to that of the surrounding chromosomal arms. I first use published data to quantify: i) the local (1 Mb intervals) and chromosome-wide (mean of intervals) sequence divergence between chimps and the human reference genome along the two arms of human chromosome 7 (red dots and horizontal lines in Figure 1A; from Mikkelsen et al. 2005 and Marques-Bonet et al 2009), and ii) this same measure of divergence among different human genomes (blue dots in Figure 1A; from Marques-Bonet et al 2009). I then estimate the minimum centromeric sequence divergence for this chromosome between human and chimp orthologs, as described in Box-1, Figure 1A and the next paragraph.

Human centromeres are composed of very long (average 2-3 Mb; Willard 1991) tandem repeat arrays of Higher Order Repeats (HORs; iterations of groups of two or more monomers) with low sequence variation among HOR units within an array (Willard 1991; Schueler et al. 2001; Schueler and Sullivan 2006). Because the consensus sequence of the centromeric HOR of chromosome 7 is known for humans (Waye et al 1987; Rice 2019A) but unknown for chimps, I compared (Box-1 and Figure 1A) the consensus sequence of the human HOR to the closest matching alpha satellite sequence found during the chimp genome project (Mikkelsen et al. 2005). This analysis indicates that the centromere on human chromosome 7 and its chimp ortholog have diverged by a lower limit of 12.3%: a value far in excess of the maximal sequence divergence seen along the chromosomal arms (Figure 1A). Average human-chimp sequence divergence is 1.2% along the chromosomal arms of chromosome 7 (Mikkelsen et al. 2005). Combining measures, the relative rate of chimp-human nucleotide divergence (centromere / arms) is minimally 12.3% / 1.2% = 10.25 times faster for the centromere than the average for the chromosomal arms.
To obtain a more accurate measure of chimp and human centromeric divergence, I next focused on human chromosome 5 (Box-1; Figure 1B). I chose this chromosome because: i) the consensus sequences of the centromere of this chromosome and its chimp ortholog have both been determined (Puechberty et al 1999; Haaf and Willard 1997; Rice 2019A; Rice unpublished), and ii) the average and range of chimp-human sequence divergence has been measured along the arms of these orthologs (Mikkelsen et al. 2005). At chromosome 5, the estimated sequence divergence between human and chimp centromeres was 39.3%

Figure 1. A. Deviations from the human reference sequence among 100 kb intervals along the arms of chromosome 7. The schematic of the chromosome (along the X-axis) is taken from the UCSC genome browser web page and the red square denotes the centromere within the sequencing gap between the assembled arm sequences. Red dots depict deviations for chimp vs. human, blue dots depict deviations among humans, and the horizontal red lines along the bottom of the graph are the arms-wide average of 1 Mb regions (data are from Mikkelsen et al. 2005 and Marques-Bonet et al 2009). Above the centromere, I show the lower limit for the sequence divergence between the human and chimp centromeres (see Box-1 for details). B. Chimp sequence deviations from the human reference sequence among 1 Mb intervals along the arms of chromosome 7. Narrow, horizontal red lines on either side of the sequencing gap depict the average deviation from the human reference sequence and the wide horizontal black lines depict the total range of deviations about the mean (from Mikkelsen et al. 2005). The schematic of the chromosome is taken from the UCSC genome browser web page and the red square denotes the centromere within the sequencing gap between the assembled arm sequences. The first red line above the centromere depicts the average pair-wise sequence divergence between the chimp and human centromeric monomers of the same type: an incomplete measure of total divergence between centromeres. The upper red line above the centromere depicts a more complete measure of sequence divergence between the chimp and human centromeres, labeled 'total divergence' (see Box-1 for details).

Box-1. Rapid evolution at centromeres is illustrated by the sequence divergence between chimps and humans.
To contrast the rate of chimp-human sequence divergence of centromeres compared to other regions of the genome, nucleotide divergence between humans and chimps was compared between centromeres and the arms of chromosomes. In Figure 1A, the percent sequence divergence of the two arms of chromosome 7 (at 100 kb intervals) is shown between the human reference genome and: i) a single chimp genome (red dots), and ii) different human genomes (blue dots) (data from Marques-Bonet et al 2009). I also display the average of human-chimp divergences (of 1 Mb intervals) across both chromosomal arms (1.2%, shown by the horizontal red lines above the arms; data from Mikkelsen et al. 2005). This graph illustrates the level of variation in divergences about the mean value at small, noncentromeric regions of the genome. The centromere (red square in the chromosome image) resides within the sequencing gap between the chromosomal arms, where divergence values have not been reported.
The human centromeric sequence (i.e., the consensus HOR sequence) for chromosome 7 has been determined with high accuracy (Waye et al 1987; Rice 2019A), but the corresponding centromeric sequence for the chimp ortholog is not known. To obtain a minimum estimate for sequence divergence at the centromere, I blasted the consensus sequence for the centromeric higher-order repeat (HOR) of human chromosome 7 against a large set of archived chimp sequences from the NCBI web site (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&BLAST_SPEC=OGP__9598__12467). The archived chimp sequences were the data set Clint_PTRv2, which contains all assembled chromosomes plus unplaced and unlocalized scaffolds from the reference assembly in Annotation Release 105.
I found a cluster of 488 hits with the closest matching sequences that had a mean divergence from the human HOR of 11.31%. However, this value represents a lower bound for the true divergence between the orthologous chimp and human consensus centromeric HOR sequences because an in situ hybridization study by Archidiacono et al. (1995) found that the closest matching chimp centromere to human chromosome 7 is not located on the chimp ortholog. To better estimate the true divergence between chimp and human orthologous centromeres, I next focused on human chromosome 5, where the centromeric chimp (Haaf and Willard 1997; Rice unpublished) and human (Puechberty et al 1999; Rice 2019A) consensus HOR sequences are known with high accuracy. The mean, variance and range of human-chimp divergence along the arms of chromosome 7 are highly similar to chromosome 5 (Mikkelsen et al. 2005), as is the level of SNP variation among humans (Shen et al. 2013). The consensus sequence for human chromosome 5 is a short HOR containing a two-monomer dimer: one 'b-box' monomer (containing a 17 bp b-box sequence that binds CENtromere Protein B [CENP-B]) and one 'no-b-box' monomer that lacks the b-box sequence. The chimp ortholog is a 5-monomer HOR containing two b-box monomers and three no-b-box monomers. In humans, b-box and no-b-box monomers have substantially different consensus sequences (~16% diverged) with most differences outside the b-box region (see Supplemental Figures S13 and S16 in Rice 2019A).
I characterized chimp-human sequence divergence in two ways. First, I determined the average pairwise percent divergence between the human b-box monomer and the two chimp b-box monomers. I next made this same pairwise calculation between the human no-b-box monomer and the three chimp no-b-box monomers. These two pairwise divergences averaged 23.35% (Figure 1B). Average pairwise divergence, however, does not reflect the fact that the chimp HOR has three additional monomers and that the chimp monomers are highly diverged from each other (Supplemental Figure S1). To construct a measure that better captures this multi-monomer divergence, I calculated total divergence = the percent of nucleotide positions that differ between: i) the human b-box monomer and either of the two chimp b-box monomers, or ii) the human no-b-box monomer and any of the three chimp no-b-box monomers. This total divergence was 39.3% (Figure 1B). In sum, centromeric sequence divergence is 39.3 / 1.2 ≈ 32.8 times more than the average at non-centromeric regions of chromosome 7. These data indicate that human-chimp sequence divergence at the centromere is far in excess of the range of values found along chromosomal arms (broad horizontal black line in Figure 1B) and more than an order of magnitude higher than the average across the chromosomal arms of chromosome 5 (narrow horizontal red lines at the base of the graph in Figure 1B).
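The two divergence measures described in Box-1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monomers are taken to be pre-aligned to equal length, the toy sequences are hypothetical (they are not real alpha satellite sequence), and the function names are invented here rather than taken from the original analysis.

```python
def percent_divergence(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def average_pairwise_divergence(human, chimps):
    """Mean divergence between one human monomer and each chimp monomer of the same type."""
    return sum(percent_divergence(human, c) for c in chimps) / len(chimps)


def total_divergence(groups):
    """Percent of positions where the human base differs from ANY same-type chimp monomer.

    groups is a list of (human_monomer, [chimp_monomers]) pairs, one per monomer type.
    """
    diverged = 0
    positions = 0
    for human, chimps in groups:
        for i, base in enumerate(human):
            positions += 1
            if any(c[i] != base for c in chimps):
                diverged += 1
    return 100.0 * diverged / positions


# Hypothetical, pre-aligned toy monomers (assumption: equal length, no gaps).
human_b = "ACGTACGTAC"
chimp_b = ["ACGTACGTAA", "TCGTACGTAC"]
human_nob = "GGCCTTAAGG"
chimp_nob = ["GGCCTTAAGG", "GGACTTAAGG", "GGCCTTATGG"]

# Average of the two per-type pairwise averages (b-box and no-b-box).
mean_pairwise = (average_pairwise_divergence(human_b, chimp_b)
                 + average_pairwise_divergence(human_nob, chimp_nob)) / 2
total = total_divergence([(human_b, chimp_b), (human_nob, chimp_nob)])
print(f"average pairwise divergence: {mean_pairwise:.2f}%")
print(f"total divergence: {total:.2f}%")
```

Because a position counts toward total divergence when any same-type chimp monomer differs from the human base, total divergence can never be smaller than the average pairwise value, which is the pattern reported above (39.3% versus 23.35%).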
(Box-1, Figure 1B), a value far in excess of the range of divergence values found along the orthologs' arms (wide black lines in Figure 1B) and substantially more than an order of magnitude greater than the mean divergence found between the orthologs' arms (narrow red lines, Figure 1B). What causes this extreme divergence at centromeres? At a functional level, centromeres are the DNA regions that recruit, and tightly bind, the large and complex group of proteins that make up the kinetochore. In most organisms, centromeres are composed of long tandem repeat arrays, which in humans can be as large as 8 Mb (Miga et al. 2014). DNA-kinetochore attachment is achieved by a network of kinetochore proteins (the CCAN = Constitutive Centromere-Associated Network), the basal members of which bind the centromeric DNA throughout the cell cycle (Musacchio and Desai 2019). Bound protein is well established to cause stalling of DNA replication forks (Mirkin and Mirkin 2007; Beuzer et al 2014), and stalled forks are prone to collapse to form a one-sided double-strand break (DSB). These one-sided DSBs are repaired by the BIR pathway (Costantino et al. 2014).
There is empirical evidence that kinetochore proteins bound to DNA in both the point centromeres of budding yeast (Greenfeder and Newlon 1992) and the regional centromeres of Candida albicans (Mitra et al. 2014) cause fork-stalling/collapse, rather than it being caused by DNA secondary structure. There is also evidence for elevated levels of fork-stalling/collapse at human centromeres (Crosetto et al. 2013; Aze et al. 2016).
One expected downstream effect of an elevated rate of fork-stalling/collapse at centromeres is recurrent duplications and deletions (indels) within their tandem repeat arrays that cause a continual turnover of monomers (Figure 2, yellow box). Because centromeric tandem repeat arrays have numerous regions of nearby, flanking homology, BIR repair of collapsed replication forks within a centromere would be expected to re-initiate DNA replication at: i) the same location where the collapse occurred (in-register, with neither deletion nor duplication), ii) a downstream location (already replicated) with a sequence matching the break point (out-of-register, leading to a duplication of one or more repeat units), or iii) an upstream location (not previously replicated) with a sequence matching the break point (out-of-register, leading to a deletion of one or more repeat units). These alternatives are illustrated in Supplemental Figure S3 in Rice (2019B). Studies of fork-collapse-induced BIR at rDNA repeat arrays in yeast have shown that downstream out-of-register BIR is more frequent than upstream repair, causing, on average, a net increase in repeat array length in response to recurrent fork collapses, but only when cohesin-binding of sister chromatids was low (reviewed in Kobayashi 2014). This bias toward array expansion may be the result of differences in chromatin structure in the regions upstream and downstream of replication forks (Poot et al. 2005; also see explanation in Rice 2019B).
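The three re-initiation outcomes, and the way a bias toward downstream re-initiation produces net array growth, can be illustrated with a minimal simulation. This is a sketch under simplifying assumptions that are not in the text: each fork collapse changes the array by at most one repeat unit, and the per-collapse probabilities `p_dup` and `p_del` are hypothetical parameters chosen only for illustration.

```python
import random


def simulate_array_length(n_units, n_collapses, p_dup, p_del, seed=0):
    """Track the length of a tandem repeat array over recurrent fork collapses.

    Each collapse is repaired by BIR that re-initiates either:
      - downstream (already replicated template): duplication of one unit,
      - upstream (not yet replicated template): deletion of one unit,
      - in-register: no change in length.
    """
    rng = random.Random(seed)
    length = n_units
    for _ in range(n_collapses):
        r = rng.random()
        if r < p_dup:
            length += 1                      # out-of-register, downstream
        elif r < p_dup + p_del and length > 1:
            length -= 1                      # out-of-register, upstream
        # otherwise: in-register re-initiation, length unchanged
    return length


# Hypothetical parameters: duplications more frequent than deletions.
final_length = simulate_array_length(n_units=100, n_collapses=1000,
                                     p_dup=0.4, p_del=0.1)
print(f"array length after 1000 collapses: {final_length} units")
```

Setting `p_dup` equal to `p_del` removes the bias, so the array length then wanders without a systematic trend.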
Figure 2. A diagram illustrating the chain of events connecting protein bound to DNA, the fork-stalling and fork-collapse that it generates, BIR repair of collapsed replication forks, and the downstream consequences of BIR repair.
An additional factor that is expected to cause monomer turnover within centromeres is recurrent deletions when two-sided DSBs are repaired by the Single-Strand Annealing (SSA) repair pathway (Ozenberger et al. 1991; see also Supplemental Figure S1 in Rice 2019B). Tsouroula et al. (2016) demonstrated that the SSA pathway is used (along with other pathways) during repair of two-sided DSBs at the centromeres of the laboratory mouse (Mus musculus domesticus).
A second expected downstream consequence of increased levels of fork-stalling/collapse at centromeres is a substantially elevated nucleotide-substitution mutation rate (bp substitutions in Figure 2, pink box). BIR replication forks that form after fork-collapse use an unusual combination of DNA polymerase subunits, and this configuration is associated with highly elevated nucleotide-substitution mutations: estimated in yeast to be ~1,000-fold higher compared to normal S-phase replication forks (Sakofsky et al. 2012). The elevated use of BIR after fork-collapse at centromeric repeat arrays would be expected to cause homologous centromeres in related species to diverge in sequence at an elevated rate, as is observed between humans and chimps on their X chromosomes (Rice 2019B).
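A rough sense of scale for this effect comes from a simple mixture calculation. The sketch below assumes that a fraction `bir_fraction` of centromeric DNA is copied by BIR forks in each replication round and that BIR carries the ~1,000-fold elevation estimated in yeast. The fraction is a hypothetical parameter, not a measured value, and the yeast estimate may not transfer to other species.

```python
def effective_substitution_rate(base_rate, bir_fraction, bir_fold=1000.0):
    """Average per-site substitution rate when some DNA is replicated by BIR.

    base_rate    : rate at normal S-phase forks
    bir_fraction : fraction of sites copied by BIR forks (hypothetical)
    bir_fold     : fold elevation of the rate at BIR forks
    """
    if not 0.0 <= bir_fraction <= 1.0:
        raise ValueError("bir_fraction must lie between 0 and 1")
    return base_rate * ((1.0 - bir_fraction) + bir_fold * bir_fraction)


# With base_rate = 1.0 the result is the fold elevation over the normal rate.
for fraction in (0.0, 0.001, 0.01):
    fold = effective_substitution_rate(1.0, fraction)
    print(f"BIR fraction {fraction:.3f}: {fold:.2f}-fold elevation")
```

Under these assumptions, BIR needs to copy only about 1% of the sequence per replication round to raise the average substitution rate roughly 11-fold.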
A third expected downstream effect of expanded levels of fork-stalling/collapse at centromeres is an increased rate of formation of new mosaic monomers that are composed of pieces of extant (usually nearby) monomers (Figure 2, orange box). A substantial proportion of BIR events involve template switching (possibly as high as 20%; Smith et al. 2007). BIR is sometimes mediated by low levels of homology (mmBIR = microhomology-mediated BIR; Zhang et al. 2009; Hastings et al. 2009A,B). The process of mmBIR with template-switching (mmBIR/templ-switch) is plausibly the major factor leading to Copy Number Variation (CNV) within genomes (Hastings et al. 2009A,B). It has the potential to generate new, mosaic monomers when the template switch occurs within the body of a monomer and exchanges part of one monomer with the sequence of another monomer. The resulting mosaic monomer represents a small quantum leap in HOR sequence when the recombined monomers have substantially different sequences, as is common among the monomers that make up the centromeric repeats of humans (see Supplementary Table S1 of Rice 2019A for a complete list of the HOR sequences at all of the active centromeric repeats from a single human genome).
Evidence for mosaic monomers can be found in humans, where monomers cluster into two major groups: those containing a 17 bp CENP-B-binding b-box at their 5' end (b-box monomers), and those lacking this feature (no-b-box monomers) (see Box-1, and also Supplemental Figures S13 and S16 in Rice 2019A). Some monomers are found at human centromeres in which the 5' end contains a b-box sequence while the majority of the remaining monomer has a sequence that strongly clusters with the no-b-box monomers (for examples, see Figure 4 and Supplemental Figure S14 in Rice 2019A). These b-box/no-b-box mosaic monomers are consistent with the process of quantum leaps in monomer sequence due to ectopic recombination via mmBIR/templ-switch.
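The formation of a mosaic monomer by a single template switch can be sketched as a simple splice of two templates. The sequences below are hypothetical placeholders (not real monomer sequences), and the function names and the switch position are illustrative assumptions.

```python
def template_switch(first_template, second_template, switch_pos):
    """Copy first_template up to switch_pos, then continue on second_template."""
    if len(first_template) != len(second_template):
        raise ValueError("templates must be aligned to equal length")
    if not 0 <= switch_pos <= len(first_template):
        raise ValueError("switch_pos outside the monomer")
    return first_template[:switch_pos] + second_template[switch_pos:]


def percent_divergence(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences differ."""
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


# Hypothetical 20 bp monomers: the first 5 bp stand in for the b-box region.
b_box_like = "AAAAA" + "C" * 15
no_b_box_like = "GGGGG" + "T" * 15

mosaic = template_switch(b_box_like, no_b_box_like, switch_pos=5)
print(mosaic)
print(f"vs b-box parent:    {percent_divergence(mosaic, b_box_like):.1f}%")
print(f"vs no-b-box parent: {percent_divergence(mosaic, no_b_box_like):.1f}%")
```

In this toy case the mosaic keeps the 5' end of the b-box-type parent while most of its length matches the no-b-box-type parent, the same qualitative pattern described for the human b-box/no-b-box mosaic monomers.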
A fourth downstream effect of expanded levels of fork-stalling/collapse at centromeres is 'ectopic lineage swapping' in which one or more monomers from one chromosome's centromere are transposed into the centromere of another non-homologous chromosome, and eventually become the new centromeric repeat sequence (Figure 2, green boxes). Less commonly, the transposed DNA may originate from a noncentromeric location. This transposition process is a larger-scale extension of the mmBIR/templ-switch process described in the previous two paragraphs concerning mosaic monomers except that: i) the template switching is usually between different chromosomes, and ii) the transposed DNA segment is large enough to span one or more monomers. In the human genome there is extensive evidence for transposition between

To become a new centromeric tandem repeat array, transposed DNA must fortuitously already contain a tandem duplication or else a new tandem duplication must be formed de novo. As described earlier, substantial evidence indicates that repair of collapsed replication forks via mmBIR is a major mechanism generating tandem CNVs (Hastings et al. 2009B; Hsiao et al. 2015) and the elevated incidence of mmBIR at centromeres would feasibly provide a pathway to duplicate newly transposed sequences. Elevated rates of BIR at centromeres (due to DNA-bound CCAN proteins) may explain the observation that centromeres of most species are composed of tandem repeat arrays because: i) mmBIR would be expected to generate an initial tandem duplication at an initially non-repetitive centromeric sequence, and ii) out-of-register BIR would next be expected to expand this minimal tandem repeat array into the long arrays typically observed at centromeres.
A new and small subarray of a tandemly duplicated transposition that is embedded within a centromere would not lead to centromere-wide sequence change unless the subarray expands to form a much longer tandem repeat subarray and ultimately becomes the predominant repeated sequence there. As described earlier, BIR-induced duplications and deletions of monomers (and also deletion of monomers from SSA repair of two-sided DSBs) are expected to generate continual turnover of monomers within centromeric repeat arrays. This turnover process produces an opportunity for analogs of genetic drift and natural selection to operate within repeat arrays, i.e. 'molecular-drift' and 'molecular-drive,' respectively (Dover 1982).
The process of molecular-drift occurs when: i) there is heterogeneity in the sequence of repeat units (single monomers or HORs) within a tandem repeat array, ii) indels cause repeat units to continually turn over, and iii) the proportion of different repeat units (with different sequences) changes over time due to random differences in their deletion and duplication rates. This process can lead to 'molecular-fixation' (all repeat units within an array share the same sequence, or are descended from it when mutated) by chance alone during the continual stochastic turnover of constituent repeat units. In this way, a new and small subarray could spread to an entire centromere.
Molecular fixation can also be driven deterministically by molecular-drive when repeat units with different sequences differ in their duplications/deletions ratio: the repeat unit with the highest ratio is expected to eventually predominate within the array. So if a new and small subarray (initiated by a transposition event) had a higher duplications/deletions ratio, it could expand to the entire centromere by molecular-drive.
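The contrast between molecular-drift and molecular-drive can be illustrated with a small stochastic turnover model. This is a sketch under strong simplifying assumptions that go beyond the text: array length is held constant, each step pairs one duplication with one deletion, and the spatial arrangement of repeat units is ignored. The parameter `dup_advantage` (the relative duplication advantage of the new repeat type) is hypothetical.

```python
import random


def fixation_probability(n_units, n_new, dup_advantage, trials, seed=0):
    """Estimate how often a new repeat type reaches molecular-fixation.

    n_units       : total repeat units in the array (held constant)
    n_new         : initial number of units of the new type
    dup_advantage : 0.0 gives pure molecular-drift; > 0 gives molecular-drive
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        k = n_new                                  # current count of the new type
        while 0 < k < n_units:
            w_new = k * (1.0 + dup_advantage)      # duplication weight, new type
            w_old = n_units - k                    # duplication weight, old type
            dup_is_new = rng.random() < w_new / (w_new + w_old)
            del_is_new = rng.random() < k / n_units  # deletions are unbiased
            k += int(dup_is_new) - int(del_is_new)
        fixed += (k == n_units)
    return fixed / trials


# Hypothetical array of 20 units seeded with a 2-unit new subarray.
p_drift = fixation_probability(20, 2, dup_advantage=0.0, trials=500)
p_drive = fixation_probability(20, 2, dup_advantage=0.5, trials=500)
print(f"fixation by drift alone: {p_drift:.2f}")
print(f"fixation with drive:     {p_drive:.2f}")
```

In this toy model a new subarray without an advantage fixes at a rate close to its starting frequency, whereas a modest duplication advantage makes fixation several times more likely.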
To recap, the increased prevalence of mmBIR at centromeric repeat arrays is expected to: i) increase the rate of transpositions into centromeric repeat arrays, and ii) foster tandem duplications of transposed DNA segments when not already in tandem repeat form. BIR after fork-collapse and SSA repair of DSBs will generate turnover of repeat units at centromeres and thereby create the opportunity for newly transposed tandem repeats to replace the old centromeric sequence via molecular-drift or molecular-drive. This process represents a form of 'ectopic lineage-swapping' in which a transposed, unrelated sequence replaces an existing centromeric sequence, leading to a 'quantum leap' in centromere sequence that generates highly elevated sequence divergence among orthologous centromeres of closely related species.
To examine this ectopic lineage-swapping process in a more detailed molecular context, I next focus on humans and the laboratory mouse, where the structure and function of centromeric tandem repeat arrays have been extensively studied. The effect of molecular-drift and molecular-drive on sequence evolution within centromeric repeat arrays is expected to depend on the ability of the repeat unit to: i) expand via tandem duplications, and ii) recruit histone CENH3, which is the epigenetic mark required to be a functional centromere that assembles a kinetochore (Bodor et al. 2014). Consider the case where transposition via mmBIR generates a small, new tandem-repeat subarray (green rectangle in Figure 3) within an extant, much larger, centromeric repeat array (blue rectangles in Figure 3). The fate of the subarray is expected to be influenced by its sequence and its position within the array.
Centromeric repeat arrays in mice and humans are divided into two functional regions: i) a contiguous centric core that recruits CENH3 at many locations and thereby constitutes the active centromere that assembles a kinetochore (region within the red oval background in Figure 3), and ii) the remainder of the array, i.e., the pericentric flanks (Ross et al. 2016; Iwata-Otsubo et al. 2017). Out-of-register, BIR-induced expansion and contraction of the array is expected only (or at least predominantly) within the centric core because only this region: i) binds the kinetochore proteins that cause fork-stalling/collapse, and ii) recruits low levels of cohesin (Supplementary Figure S2; see also Rice 2019B). Assuming that the new subarray has a sequence that generates a molecular-drive advantage, and that it resides within the centric core, it will only persist and achieve molecular-fixation (Figure 3D) when it retains (by chance or its molecular phenotype) the CENH3 epigenetic mark as it expands (Figure 3B,C).
To recap, transposition that generates a new tandem-repeat subarray within an extant centromeric array will only lead to replacement of the extant array by the newly transposed sequence when the new subarray resides within the centric core and: i) molecular-drift leads to the new subarray expanding to large size while fortuitously retaining the CENH3 epigenetic mark, ii) molecular-drive favors the new subarray because it has an expansion advantage (i.e., a faster tandem duplication rate of monomers or HORs) that indirectly leads to its retention of the CENH3 epigenetic mark as it expands (described more fully in Supplemental Figure S3), or iii) molecular-drive favors the new subarray because it more strongly recruits/retains CENH3 and thereby pulls the centric core off the larger, extant array as it expands (described more fully in Supplemental Figure S4). In Rice (2019B), I provide a more general description of the operation of molecular-drive within centromeres, and in Supplemental Figures S2-S4 I summarize a molecular model from this paper for the operation of molecular-drive at centromeres in humans.

Figure 3. The hypothesized process of ectopic lineage swapping at centromeres in mammals. A. A new subarray (green bar) originating from a different genomic location is transposed into the centric core region (red oval background) of an extant centromeric repeat array (blue bar) via mmBIR with template-switching. The two regions outside the centric core (no red oval background) are the pericentric flanks. If the new subarray is not already a tandem repeat, this structure is generated via mmBIR. B. Because the subarray is within the CENH3-rich centric core of the resident centromeric repeat array (which has a high density of protein bound to DNA and a low density of cohesin), it expands via duplications exceeding deletions during out-of-register re-initiation of BIR replication after the collapse of replication forks. C. By random chance (molecular-drift) or because the new subarray has a molecular-drive advantage, i) the new subarray retains the CENH3 epigenetic mark as it expands and eventually grows to span the entire centric core, and ii) the extant array is pushed completely out of the centric core and is restricted to the pericentric flanks. D. Over time, the old array is eroded away by recurrent deletion pressure and only the new subarray is found at the centromere region.
In situ hybridization studies demonstrate that centromeric sequences at all, or nearly all, human and chimp autosomes have diverged to such an extent that probes for the human centromeres no longer bind their chimp orthologs, and sometimes bind non-orthologs (Archidiacono et al. 1995). This finding indicates that, within the human-chimp clade, repeat units (HORs) of centromeres are rapidly (< 5-6 × 10^6 years; De Manuel et al. 2016) replaced by distantly related sequences, and sequencing studies of centromeres at six chimp autosomes support this conclusion (Jorgensen et al. 1992; Haaf and Willard 1997; Warburton et al. 1996). The rapid replacement of all (or nearly all) autosomal centromeric HORs (by distantly related sequences) during divergence between humans and their closest living relative is consistent with the conclusion that the observed centromere evolution is more plausibly driven by deterministic molecular-drive rather than the much slower, stochastic process of molecular-drift.
If molecular-drive has been operating at the centromeres of eukaryotes for eons, and centromeres have essentially retained the same function in organisms as diverse as humans and fungi, then why would there continue to be molecular-drive for new sequences? Put another way: Why haven't optimal molecular-drive centromeric sequences evolved long ago and then persisted? In humans, I have described evidence for an intransitive competitive hierarchy between different HOR sequences that is expected to drive rapid and perpetual evolution of centromeric HOR sequences (Rice 2019B). However, this intransitivity is dependent on the influence of CENP-B at human centromeres -and most organisms do not recruit this protein to their centromeres. An alternative to competitive intransitivity is a continually changing optimal sequence for 'winning' in molecular-drive at centromeres. One way that such a 'moving target' optimum could occur is simply due to the inevitable and continuous evolution of the genomic background of numerous molecules that influence molecular-drive among centromeric subarrays (Rice 2019B). In addition, there may be positive feedback between the DNA sequence of centromeres and the protein sequence of CENH3. Below I describe this speculation.
Natural selection is expected to cause CENH3 to evolve a protein sequence that reliably recruits this molecule to the centromeres and that increases cellular functioning by assembling a well-operating kinetochore. The DNA sequences of centromeres will also experience natural selection for proper cellular functioning, but these sequences will additionally experience selection in the context of molecular-drive to better compete against other subarrays within a centromeric repeat array. This molecular-drive phenotype would be expected to include the capacity to better retain the CENH3 epigenetic mark as its subarray expands (compared to other subarrays within the same centric core) in order to remain within the centric core and not lose centromeric functioning (Supplemental Figures S3 and S4).

Figure 4. Top: The protein sequence of CENH3 is selected to promote kinetochore and cellular functioning. Part of this adaptive process is assumed to depend on the DNA sequence of the centromeric repeat unit. Bottom: The DNA sequence of the centromeric repeat unit is also selected to promote kinetochore and cellular functioning, but it is also selected for competitive ability in the context of molecular-drive between alternative sequences of the centromeric repeat unit, and much of this competitive ability is expected to depend on the ability to preferentially recruit CENH3. Part of this molecular-drive selection is assumed to depend on the protein sequence of CENH3. The interdependence of selection on the protein sequence of CENH3 and the DNA sequence of the centromeric repeat unit will feasibly cause perpetual coevolution between them.
The observation that ectopic neocentromeres have been found to occur at a wide diversity of DNA sequences (Marshall et al. 2008) would intuitively indicate that centromere sequence is relatively unimportant in CENH3 recruitment. However, the findings that: i) neocentromeres recruit reduced levels of CENH3 (Bodor et al. 2014; Fachinetti et al. 2015), ii) centromeric HORs differ in their ability to recruit CENP-A (Bodor et al. 2014; Fachinetti et al. 2015; Aldrup-MacDonald et al. 2016), and iii) different DNA sequences are better at recruiting CENH3 to artificial chromosomes (Masumoto et al 1998; Basu et al. 2005; Molina et al. 2017), support the conclusion that DNA sequence does influence CENH3 recruitment and hence molecular-drive within tandem repeat arrays. The combination of i) natural selection on the protein sequence of CENH3, and ii) both natural selection and selection via molecular-drive on centromeric repeat sequences, motivates the potential for coevolution between the DNA sequence of centromeres and the protein sequence of CENH3 (Figure 4). Selection on CENH3 for cellular functioning is expected to depend at least in part on some form of congruence between the protein sequence of CENH3 and the DNA sequence at centromeres (top of Figure 4). Similarly, selection on centromere DNA sequence via molecular-drive performance is expected to depend in part on the protein sequence of CENH3 (bottom of Figure 4). As centromeric DNA sequences continually evolve in response to molecular-drive, the resulting cumulative change in their DNA sequences would be expected to gradually 'push' the optimal protein sequence of CENH3 away from its current position. Eventually, the cumulative change in optimum would be sufficient to lead to an evolutionary change in the protein sequence of CENH3, which in turn would 'push' the optimal DNA sequence of a centromere away from its current position.
Collectively, these interactions would be expected to drive open-ended, 'push-away' coevolution between the protein sequence of CENH3 and the DNA sequence of the centromeres because: i) each sequence type is evolving in response to different selection regimes, and ii) evolution by each type of sequence changes the optimum of the other type of sequence (Figure 4).
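The logic of push-away coevolution can be made concrete with a deliberately abstract toy model. Everything in this sketch is an assumption introduced for illustration and is not part of the model described in the text: each sequence is reduced to a single number, the CENH3 optimum is taken to equal the current DNA value (congruence), and the molecular-drive optimum for the DNA is taken to sit a fixed `offset` away from the current CENH3 value.

```python
def coevolve(steps, offset, rate_protein=0.1, rate_dna=0.1):
    """Iterate two traits whose optima depend on each other.

    protein : abstract stand-in for the CENH3 protein sequence
    dna     : abstract stand-in for the centromeric repeat sequence
    offset  : hypothetical displacement between the two selection regimes
    """
    protein, dna = 0.0, 0.0
    history = []
    for _ in range(steps):
        protein_optimum = dna              # selection for congruence with the DNA
        dna_optimum = protein + offset     # drive optimum is displaced from the protein
        protein, dna = (protein + rate_protein * (protein_optimum - protein),
                        dna + rate_dna * (dna_optimum - dna))
        history.append((protein, dna))
    return history


# With a nonzero offset, neither trait settles at a stable value.
for steps in (100, 1000):
    protein, dna = coevolve(steps, offset=1.0)[-1]
    print(f"after {steps} steps: protein = {protein:.2f}, dna = {dna:.2f}")
```

When `offset` is zero the two optima coincide and both traits stay where they started. Any nonzero offset makes each trait's movement shift the other's optimum, so change continues indefinitely, which is the open-ended behaviour the verbal argument describes.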

Summary
Centromere sequences are expected to evolve rapidly because of a fundamental aspect of their phenotype: they tightly bind centromeric proteins throughout the cell cycle. Bound protein is established to lead to substantially elevated rates of fork-collapse during DNA replication, and to subsequent repair of collapse-generated one-sided DSBs via the BIR pathway. BIR in turn is expected to lead to many downstream consequences at centromeres: i) the formation of tandem repeat structure at new centromeres or new centromeric subregions lacking this structure, ii) expansion of centromeric arrays due to an excess of monomer duplications over deletions, iii) perpetual monomer turnover that creates an opportunity for molecular-drift and molecular-drive, iv) an elevated nucleotide-substitution mutation rate, v) small-scale transpositions that create new, mosaic monomer sequences, and vi) larger-scale transpositions that create new subarrays of monomers or HORs within extant centromeres. The increased mutation rate alone would be expected to make centromeres evolve more rapidly than other genomic regions, but this effect is magnified by mmBIR-induced transpositions. These horizontal transfer events, in combination with molecular-drift and especially molecular-drive, are expected to cause monomer sequence to rapidly change due to quantum leaps (large steps) in sequence over time via the formation of mosaic monomers and ectopic lineage swapping. Repeated episodes of molecular-drive may be perpetual because: i) there is intransitivity among centromeric sequences, as seems plausible in humans, or more generally, ii) the genetic background of genes coding for molecules that interact with the centromere continually evolves and changes the optimal centromeric sequence for molecular-drive.
More speculatively, push-away coevolution driven by positive feedback between the protein sequence of CENH3 and the DNA sequence of centromeres may also contribute to rapid evolution of centromeric sequences.

Supplemental Figure S1. A neighbor-joining cluster diagram of: i) the consensus sequence of the two monomers from the human dimeric HOR at chromosome 5 (one b-box monomer and one no-b-box monomer), and ii) the consensus sequences of the two b-box and three no-b-box monomers from the orthologous 5-monomer HOR in the chimp. Note the substantial sequence divergence. Red symbols are for b-box monomers and blue for no-b-box monomers. Circles depict human monomers and stars depict chimp monomers. Symbol letters: H = human, C = chimp, b = b-box, n = no-b-box; numbers (when present) denote different monomers of the same type within a species.

Figure S3. A new subarray with a faster lateral expansion rate can 'capture' the Switch-point and 'win' in molecular-drive at human centromeres. A-B. Consider a centromeric tandem repeat array. The centric core [assembled as centrochromatin] of the array is shown in black, the pericentric flanks [assembled as heterochromatin] in grey, and the centric/pericentric boundary as a white line. The Switch-point is shown by the fountain symbol. The centromeric repeat array has acquired (via transposition and tandem duplication) a new subarray that has a faster lateral expansion rate (red region with a white double-arrow; see Rice 2019B for molecular examples of this phenotype). The Switch-point can be defined as the position where the average expansion rate is equal on both sides, assuming that the edges of the centric core are equally permeable to ingression by the pericentric flanks as tandem duplications within the centric core accumulate.
The newly established subarray with faster lateral expansion will cause the right half of the centric core to expand faster and thereby move the Switch-point toward the right: the faster the lateral expansion rate of the new subarray, and the larger the size of this subarray, the greater the displacement of the Switch-point toward the new subarray. C. If, during its expansion and movement toward the right, the new subarray 'captures' (spans) the Switch-point before it is pushed off the side of the centric core, its sequence will spread bidirectionally, causing the new subarray's sequence to eventually spread to the entire centric core (D.) and ultimately to the entire centromeric array once monomers from the original centromeric array (that have been pushed into the pericentric flanks) are removed via recurrent deletion pressure (E.). Note that regions of the original centric core that have recently been pushed into the pericentric flanks are shown in dark grey. The model developed here operates whether the tandem repeat array is composed of repeated monomers or of HORs.
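
The displacement of the Switch-point described in this caption can be illustrated with a minimal numerical sketch. It rests on assumptions that are not part of the original model description: the centric core is treated as a row of discrete repeat units, each with a relative expansion rate, and the Switch-point is read as the position at which the summed expansion (rate times length) is equal on the left and the right. The function names `core_rates` and `switch_point`, and all numerical values, are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def core_rates(n_units, base_rate, sub_start=None, sub_len=0, sub_rate=None):
    """Per-unit expansion rates for a centric core (hypothetical toy model).

    All units expand at base_rate, except an optional subarray of sub_len
    units starting at index sub_start, which expands at sub_rate.
    """
    rates = [base_rate] * n_units
    if sub_start is not None:
        for i in range(sub_start, min(sub_start + sub_len, n_units)):
            rates[i] = sub_rate
    return rates


def switch_point(rates):
    """Index i that best balances sum(rates[:i]) against sum(rates[i:]).

    Assumption: the Switch-point is where cumulative expansion on the two
    sides is equal, with both core edges equally permeable.
    """
    total = sum(rates)
    prefix = [0.0] + list(accumulate(rates))
    return min(range(len(rates) + 1), key=lambda i: abs(2 * prefix[i] - total))


def is_captured(rates, sub_start, sub_len):
    """True if the subarray spans (or touches) the Switch-point."""
    return sub_start <= switch_point(rates) <= sub_start + sub_len


uniform = core_rates(100, 1.0)
small_sub = core_rates(100, 1.0, sub_start=70, sub_len=10, sub_rate=3.0)
large_sub = core_rates(100, 1.0, sub_start=70, sub_len=20, sub_rate=3.0)
fast_sub = core_rates(100, 1.0, sub_start=70, sub_len=10, sub_rate=5.0)

for name, r in [("uniform", uniform), ("small", small_sub),
                ("large", large_sub), ("fast", fast_sub)]:
    print(name, switch_point(r))
```

Under these assumptions the Switch-point sits at the midpoint of a uniform core and shifts toward the new subarray as that subarray becomes larger or expands faster, which is the qualitative behavior the caption describes. The sketch is static: it does not model the movement of the subarray itself toward the edge of the centric core, nor deletion in the pericentric flanks.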