Chaos, order and systematics in evolution of the genetic code

The genetic code evolved by a combination of chaotic and ordered processes. Liquid-liquid phase separation (hydrogels), a chaotic process, constructs diverse membraneless compartments within cells, resulting in regulated hydration and sequestration and concentration of reaction components. Hydrogels relate to chaotic amyloid fiber production. We propose that polyglycine and related hydrogels (i.e. GADV; G is glycine), phase separations, membraneless droplets and amyloid accretions organized protocell domains to drive the earliest evolution of the genetic code and the pre-life to cellular life transition. By contrast, evolution of tRNA, tRNAomes, aminoacyl-tRNA synthetases and translation systems followed highly ordered and systematic pathways, described by well-defined mechanisms and rules. The pathway of evolution of aminoacyl-tRNA synthetases, which tracked evolution of the genetic code, is clarified. Hydrogels and amyloids, therefore, form a chaotic component that complemented otherwise systematic processes. We describe in detail a pre-life world in which hydrogels and amyloids provided the Darwinian selections of the first life. We posit that hydrogels sequestering tRNAs may have been involved in the earliest folding and class divergence of aaRS enzymes. Class IA and class IIA aaRS folding was initially directed by Zn-binding and the N-terminal extension of class IA enzymes, which comprises part of the class I aaRS active site. Because class I and class II aaRS bind opposite faces of their cognate tRNAs, tRNA binding might have also promoted appropriate aaRS folding. Hydrogels can sequester RNAs and could promote early aaRS class I and class II folds.


Introduction
Eukaryotic cells divide into compartments. Some compartments are set aside by membranes but others are membraneless and divided instead by liquid phases. Components of membraneless compartments concentrate through local interactions and selection and exclusion of defining components. Hydrogels and liquid-liquid phase separation (LLPS) form these functional units. In human neurological disease and cancer, hydrogel compartments can disassemble and, in some cases, lead to generation of amyloid accretions.
Although not yet as extensively studied, hydrogels and LLPS are also becoming recognized in prokaryotic systems. In this paper, we explore hydrogel and LLPS compartments as drivers of the establishment of the genetic code.
Evolution of life on Earth required a small number of key transitions (Figure 1). In this paper, we concentrate on the pre-life to cellular life transition and the evolution of coding systems, but we use examples from later evolution to highlight very early events. We consider evolution of life on Earth to be a fairly simple outline with overwhelming relevant detail. The first cellular life on Earth is described as LUCA (the last universal common cellular ancestor), which we consider to be the first organisms with an intact DNA genome and an intact cell [1][2][3][4]. The second major transition is the great divergence of Archaea and Bacteria [5][6][7]. Based on our analyses and those of others, we consider Archaea to be most similar to LUCA and Bacteria to be more diverged. The third major transition is the genetic fusion of multiple Archaea and multiple Bacteria to generate Eukaryota [8-

Figure 1.
[...] pausing mechanism of RNA polymerase II and the RNA polymerase II CTD (carboxy-terminal domain) [13][14][15]. RNAP) RNA polymerase [12,19].

The best descriptions of the key stages relate to evolution of biological coding, translation and transcriptional mechanisms. We discuss the importance of hydrogels (liquid-liquid phase separation; LLPS) in transitions. In this paper, when we refer to hydrogels or LLPS, we consider these features in all their complexity, including membraneless organelles and associated amyloids [16][17][18]. It is our opinion that peptide disorder compartmentalized and regulated hydration and caused separation and concentration of reactants in protocells and was a major driving force in the early evolution of life and, specifically, in the evolution of the genetic code, which is the most central feature of evolution of complex life on Earth.
To understand the pre-life to life transition requires bottom-up and top-down approaches [20][21][22][23][24]. In a bottom-up approach, a goal is to develop plausible prebiotic, coacervate and self-replicating polymerization systems. The top-down approach, by contrast, is intended to infer some major pathways in the pre-life world often from analyses of conserved sequences. The advantage of the bottom-up approach is that many prebiotic reactions are interesting and potentially on-pathway. The potential disadvantage of a bottom-up approach is that too many pathways are possible and too many plausible pathways may be dead ends or may not result in dominant pathways. The potential advantage of the top-down strategy is that inferences based on sequence are likely to reflect dominant and successful pathways. The limitation of a top-down strategy is that many important pathways may not be represented or may not be recognized in existing sequence data sets. Because authors of this manuscript are molecular biologists, our approach has been sequence-based and top-down. We find that top-down strategies describe the early evolution of translation systems and transcription systems and the divergence of Archaea and Bacteria. Top-down approaches also enrich bottom-up views. For instance, based on top-down methods, we posit models for pre-biotic chemistry (see below). Of course, when top-down approaches and bottom-up approaches meet, a richer analysis of the pre-life to life transition has been achieved.
We posit that the major event in the divergence of Archaea and Bacteria was the evolution of bacterial σ transcription factors (Figure 1) [7,25]. In Bacteria, σ factors bind to RNA polymerase to facilitate binding to the promoter. σ helix-turn-helix (HTH) factors are homologs of archaeal TFB, which itself includes two very regular HTH motifs, termed "cyclin-like repeats" [26]. Comparing bacterial σ factors to archaeal TFB, however, σ factors alter bacterial promoter recognition and transcriptional control in fundamental ways. This radical shift in core transcriptional mechanisms, promoters and control caused Bacteria to become significantly different from Archaea, while Archaea remained very similar to LUCA. Bacteria also adopted a new replicative DNA polymerase (PolC) relative to Archaea (PolB and PolD), so this is another fundamental difference comparing Bacteria and Archaea [27].
Much is not yet known about the evolution of Eukaryota. We view eukaryotes as genetic fusions of multiple Archaea and multiple Bacteria without a very clear model for how this transition occurred. We view the transition as a multi-stage process of endosymbiotic or other large horizontal gene transfer events (i.e. by feeding: referred to as "foodchain gene adoption") [11,12,28-30]. It is our opinion that horizontal transfers of small packets of genes are generally less successful than larger transfers. In Figure 1, we indicate a few possible events in the FECA (first eukaryotic common ancestor) to LECA (last eukaryotic common ancestor) transition, focusing on evolution of cell architectures and transcriptional mechanisms. Splicing appears to have evolved near the time of LECA [31]. It appears that Eukarya evolved a new use for hydrogels (liquid-liquid phase separation; LLPS) involving intrinsically disordered regions (IDRs) of proteins [13,17,32,33]. Histone tails and the carboxy-terminal domain (CTD) of RNA polymerase II are IDRs with regulated interactions and protein readers [13][14][15]. Other factors with IDRs cooperate in the transcription cycle, sequestering complexes in LLPS compartments. So far as we can ascertain, prokaryotes also utilize LLPS, probably through short disordered protein regions.
There is some redundancy in this paper compared to previously published work from our laboratory on evolution of tRNAs, tRNAomes, aminoacyl-tRNA synthetases (aaRS enzymes) and the genetic code. Because the paper describes and combines multiple complex subjects, some redundancy was inevitable. The current paper provides refined models, insights and perspectives. A dominant theme of this review is that hydrogels, amyloids and LLPS drove the early evolution of the genetic code. Specifically, we posit that hydrogels and related assemblies provided the main Darwinian driving force behind genetic code evolution. We provide highly detailed and connected models for key intermediates in the pre-life to life transition. Our tRNA evolution model, aaRS evolution model and genetic code evolution model are mutually reinforcing and highly predictive.

Artificial intelligence in evolution of life
At some level, evolution of life on Earth can be described according to principles of artificial intelligence [23,34]. A system is capable of "learning" (teaching itself) if it can build up intellectual property that enhances its subsequent capabilities. In a biological system, evolution must solve the coding problem, because, without sophisticated biological coding, no life as recognized on Earth is possible. So evolution of the genetic code is the core feature of evolution of life on Earth. tRNA was the central driver to evolve biological coding. Therefore, evolution of tRNA is a central story. We consider tRNA to be the core intellectual property in evolution of translation systems, including: 1) tRNAomes (all of the tRNAs in an organism) [35]; 2) the genetic code [23,34]; 3) mRNA; 4) aminoacyl-tRNA synthetases (aaRS; i.e. GlyRS-IIA; IIA indicates the aaRS I or II structural class and A-E subclass) [22,36]; 5) rRNA; and 6) ribosomes [37]. The system taught itself to code, centered on tRNAs, and then vastly enriched the code and its expression by coevolving proteins. Biological coding expands the capacity of the system to create highly functional proteins and protein assemblies and then to evolve complex organisms.
Evolution of tRNA is also a story of artificial intelligence. tRNA evolved from ligation of three 31-nt minihelices, as described below. Furthermore, the minihelices were composed of repeating sequences and inverted repeats, so minihelices and tRNAs were constructed from highly ordered ancient sequences about 4 Ga ago. Many have considered earliest evolution from random biopolymers, but tRNA did not evolve from random sequences, and evolution of tRNA is the central issue in evolution of translation systems and the genetic code. For computational studies, it remains

Evolution of tRNA
A number of models have been advanced to describe tRNA evolution. We favor the 3-minihelix model advanced by our laboratory, which we find to be best supported by sequence and statistical analyses and also most predictive [35,45-48]. The model fully accounts for the sequences of type I and type II tRNAs. Most tRNAs are type I with a 5-nt V loop (V for variable). Type II tRNAs (i.e. tRNA Leu and tRNA Ser in Archaea) have an expanded V loop that initially was 14-nt, although type II V loops have expanded and contracted in evolution. The 3-minihelix model is strongly predictive to describe evolution of the genetic code and evolution of ribosomes. We identify a tRNA primordial sequence (tRNA Pri) from which tRNAs radiate. We showed that, in Archaea, tRNA Pri is very close in sequence to tRNA Gly, indicating that glycine was the first encoded amino acid [35]. Remarkably, tRNA Pri is very close in sequence to a typical tRNA sequence (similar to a consensus sequence) from Archaea [49].

Evolution of type I tRNA
tRNA Pri was formed by ligation of three 31-nt minihelices of almost completely known sequence (Figure 2). Figure 2A shows the structure of a type I tRNA colored according to the model. Figure 2B shows a typical type I tRNA (similar to a consensus sequence) from Pyrococcus furiosus, an ancient archaeon [49]. Figure 2C outlines the 3-minihelix model for tRNA evolution. In the model (Figure 2C), a 93-nt tRNA precursor was formed by ligation of three 31-nt minihelices. The 93-nt precursor was then processed into type I and type II tRNAs. The 31-nt minihelices that became the anticodon stem-loop-stem and the T stem-loop-stem were initially identical (~GCGGCGGCCGGGUU/AAAAACCCGGCCGCCGC; stem-loop-stem microhelix core: ~CCGGGUU/AAAAACCCGG; there is slight sequence ambiguity in the primordial ~UU/AAAAA (/ indicates a U-turn) loop; there is no ambiguity in the 5-nt stems).
The minihelix that became the D loop sequence is distinct because it has a 17-nt microhelix core based on a UAGCC repeat (initially GCGGCGGUAGCCUAGCCUAGCCUACCGCCGC; 17-nt UAGCC repeat microhelix core: UAGCCUAGCCUAGCCUA). Remarkably, the UAGCC repeat in the D loop is apparent in typical tRNAs from ancient Archaea (Figure 2B; UAGCNUAGCCUGGUNNA). To generate a type I tRNA Pri requires two internal 9-nt deletions in the 93-nt precursor surrounding the anticodon stem-loop-stem within ligated acceptor stems (type I tRNAs missing 3'-ACCA were initially 75-nt) (Figure 2C). In type I tRNA in Archaea, only a few small D loop deletions (i.e. 1-4 nt) and 1-nt deletions in the 5-nt V loop were tolerated [49]. To form a functional tRNA that could attach an amino acid, ligation (or genetic and/or enzymatic attachment) of 3'-ACCA was necessary.

Figure 2.
[...] [49]. Arrow colors in A-C: red) internal deletion endpoints; blue) U-turns; yellow) amino acid placements. C) the three 31-nt minihelix model. Primordial type I and type II tRNAs were derived from a 93-nt precursor that was formed by ligation of three 31-nt minihelices of mostly known sequence. A polymer world preceded minihelix world. Currently, we occupy a tRNA world that, because of its success, has persisted for ~4 Ga (1 Ga = 1 billion years) on Earth. Colors in A and C: green) 5'-acceptor stems and 5'-acceptor stem remnants (5'-As*); magenta) D loop 17-nt microhelix; yellow) 3'-acceptor stems and 3'-acceptor stem remnants (3'-As*); cyan) 5'-anticodon and T loop stems; red) U-turn loops (anticodon and T loops); and cornflower blue) 3'-anticodon and T loop stems [23,34]. SLS indicates stem-loop-stem. Molecular graphics was done using the program UCSF ChimeraX [50][51][52].
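The length bookkeeping of the three 31-nt minihelix model can be checked with a short Python sketch. The minihelix sequences are those quoted in the text (with the "/" U-turn marker removed). The exact deletion coordinates are an assumption made for illustration: they are chosen so that the last 5 nt of a 5'-acceptor stem (5'-As*) and the first 5 nt of a 3'-acceptor stem (the 5-nt V loop) survive, consistent with the homologies described for the model. The function names are hypothetical.

```python
# Primordial 31-nt minihelices as quoted in the text ("/" U-turn marker removed).
D_MINIHELIX = "GCGGCGGUAGCCUAGCCUAGCCUACCGCCGC"
ACT_MINIHELIX = "GCGGCGGCCGGGUUAAAAACCCGGCCGCCGC"  # anticodon and T minihelices (initially identical)


def build_precursor() -> str:
    """Ligate D, anticodon and T minihelices into the 93-nt precursor."""
    return D_MINIHELIX + ACT_MINIHELIX + ACT_MINIHELIX


def delete(seq: str, start: int, length: int = 9) -> str:
    """Remove `length` nt beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def type_i_trna(precursor: str) -> str:
    """Apply two internal 9-nt deletions flanking the anticodon stem-loop-stem.

    ASSUMPTION (illustrative coordinates, not taken from the source):
      - 5' deletion removes the 7-nt 3'-acceptor stem of the D minihelix plus
        the first 2 nt of the next 5'-acceptor stem (indices 24-32);
      - 3' deletion removes the last 2 nt of the anticodon minihelix
        3'-acceptor stem plus the 7-nt 5'-acceptor stem of the T minihelix
        (indices 60-68).
    """
    seq = delete(precursor, 60)  # 3' deletion first, so 5' indices stay valid
    return delete(seq, 24)


precursor = build_precursor()
type_i = type_i_trna(precursor)

print(len(precursor))   # 93
print(len(type_i))      # 75 (without 3'-ACCA)
print(type_i[24:29])    # 5'-As* remnant
print(type_i[46:51])    # 5-nt V loop (3'-As* remnant)
```

Under these assumed coordinates the segments sum as 7 + 17 + 5 + 17 + 5 + 17 + 7 = 75 nt, matching the initial type I length given in the text.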

Archaeal tRNAs radiated from tRNA Gly
In support of the three 31-nt minihelix model for tRNA evolution, we show that tRNA Pri, tRNA Gly and tRNA Typical are closely related sequences (Figure 3). Figure 3A shows a typical tRNA Gly from three Pyrococcus species. Figure 3B shows an annotated multiple sequence alignment of tRNA Pri, tRNA Gly and tRNA Typical. Despite ~4 Ga of evolution, the three sequences are nearly identical [35]. Sequence deviations from tRNA Pri can be explained based on tRNA folding [45].

Figure 3.
tRNA Gly was the primordial tRNA (tRNA Pri) from which other tRNAs radiated [35]. A) A typical tRNA Gly from 3 Pyrococcus species [49]. B) An annotated sequence alignment. PRI) tRNA Pri; GLY) tRNA Gly (as in A); and TYPICAL) tRNA Typical from Pyrococcus furiosus (Figure 2B). Arrows are as in Figure 2. Purple indicates the anticodon. / indicates a U-turn. Other colors are as in Figure 2.

Evolution of type II tRNAs with an expanded V loop
The same model describes evolution of type I tRNA and type II tRNA with an expanded V loop, indicating that both models are correct (in Archaea, tRNA Leu and tRNA Ser are type II tRNAs) [46]. To generate a type II tRNA Pri required a single internal 9-nt deletion corresponding precisely to the more 5'-deletion in generation of type I tRNA Pri (type II tRNAs were initially 84-nt without 3'-ACCA) (Figures 2C and 4). Figure 4A shows a structure of tRNA Leu. The expanded V loop was generated from a 3'-acceptor stem ligated to a 5'-acceptor stem, as indicated in the model (Figure 2C). Figure 4B shows a typical tRNA Leu from three ancient archaeal Pyrococcus species [49]. In ancient Archaea, tRNA Leu is closer in sequence to a type II tRNA Pri (Figure 2C) than tRNA Ser [35]. Figure 4C describes the tRNA segments and coloring in Figure 4A. In Figure 4B an alignment of the primordial type II tRNA V loop and the typical tRNA Leu V loop is shown. The length of the tRNA Leu V loop is 14-nt, as predicted from the model (Figure 2C). The V loop (numbered V1-V14) is selected to form a G26~UV1 wobble pair and a G15=CV14 reverse Watson-Crick base pair (referred to as the Leavitt base pair), as indicated [46]. The primordial V loop would pair along its entire length. In type II tRNAs, by contrast, the V loop is evolved to form a loop with a short stem. Also, the sequence of the tRNA Leu V loop has diverged from the tRNA Ser V loop, which is a direct determinant for SerRS-IIA serine addition, to avoid tRNA charging errors [53]. The tRNA Leu V loop is evolved to be an anti-determinant for SerRS-IIA [46]. To attach amino acids to tRNAs, we posit that, initially, ACCA was ligated to tRNAs, minihelices, microhelices and other RNAs, utilizing a ribozyme ligase. In the ancient world, RNAs

Figure 4.
Type II tRNAs. A) A type II tRNA structure (tRNA Leu) colored as in Figures 2A, 2C and 4C. 4-nt in the anticodon loop were missing from the structure, so the anticodon loop shown is from PDB 4TRA (tRNA Phe). B) A typical tRNA Leu from three Pyrococcus species. The sequence alignment shows a comparison of the typical tRNA Leu V loop to the primordial sequence. C) A color key for a type II tRNA Pri and description for the image in A.
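The statements that the primordial type II V loop was 14 nt and "would pair along its entire length" can be illustrated with a minimal Python sketch. It assumes, as the model describes, that the primordial V loop is simply the 7-nt 3'-acceptor stem ligated to the 7-nt 5'-acceptor stem, using the acceptor stem sequences of the primordial minihelices quoted earlier. Only Watson-Crick pairing is considered, and the helper name is hypothetical.

```python
# Acceptor stems of the primordial 31-nt minihelices (from the quoted sequences).
ACCEPTOR_5 = "GCGGCGG"  # 5'-acceptor stem
ACCEPTOR_3 = "CCGCCGC"  # 3'-acceptor stem

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


# Primordial type II V loop: 3'-acceptor stem ligated to a 5'-acceptor stem.
v_loop = ACCEPTOR_3 + ACCEPTOR_5

# A sequence equal to its own reverse complement can pair along its full length.
self_pairing = reverse_complement(v_loop) == v_loop

# One 9-nt deletion from the 93-nt precursor leaves the initial type II length.
type_ii_length = 3 * 31 - 9

print(v_loop, len(v_loop))  # 14-nt primordial V loop
print(self_pairing)         # True
print(type_ii_length)       # 84 (without 3'-ACCA)
```

The self-complementarity follows directly from the two acceptor stems being reverse complements of each other, which is why the ligated segment could initially form a fully paired hairpin-like element before evolving into a loop with a short stem.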

Demonstration of the model
The evidence for the three 31-nt minihelix model is compelling. For instance, statistical analysis shows p-values of 0.001 (highest indication of homology) for: 1) homology of the anticodon and T stem-loop-stems (17-nt microhelix segments); 2) homology of the last 5-nt of the D loop (5'-As*; As for acceptor stem) and the last 5-nt of the 5'-acceptor stem; and 3) homology of the 5-nt V loop (3'-As*) and the first 5-nt of the 3'-acceptor stem [45,47]. Inspection of the typical tRNA ( Figure 2B) is sufficient to confirm the homology of the anticodon stem-loop-stem  and the T stem-loop-stem (49-CCGGGUUCAAAUCCCGG-65) (see also Figure 3). Standard tRNA numbering does not match tRNA Pri because standard numbering is based on tRNAs with a 3-nt deletion in the D loop (based on eukaryotic tRNAs). In some ancient Archaea, tRNA Gly ( Figure 3A) and tRNA Leu ( Figure 4B) have full-length D loops. Homology of the anticodon and T stem-loop-stems, which is obvious from inspection, is sufficient to confirm the three 31-nt minihelix model (Figures 2-4). We showed that the expanded V loop of type II tRNAs was initially a 3'-acceptor stem ligated to a 5'-acceptor stem, as predicted by the model for processing of the 93-nt tRNA precursor ( Figures 2C and 4) [46].
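As a rough illustration of the inspection described above (not the statistical test used in the cited analyses), the following Python sketch compares the typical T stem-loop-stem quoted in this section with the primordial 17-nt microhelix core quoted earlier, and checks that the 5-nt stems are complementary. The function names are hypothetical.

```python
PRIMORDIAL_CORE = "CCGGGUUAAAAACCCGG"  # primordial anticodon/T microhelix core ("/" removed)
TYPICAL_T_SLS = "CCGGGUUCAAAUCCCGG"    # typical T stem-loop-stem, positions 49-65

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> list:
    """0-based offsets at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diff = mismatches(PRIMORDIAL_CORE, TYPICAL_T_SLS)
identical = len(PRIMORDIAL_CORE) - len(diff)

print(f"{identical} of {len(PRIMORDIAL_CORE)} positions identical")  # 15 of 17
print("differing offsets:", diff)                                    # [7, 11]

# The 5-nt stems flanking the 7-nt loop are exact reverse complements.
print(reverse_complement(TYPICAL_T_SLS[:5]) == TYPICAL_T_SLS[-5:])   # True
```

Both differences fall within the 7-nt loop, where the text notes slight ambiguity in the primordial sequence; the 5-nt stems are unchanged.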
We consider these analyses to prove our tRNA evolution model is correct and to falsify alternate models [45]. We showed that tRNAomes in ancient Archaea cluster tightly around tRNA Pri [35]. Ancient Bacteria (e.g. Thermus thermophilus) have fairly compact tRNAomes centered on tRNA Pri. More derived Bacteria (e.g. Escherichia coli) have more diverged tRNAomes centered on tRNA Pri. Because of internal homologies in tRNA sequences, no accretion model (involving random insertions-deletions; indels) can be correct for tRNA evolution [45]. Other tRNA evolution models (i.e. 2-minihelix and Uroboros) are accretion models with random indels [55,56]. In both the 2-minihelix and Uroboros models, random indels lead perplexingly to ordered and repeated sequences in tRNAs [45]. Given rules of genetics, we do not know how this is possible. By contrast, we do not think the three 31-nt minihelix model (our model) can be falsified.
Because tRNA evolution indicates an ancient polymer world and minihelix world preceding the current tRNA world ( Figure 2C), about 200-300 million years of pre-life evolution are described by the top-down generated three 31-nt minihelix model. Surprisingly, polymer, microhelix, minihelix and tRNA sequences were derived from ordered sequences: repeats and inverted repeats (Figures 2-4). The ancient, pre-life world, therefore, included ordered polymers from which tRNAs evolved. Features of the model derive from tRNA sequences and structure.

Evolution of the genetic code (overview)
We posit that the genetic code evolved around the tRNA anticodon following a simple set of rules, which appear never to have been violated [23,34]. For the 2nd and 3rd positions of the anticodon, the rules are C>G>U>>A. Preferences are much stronger for the 3rd anticodon position than for the 2nd anticodon position, because the 2nd anticodon position is most central and, therefore, the easiest to read [37]. Consistent with the rule, however, C is strongly preferred in the 2nd position, just as it is in the 3rd position, as evidenced by the position of glycine in the code (see below). We posit that the genetic code initially sectored on the 2nd anticodon position, because the 2nd position was easiest to read on a primitive ribosome. Essentially, the system was teaching itself to encode proteins by accurately matching and reading codons and anticodons. Furthermore, we posit that, on a primitive ribosome, the 1st and 3rd anticodon positions were initially wobble positions. At a wobble position, only pyrimidine-purine discrimination was initially possible, so tRNA wobbling in translation limited the size of the code. Because of wobbling, tRNA, not mRNA, limited the final size of the genetic code. Considering a genetic code of 64 assignments in mRNA, therefore, is not reasonable. Because of wobbling in the 1st anticodon position, the genetic code has a maximum complexity in tRNA of 32 assignments (2x4x4). Because some genetic code sectors cannot easily be split, the standard genetic code evolved to 20 amino acids plus stops (21 assignments) rather than encoding additional amino acids (up to 32 assignments).
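The 2x4x4 count can be reproduced with a small Python sketch. It enumerates all 64 anticodons (written 5' to 3', so the first base is the wobble position) and collapses the wobble base to its purine or pyrimidine class, as described above. The helper name is hypothetical.

```python
from itertools import product

BASES = "ACGU"


def wobble_class(base: str) -> str:
    """Only purine versus pyrimidine is resolved at a wobble position."""
    return "purine" if base in "AG" else "pyrimidine"


# Anticodons written 5'->3': position 1 is the wobble position.
anticodons = ["".join(p) for p in product(BASES, repeat=3)]

# Collapse the wobble base to its class; positions 2 and 3 keep 4-base resolution.
assignments = {(wobble_class(a[0]), a[1], a[2]) for a in anticodons}

print(len(anticodons))   # 64
print(len(assignments))  # 32 = 2 x 4 x 4
```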
At the 3rd anticodon position, wobbling was abolished by evolution of the elongation factor (EF)-Tu "latch" (also referred to as conformational closing of the 30S ribosome subunit) [23,57-61]. tRNA enters the ribosome bound to the GTPase chaperonin EF-Tu. On the ribosome, EF-Tu holds the tRNA until GTP is hydrolyzed and the 30S ribosome subunit tightens its conformation and the EF-Tu GTPase latch is set. Then EF-Tu dissociates, allowing the verified tRNA with its tightened mRNA codon attachment to rotate its 3'-aa end into the ribosome PTC A site (addition or aminoacyl site). Setting the latch allows 4-base discrimination at the 3rd anticodon position. 4-base resolution was readily achieved at the 2nd anticodon position, because the 2nd anticodon position is most central and the easiest to read. 4-base resolution, therefore, was obtained at the 3rd anticodon position through evolution of the EF-Tu latch. The latch includes Thermus thermophilus (Tth) rRNA positions 16S rRNA G530, A1492 and A1493 and 23S rRNA A1913. The latch checks for Watson-Crick pairing to the mRNA codon at the anticodon 2nd and 3rd positions. The latch also checks the accuracy of pairing at the wobble position. Wobbling is necessary to evolve a genetic code based on RNA, and wobbling is a major story in the evolution of the code. The EF-Tu latch was a major determinant of translational accuracy and an essential evolutionary advance in building the code.
At the wobble 1st anticodon position, the sequence preference rule is G>(U~C)>>>>>A [23,34]. Only purine versus pyrimidine discrimination is initially possible at a wobble position. Wobble G appears to be favored over U~C, because Asp (wobble G) appears to enter the code before Glu (wobble U/C) (see below). A is seldom or never used in the wobble anticodon position in Archaea. When wobble A is encoded in Bacteria and Eukarya, A is modified by deamination to inosine [36,62].
Essentially, A is not tolerated in the tRNA anticodon wobble position. Partly, A is not tolerated because A in the tRNA wobble position does not pair well with U in the mRNA wobble position. As noted above, before evolution of the EF-Tu latch, A was also poorly tolerated in the anticodon 3rd position. A is not necessary in the anticodon wobble position because G pairs with C (Watson-Crick pairing) and with U (wobble pairing) [63].
In the wobble anticodon position, U and C are read degenerately. Initially, one might expect anticodon wobble C to show reasonable specificity for codon G. Similarly, anticodon U might be expected to read codon A (Watson-Crick pairing) and codon G (wobble pairing), resulting in anticodon wobble ambiguity. Generally, Archaea use both anticodon wobble C and U tRNAs to encode the same amino acid, indicating that anticodon wobble ambiguity was too high a barrier in evolution to easily separate wobble C and U tRNAs to encode two different amino acids. In principle, such separation of functions might be achieved by tRNA wobble modifications [63,64]. To encode tryptophan, the anticodon CCA is used. The UCA anticodon, however, is not generally utilized because UCA corresponds to the UGA stop codon, which is recognized in mRNA by a protein release factor [65]. To encode methionine, anticodon CAU is utilized to read AUG codons. In Archaea, isoleucine also utilizes CAU with C modified to agmatidine to read only codon AUA (Ile) and not AUG (Met) [66][67][68][69]. To avoid ambiguity in coding, anticodon UAU is rarely utilized in Archaea and Bacteria [36]. With very few exceptions, tRNA wobble modifications cause U and C to be read with more ambiguity than expected for an unmodified base. Generally, tRNA wobble modifications support broader reading of synonymous codons rather than evolving higher tRNA specificity in coding [63,64].
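The anticodon-to-codon examples in this section can be traced with a simplified Python sketch. It assumes unmodified bases and uses only the pairings stated in the text: wobble G reads codon C and U, wobble U reads codon A and G, and wobble C reads codon G. Wobble A is rejected because the text describes it as essentially not tolerated. Real tRNA wobble modifications, which the text notes often broaden reading, are deliberately not modeled, and the names are hypothetical.

```python
# Codon 3rd-position bases read by each unmodified anticodon wobble base
# (simplified rules taken from the text; modifications are not modeled).
WOBBLE_READS = {"G": "CU", "U": "AG", "C": "G"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon: str) -> list:
    """Return the codons (5'->3') read by an anticodon written 5'->3'."""
    wobble, second, third = anticodon
    if wobble not in WOBBLE_READS:
        raise ValueError("wobble A is treated as not tolerated in this sketch")
    # Codon position 1 pairs with anticodon position 3, and so on (antiparallel).
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return sorted(prefix + base for base in WOBBLE_READS[wobble])


print(codons_read("CCA"))  # ['UGG'] (Trp)
print(codons_read("UCA"))  # ['UGA', 'UGG'] (would also read the UGA stop codon)
print(codons_read("CAU"))  # ['AUG'] (Met)
print(codons_read("GCC"))  # ['GGC', 'GGU'] (Gly)
```

The UCA result shows why that anticodon is generally avoided: under these rules it would read the UGA stop codon in addition to the tryptophan codon UGG.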
We strongly support the concept that the genetic code evolved as a 32-assignment code, primarily around the tRNA anticodon [23,34,37,55]. To make sense out of the genetic code, therefore, requires a view centered on tRNA and the tRNA anticodon. By contrast, 64-assignment codes, based on mRNA codons, are not reasonable nor descriptive of the evolutionary process. Below, we describe a detailed pathway for evolution of the genetic code based on these ideas.

Evolution of ribosomes
We support the model that rRNA arose from tangles of ligated RNAs that included amalgamations of tRNAs, as also has been proposed by others [38][39][40][41][42]. We imagine an ancient world in which RNAs were replicated by ligation, catalyzed by a ribozyme ligase, followed by complementary replication catalyzed by a template-dependent ribozyme replicase. Attaching a snapback primer to RNAs would prime their complementary replication. 31-nt minihelices can function as snap-back primers. 17-nt microhelices (i.e. anticodon and T stem-loop-stems) can also function as snap-back primers ( Figure 2C). Minihelices and microhelices can be removed from larger RNAs via endonucleolytic cleavage of the RNA catalyzed by a ribozyme (i.e. cutting at the base of stems). In such a world, long RNAs with diverse sequences were generated, and some of these could function as a primitive decoding center scaffold and others as a mobile PTC [37,70]. Such tangled RNAs were also an incubator for evolution of novel ribozymes.
The patterns of rRNAs were established before LUCA. One indication of this conclusion is that 16S and 23S rRNAs in Archaea and Bacteria are very similar in sequence and have similar functional RNA motifs [43,44]. Archaeal and bacterial rRNA sequences align essentially over their entire lengths without frequent insertion-deletion. As some examples, in 16S rRNA, both Archaea and Bacteria have similar sequences for: 1) the decoding center; 2) the EF-Tu latch; and 3) the ribosome attachment site. In 23S rRNA, Archaea and Bacteria have similar sequences for: 1) the A-site (addition or aminoacyl site); 2) the P-site (peptidyl site); 3) the EF-Tu latch; and 4) the SRL (sarcin-ricin loop). We conclude, therefore, that 16S and 23S rRNAs were largely established before LUCA and persisted in Archaea and Bacteria with only minor changes and few large insertions-deletions.

rRNA may be derived in part from amalgamated tRNAs
In support of the idea that segments of rRNAs may initially have been generated from ligated tRNAs, we show Figure 5. We searched an aligned region of the archaeal and bacterial PTC, located between the tRNA 3'-end CCA-binding segments named the P-loop and the A-loop, using the Pyrococcus furiosus (Pfu) tRNAome, which is very similar to a LUCA tRNAome [35]. We searched aligned segments of archaeal (Methanocaldococcus infernus; Min) and bacterial (Thermus thermophilus; Tth) PTCs (Figure 5A). We find tRNA-like sequences that were identified using multiple tRNA probes that align in both archaeal and bacterial sequences. For this search, the smallest (most likely homologous) e-value obtained was 7 × 10^-4 (~1 chance in 1400 of being due to random chance) for an alignment of a Pfu tRNA (Arg (TCT)) to the Tth PTC (Figure 5B). The same region is detected as tRNA-like with aligned tRNA segments in the archaeal Min PTC using multiple Pfu tRNA probes. The alignment appears to extend in the plus/plus orientation from the tRNA D loop across the anticodon stem-loop-stem and a 5-nt (type I tRNA) V loop to the first base of the T loop of the tRNA, indicating that full-length tRNAs rather than minihelices or microhelices (Figure 2C) were present for evolution of the PTC. Probably, the homology is to a type I tRNA because it appears to extend over a 5-nt V loop. This same alignment can be obtained using a search with a typical type I tRNA sequence from ancient Archaea. We conclude, therefore, that type I tRNAs probably evolved prior to the 23S rRNA PTC, and tRNA sequences probably contributed to PTC evolution. We have done similar analyses with 16S rRNA and other segments of the 23S rRNA with similar results. We detect both plus/plus and plus/minus alignments to tRNAs, indicating that complementary replication predates evolution of rRNAs. In Figure 5, plus/plus alignments are prominent (Figure 5B).
Others have reported similar findings using other bioinformatics approaches [38][39][40][41][42]. We note that in the tRNA-aligned segment of the PTC no tRNA-like stem-loop-stems were detected (not shown). RNAs tend to fold according to longer range RNA contacts, so this result was not unexpected.

Figure 5.
A tRNA-like segment of the PTC of 23S rRNA. A) Alignments of Pfu tRNA sequences (black bars) to aligned archaeal Min (top) and bacterial Tth (bottom) PTC fragments. Type I and type II tRNAs were searched separately. B) A top alignment in this search that was identified using multiple probes. tRNA Colors: Magenta) D loop; green) 5'-acceptor stem remnant; cyan) 5'-anticodon and T stem; red) anticodon and T loop; purple) anticodon; cornflower blue) 3'-anticodon stem; and yellow) 3'-acceptor stem remnant (V loop). The e-value is dependent on the size of the PTC fragment used in the search, which in this case is short (~117 nt), decreasing the e-value compared to longer PTC fragment searches.
In addition to the decoding center of the ribosome (16S rRNA; 30S subunit), which forms a scaffold on which to run the mRNA, and the PTC (23S rRNA; 50S subunit), at which amino acids are joined to a peptide chain, the ribosome has additional features, which we consider to be subsequent evolutionary add-ons [37]. Remarkably, the ribosome must be coevolved with the genetic code, and a model for evolution of the code parallels these advances, most of which occurred prior to LUCA.
Ribosomes, of course, continue to evolve in Eukarya, but these enhancements are generally regulatory to support cell- and organism-specific functions [71,72].

The Prokaryotic Ribosome
A recent cryo-electron microscopy paper reveals the Thermus thermophilus ribosome and its dynamics and fidelity in amazing detail [61]. We highly recommend this paper to anyone with an interest in general translational mechanisms and fidelity. Here, we provide a general description of the translational mechanism with particular attention to evolution of the EF-Tu GTPase "latch", which we consider to be the fundamental advance in evolution of translation systems and the genetic code. In the paper referenced above, the "latch" is described as conformational closing of the 30S ribosome subunit. Also of importance is the recognition that IF2, EF-Tu and EF-G are ancient homologous GTPases that function in translation.
So far as we can discern, the prokaryotic ribosome was evolved before LUCA and the same basic functional design was maintained in Archaea and Bacteria. Initiation occurs on the small 30S subunit aided by initiation factors (IF1, IF2 and IF3). IF2, elongation factor (EF)-Tu and EF-G are homologous GTPases that function as chaparonins in the translation process [37]. Many Archaea and Bacteria have a UCCU sequence near the 3'-end of the 16S rRNA to orient the sequence ~AGGA on mRNA (the ribosome attachment site) relative to the AUG start codon sequence, which must be positioned in the ribosome P site for translation initiation. Incoming tRNAs first associate with EF-Tu before binding to mRNA. The 16S rRNA (30S subunit) has a "head", "neck" and "body". The head can adjust its rotation to orient the mRNA for initiation and to help by reversible swiveling and mRNA sliding with forward translocation during elongation. The mRNA runs along the neck where it is ratcheted forward via reversible swiveling of the head. The mRNA is held forward in part by bound tRNAs, maintaining the translation register.
For elongation, the 23S rRNA (50S subunit) associates with the 16S rRNA (30S subunit) with bound mRNA and the IFs then dissociate. The aa-tRNA-EF-Tu complex enters, GTP is hydrolyzed, the ribosome latch tightens (the 30S subunit closes) and the aa-tRNA rotates its 3'-XCCA-aa end (X is the discriminator base for aaRS discrimination and amino acid placement on tRNA) into the A-site (addition or aminoacyl site), also aligning peptide-tRNA in the P-site (peptidyl site). During tRNA rotation, EF-Tu dissociates from the A-site aa-tRNA and EF-G binds to the same ribosome site that had been occupied by its homolog EF-Tu during previous steps. EF-G hydrolyzes GTP and stimulates forward translocation. There is limited reversible rotation of the 23S rRNA (50S subunit) versus the 16S rRNA (30S subunit), facilitating forward translocation. tRNAs advance from the A-site to the Psite to the E-site (exit site). Having 2-3 tRNAs bound to the mRNA during elongation helps to maintain the translation frame.
EF-Tu tightens the ribosome "latch", which is a central feature of ribosome evolution [57,58,59,60,73]. The latch closes around the aa-tRNA-mRNA helix bound in the ribosome A-site. Enclosure includes interactions with 16S rRNA G530, A1492 and A1493 and 23S rRNA A1913 (Tth numbering). The closed conformation of the ribosome confirms 4-base recognition at the 2nd and 3rd tRNA anticodon positions. Evolution of the latch, therefore, allowed evolution of the genetic code to advance beyond ~8 amino acids (i.e. 2x4 assignments; we posit that only a single wobble position can be read at one time on the primitive ribosome) [23,34]. Prior to evolution of the EF-Tu GTPase latch, both the 1st and 3rd anticodon positions were wobble positions, limited to pyrimidine versus purine resolution. Evolution of the EF-Tu GTPase latch, therefore, "teaches" the ribosome to potentially read a 32-assignment code, which froze at a 20-amino acid + stop codon standard code [2,3,74]. In order for the A-site tRNA to advance to the P-site, the latch must open. Because tRNAs bound in the A-site, P-site and E-site have anticodon interactions paired at the mRNA, associated with the 16S rRNA (30S subunit), and also 3'-end interactions with 23S rRNA (50S subunit), multiple intermediate structures (referred to as hybrid states) are possible [61]. Hybrid states appear to rotate around the EF-Tu latch. Setting of the latch results in dissociation of EF-Tu and is followed by a large rotation of the 3'-end of the verified aa-tRNA into the PTC A-site, a step referred to as "accommodation". Rotation of the deacylated P-site tRNA into the E-site, by contrast, involves rotation of the 3'-end of the tRNA and is associated with opening of the latch and forward translocation. Depending on the step, therefore, tRNAs ratchet independently at their 3'-ends and anticodon ends, creating the hybrid states.
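The counting argument in the paragraph above can be written out explicitly. The following is a minimal sketch, assuming that a wobble position resolves only two classes (purine versus pyrimidine) and that a fully read position resolves four bases, as stated in the text. The function name `max_assignments` and the constants are illustrative and not taken from the cited models.

```python
def max_assignments(resolutions):
    """Multiply the number of distinguishable states at each anticodon position read."""
    total = 1
    for states in resolutions:
        total *= states
    return total

WOBBLE = 2  # purine versus pyrimidine resolution only
FULL = 4    # A, G, C and U all distinguished

# Before the latch (assumed): one wobble position plus the 2nd anticodon position.
pre_latch = max_assignments([WOBBLE, FULL])        # 2 x 4
# After the latch (assumed): 1st position wobble, 2nd and 3rd positions fully read.
post_latch = max_assignments([WOBBLE, FULL, FULL])  # 2 x 4 x 4

print(pre_latch, post_latch)
```

Under these assumptions the two products correspond to the ~8-assignment limit and the 32-assignment capacity described in the text.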
For chemistry, a P-site tRNA has XCCA-peptide at its 3'-end (X=the discriminator base). The A-site tRNA has XCCA-aa at its 3'-end. In the 23S rRNA (50S subunit) P-site, the sequence 2248-CUGGGGCGG-2256 presents 2251-GG-2252 to form Watson-Crick pairs with the 3'-CC of the P-site peptide-tRNA. In the 23S rRNA (50S subunit) A-site, the sequence 2548-GGGCUGUUCGCCC-2560 presents 2553-G to pair with 3'-CC (the 2nd C), to orient the A-site aa-tRNA. It appears that proximity of P-site and A-site tRNAs in the dehydrating environment of the PTC may be sufficient to form the next peptide bond [43].
The peptide chain elongates by its transfer to the A-site tRNA, resulting in deacylation of the P-site tRNA. After deacylation, the P-site tRNA can advance its 3'-end to the E-site. Because the peptide chain is transferred to the A-site tRNA from the P-site tRNA, the peptide is lengthened by one amino acid. Once the P-site tRNA releases the peptide chain to the A-site tRNA, the deacylated P-site tRNA can then translocate to the E-site, displacing and releasing the E-site tRNA. The march of tRNAs, aided by EF-G, through a compact tRNA-shaped tunnel in 23S rRNA therefore helps to ensure forward translocation and maintenance of the translation frame. Because the 23S rRNA apparently evolved to match the shapes of advancing tRNAs, we posit that the final conformation of 23S rRNA and the 50S subunit evolved around tRNAs, and that tRNAs evolved prior to the final evolved shape of the prokaryotic ribosome.
The exiting peptide chain extends from the active-site A-peptide site (before translocation) or P-peptide site (after translocation) through a channel in the ribosome. When the peptide exits the ribosome, it can begin to fold or it can be targeted to a membrane for transport or excretion. Translation termination occurs when protein release factors bind to stop codons in the mRNA. Because there are no tRNAs corresponding to stop codons, anticodons that are complementary to stop codons are not represented in the tRNA-centric standard genetic code.
The genetic code expanded by two mechanisms we can identify: 1) tRNA charging errors; and 2) modification of amino acids bound to tRNAs (e.g. Asp→Asn, Glu→Gln and pSer→Cys (pSer for phosphoserine)). We posit that the first amino acids to enter the code filled large sectors of the code, and these sectors were then invaded by other amino acids. Invasion follows a strict set of rules that we describe in more detail above and below. Significantly, because invasion by incoming amino acids required tRNA charging errors or amino acid modifications, translational fidelity is very important for the eventual freezing of the code. Ribosome fidelity, for instance, evolution of the EF-Tu GTPase latch, was fundamental to first expand and then to freeze the code.
We posit that hydrogels and related LLPS compartments are very important in prokaryotic translational functions, transcription-translation coupling, protein folding and chaperonin functions, although we were unsuccessful at finding specific references. In prokaryotic systems, hydrogels are small and disordered by their nature, making them difficult to visualize and analyze using current imaging methods.

Aminoacyl-tRNA Synthetases (aaRS)
We posit an updated model for aaRS evolution [23,34,36,37,75]. Remarkably, the pattern of aaRS evolution matches the pattern of genetic code evolution, providing a pathway for evolution of the genetic code. Also, apparent coevolution of the genetic code and aaRS enzymes indicates that the models we present for aaRS enzyme evolution and genetic code evolution are mutually reinforcing, reliable and predictive. Remarkably, the evidence we cite of coevolution has been maintained during ~4 Ga of evolution with significant potential for divergence. aaRS evolution patterns show coevolution with genetic code columns, which represent the 2nd tRNA anticodon position, the most important position for translational accuracy. As we discuss below, amino acids appear to add into the genetic code by rows, which represent the 3rd anticodon position. Recently, our laboratory clarified aaRS evolution using the Phyre2 protein-fold recognition server, which utilizes sequence and structure to align sequences available in the Protein Data Bank with a seed sequence [76]. Phyre2 provides evolutionary relationships of all (or most) aaRS enzymes in a structural class at once. The results provide a road map for evolution of the genetic code. Here, we provide an explanation for the radiation of the aaRS enzymes according to models for genetic code and tRNAome evolution.

aaRS structural classes
There are two structural classes of aaRS (class I and class II) with multiple structural subclasses (i.e. A-E) [53]. Class I and class II aaRS have incompatible folds but are homologs by sequence (see below). Class I aaRS enzymes have an active site arranged on a set of parallel β-sheets. As a result, class I aaRS have been described as having a "Rossmann-like" fold. Class I aaRS, however, are not homologs of Rossmann fold proteins. By contrast, class II aaRS mount their active sites on a set of antiparallel β-sheets. Both class I and class II aaRS enzymes are among the very first proteins to evolve on Earth, so aaRS are ancient proteins that evolved before LUCA and coevolved with the genetic code. Both class I and class II aaRS enzymes can have an extra editing domain to remove an inappropriately attached amino acid from a tRNA. Remarkably, in Archaea, only aaRS enzymes for amino acids found in the left half of the genetic code (columns 1 and 2) have editing active sites (see below).
Archaeal GlyRS-IIA was the first aaRS enzyme to evolve. Identifying GlyRS-IIA as the primordial aaRS indicates, once again, that glycine may have been the first encoded amino acid and that glycine maintained the dominant position in the evolving code. Of course, GlyRS-IIA is a product of protein encoding, so significant evolution of the genetic code must have been supported initially by ribozymes charging tRNAs (i.e. GlyRS-RBZ; RBZ for ribozyme) [77][78][79]. We posit that, because of coevolution, divergence of aaRS enzymes followed the pattern of evolution of the evolving tRNAome, and tRNA Gly held the most favored position in the code. GlyRS-IIA, therefore, evolved as the first protein aaRS, and all class I and class II aaRS diverged from GlyRS-IIA.
Folding of primitive GlyRS-IIA was directed by a Zn-finger near the protein N-terminus [36]. All class II aaRS enzymes derive in lineage from GlyRS-IIA. To form class IA aaRS enzymes (i.e. IleRS-IA and ValRS-IA), a primitive GlyRS-IIA was extended at its N-terminus and then refolded. Because of the N-terminal extensions, class IA enzymes are about twice as long as GlyRS-IIA. Other class I aaRS enzymes were derived from a class IA aaRS (probably ValRS-IA) [23,34,36,37]. Initially, class IA enzymes folded around the N-terminal protein extension and two Zn-fingers [36]. The more C-terminal Zn-finger found in IleRS-IA and ValRS-IA, in some ancient Archaea, corresponds to the single Zn-finger in ancient archaeal GlyRS-IIA. In Figure 6, sequence alignments are shown comparing a common segment of archaeal GlyRS-IIA enzymes and IleRS-IA and ValRS-IA enzymes. To succeed, these searches must be done using ancient archaeal species. The e-values for the alignments are 3×10^-13 (GlyRS-IIA to IleRS-IA) and 6×10^-11 (GlyRS-IIA to ValRS-IA). The odds against these independent alignments of homologous GlyRS-IIA regions being due to random events (i.e. convergent evolution) would be ~10^23 to 1. We conclude that class IIA and class IA aaRS enzymes are homologs by sequence that have subsequently diverged in evolution to maintain the fidelity of translation. Other models (i.e. Carter-Ohno-Rodin) for class I and class II aaRS evolution have been published [80][81][82][83], but we consider these models to be incorrect. In the Carter-Ohno-Rodin model, "urzymes" for class I and class II aaRS were posited to be generated from both strands of a primordial bidirectional gene. Such a model is inconsistent with simple homology of class I and class II aaRS, as we demonstrate (Figure 6) [36]. Methanobacterium congolense; Mbr) Methanobacterium bryantii. These alignments can be extended [36].
We posit that hydrogels sequestering tRNAs may have been involved in the earliest folding and class divergence of aaRS enzymes. Class IA and class IIA aaRS folding was initially directed by Zn-binding and the N-terminal extension of class IA enzymes, which comprises part of the class I aaRS active site. Because class I and class II aaRS bind opposite faces of their cognate tRNAs, tRNA binding might have also promoted appropriate aaRS folding. Hydrogels can sequester RNAs and could promote early aaRS class I and class II folds. Figure 7 shows divergence of aaRS enzymes as they relate to the standard genetic code in Archaea. The graph represents the closest homologs in the Protein Data Bank identified using the Phyre2 protein-fold recognition server (Figure 7A), so alignments and homology models represent sequence similarity and structural modeling [34,76]. Distances in the map represent evolutionary distances, so clustered aaRS are closely related. Remarkably, all class I aaRS enzymes were connected using the Phyre2 server. Many of these connections would not have been detected using sequence alignments alone. By contrast, some of the nodes in the class II aaRS map could not be connected using Phyre2. For instance, no relevant connection of class IIA and class IID enzymes could be obtained.
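The combined odds quoted for the two alignments can be checked with simple arithmetic. This is a minimal sketch, assuming that the two e-values can be treated as independent probabilities and multiplied, which is a simplification of how alignment statistics behave. The variable names are illustrative.

```python
import math

# E-values reported in the text for the two alignments.
e_gly_to_ile = 3e-13  # GlyRS-IIA to IleRS-IA
e_gly_to_val = 6e-11  # GlyRS-IIA to ValRS-IA

# Assumption: treat the two alignments as independent events and multiply.
joint = e_gly_to_ile * e_gly_to_val

# Express the joint value as an order of magnitude.
order_of_magnitude = -math.log10(joint)

print(f"joint = {joint:.1e}")
print(f"about 1 in 10^{order_of_magnitude:.1f}")
```

Under this assumption the product is on the order of 10^-23, consistent with the odds stated above.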

The pattern of aaRS evolution gives the pattern of amino acid placements in the standard genetic code
Other approaches to aaRS evolution have not provided as clear a picture or one so clearly correlated with genetic code evolution [75]. Figure 7B shows how the tRNA anticodon relates to the genetic code. The anticodon 2nd position relates to code columns. The anticodon 3rd position relates to code rows (1-4). The anticodon 1st wobble position relates to A and B rows. In Figure 7C, the standard genetic code (codon-anticodon table) is shown for Archaea with coloring for closely related aaRS enzymes, strongly indicating genetic code evolution within code columns (anticodon 2nd position). Because the genetic code evolved around the tRNA anticodon, and because genetic code evolution is tracked by aaRS evolution, we strongly advocate presenting the code as a codon-anticodon table including aaRS evolutionary data.
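The relationship between an anticodon and its place in the codon-anticodon table can be sketched in a few lines. The example below assumes the column and row numbering used elsewhere in this paper (2nd anticodon position A, G, U, C for columns 1-4; 3rd anticodon position A, G, U, C for rows 1-4) and simple Watson-Crick pairing with no wobble or base modification. The function names are hypothetical, and the assignment of wobble bases to A and B rows is not attempted because it is not specified in the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Column is set by the 2nd anticodon base; row is set by the 3rd anticodon base.
COLUMN_BY_2ND = {"A": 1, "G": 2, "U": 3, "C": 4}
ROW_BY_3RD = {"A": 1, "G": 2, "U": 3, "C": 4}

def anticodon_for(codon):
    """Return the Watson-Crick anticodon (5'->3') for a codon written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

def table_position(anticodon):
    """Return (column, row, wobble base) for an anticodon written 5'->3'."""
    wobble, second, third = anticodon
    return COLUMN_BY_2ND[second], ROW_BY_3RD[third], wobble

# Example codons for glycine, valine and phenylalanine.
for amino_acid, codon in [("Gly", "GGC"), ("Val", "GUC"), ("Phe", "UUC")]:
    anticodon = anticodon_for(codon)
    print(amino_acid, codon, anticodon, table_position(anticodon))
```

In this sketch glycine falls in column 4, row 4, valine in column 1, row 4 and phenylalanine in column 1, row 1, matching the placements described in the text.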

A model for Code Sectoring based on aaRS coevolution
We posit that the genetic code coevolved with aaRS enzymes and tRNAomes and that a record of that coevolution is maintained in the pattern of aaRS divergence and the distributions of amino acids in the code. Here, we posit models for the sectoring of the genetic code correlated with aaRS evolution in Archaea and for modifications of the model in Bacteria [23,34]. In Figure 1, we indicate that Bacteria may have been derived from Archaea [12,19]. As a first consideration, most genetic code sectoring is within code columns, indicating powerful coevolution of the genetic code, aaRS enzymes and the 2nd anticodon position of tRNA. In column 1 (2nd anticodon position A), valine, methionine, isoleucine and leucine are hydrophobic amino acids, and ValRS-IA, MetRS-IA, IleRS-IA and LeuRS-IA are all closely related class IA aaRS enzymes (Figure 7). From metabolic pathways, valine can be converted to leucine in 5 enzymatic steps. We posit that this conversion may have initially occurred with valine bound to tRNA. So, Val-tRNA Leu → Leu-tRNA Leu (catalyzed by 5 enzymes) prior to evolution of LeuRS-IA, which we posit was derived from ValRS-IA after duplication. PheRS-IIC may have been derived from a similar enzyme to pSerRS-IIC (pSer for phosphoserine) [84][85][86]. pSerRS-IIC was probably the route by which cysteine was first introduced into the genetic code (see below). To suppress translation errors, the aaRS enzymes in genetic code column 1 have separate editing active sites to remove an inappropriately attached amino acid [53], further demonstrating their similarity and their evolution in genetic code columns. Significantly, in Archaea, only amino acids found on the left half of the genetic code (columns 1 and 2) utilize aaRS enzymes with editing active sites (Figure 7C) [23,34,36].
In column 2 (2nd anticodon position G), serine and threonine are similar amino acids, and SerRS-IIA, ProRS-IIA and ThrRS-IIA are closely related class IIA aaRS enzymes. We posit that AlaRS-IID may have been derived from a similar enzyme to pSerRS-IIC, i.e. by duplication, mutation and repurposing. Probably, AlaRS-IIA was replaced by AlaRS-IID early in code evolution (i.e. before LUCA), in order to enhance the fidelity of tRNA charging. SerRS-IIA, ThrRS-IIA and AlaRS-IID have editing active sites. In Archaea, AlaX is a tRNA editing function homologous to AlaRS-IID but without a synthetic active site to add alanine to tRNA Ala. We consider these observations to strongly support coevolution of amino acids, aaRS enzymes and the genetic code within column 2.
In column 3 (2nd anticodon position U), aspartate and asparagine are related amino acids, and AspRS-IIB, AsnRS-IIB and HisRS-IIA are reasonably closely related enzymes. We posit that AspRS-IIB was initially AspRS-IIA from which HisRS-IIA was derived (see below). AsnRS-IIB was derived from AspRS-IIB. In some ancient Archaea, Asp-tRNA Asn is converted enzymatically to Asn-tRNA Asn by Asp-tRNA Asn amidotransferase, indicating an important mechanism for evolution of the genetic code through modification of amino acids bound to tRNAs [87][88][89][90][91]. Of course, aspartate and glutamate are closely related amino acids, drawing a further linkage of most of the amino acids in column 3. In column 3, glutamate and glutamine are closely related amino acids and GluRS-IB, LysRS-IE and GlnRS-IB are closely related enzymes. The structural classification of LysRS-IE is deceptive (Figure 7A). Despite the different sub-classifications, LysRS-IE is very similar to GluRS-IB and GlnRS-IB. In some ancient Archaea, Glu-tRNA Gln is converted to Gln-tRNA Gln by modification utilizing a Glu-tRNA Gln amidotransferase, indicating an evolutionary intermediate leading to replacement with GlnRS-IB [87][88][89][90][91]. Tyrosine and TyrRS-IC are late additions to the genetic code. No column 3 aaRS enzymes in Archaea have editing active sites. In Bacteria, LysRS-IIB replaced archaeal LysRS-IE. In Bacteria, LysRS-IIB edits [53]. Bacterial LysRS-IIB appears to be derived from AspRS-IIB, further indicating evolution within code columns, even when an aaRS appears to have been replaced in evolution. Surprisingly, there appears to be very little chaos in evolution of the genetic code.
In column 4 (2nd anticodon position C), a jumble of amino acids and aaRS enzymes is found. Archaeal GlyRS-IIA is the aaRS enzyme from which all aaRS enzymes (class II and class I) were derived (Figure 6A). Class I aaRS enzymes were generated by refolding an ancient GlyRS-IIA, probably to ValRS-IA (Figure 6). In Bacteria, GlyRS-IID replaces archaeal GlyRS-IIA. GlyRS-IID, in Bacteria, is probably derived from AlaRS-IID, which arose prior to LUCA (Figure 7A). Using Phyre2, no direct homology was detected linking class IIA and class IID aaRS enzymes (without intermediates), indicating that GlyRS-IID and AlaRS-IID were reinvented. We posit that, in Bacteria, archaeal GlyRS-IIA and, before LUCA, AlaRS-IIA were replaced in evolution to increase translational fidelity. ArgRS-ID and subclass IA enzymes are closely related despite the sub-classification of ArgRS-ID. ArgRS-ID is also closely related to CysRS-IB, and cysteine and arginine are found in nearby sectors, in column 4. In some ancient Archaea, pSer-tRNA Cys is converted to Cys-tRNA Cys by Sep-tRNA:Cys-tRNA synthase (Sep for phosphoserine), indicating how cysteine first entered the genetic code and how CysRS-IB arose [84,86]. Cysteine was needed in proteins from an early time in evolution to ligate metals [4]. Subsequently, CysRS-IB could have evolved from ArgRS-ID to charge tRNA Cys directly. Tryptophan and TrpRS-IC are posited to be the final additions to the genetic code [92,93]. TrpRS-IC was probably derived from TyrRS-IC. We posit that serine invaded column 4 of the genetic code by jumping from column 2 (see below).

aaRS accuracy
The genetic code evolved around the tRNA anticodon. To an extent, this coevolution relates to aaRS enzymes recognizing the tRNA anticodon as a direct determinant to accurately place an amino acid on the cognate tRNA. Exceptions to this general rule, however, are also of interest for understanding evolution of the code (see below). The accuracy of amino acid placement by aaRS enzymes is a complicated issue that is only addressed briefly here. Because tRNAs are so similar in form and sequence, subtle determinants and anti-determinants are recognized by aaRS enzymes. As examples, aaRS enzymes may recognize some of the following determinants and anti-determinants in tRNAs: 1) the discriminator base; 2) the acceptor stems; 3) the anticodon; 4) the V loop; and 5) the D loop [53]. Generally, class I and class II aaRS enzymes recognize opposite faces of the tRNA, so they may recognize different features as determinants and/or anti-determinants for discrimination and amino acid placement. The active site of the aaRS also has particular properties that accept the appropriate amino acid and reject incorrect substrates. For instance, the size of the active site pocket is appropriate for the amino acid substrate rejecting larger substrates. Amino acids with greater character, i.e. charge, hydrogen bonding and flexibility or rigidity, tend to more easily be discriminated in the aaRS active site. Hydrophobic and neutral amino acids, by contrast, are associated with aaRS enzymes with editing active sites. Remarkably, aaRS enzymes that edit were largely restricted to the left half of the genetic code (columns 1 and 2). Also, hydrophobic and neutral amino acids that, generally, require aaRS editing are found in columns 1 and 2.
The aaRS enzymes that lack tRNA anticodon recognition include AlaRS-IID, LeuRS-IA and SerRS-IIA [53]. Significantly, these aaRS enzymes that lack anticodon recognition have editing active sites to suppress charging errors. In ancient Archaea, the editing function of AlaRS-IID is supplemented by AlaX enzymes that edit inappropriately aminoacylated tRNA Ala but lack an active site to add alanine. Apparently, a different evolutionary route was taken to support the accuracy of alanine charging on tRNA Ala. Probably, AlaRS-IID evolved to replace a now extinct AlaRS-IIA to reduce tRNA Ala charging errors. Remarkably, tRNA Leu and tRNA Ser are the only type II tRNAs in the archaeal standard code. Furthermore, leucine, serine and arginine are the only amino acids with 6-codon sectors. Below, we propose a model to explain the evolution of the 6-codon sectors. In 6-codon sectors, multiple columns (serine) and rows (leucine, serine and arginine) are crossed. Because this causes ambiguities reading the anticodon, other strategies for tRNA discrimination became necessary. Recognition of the expanded V loops, for instance, aids tRNA Leu and tRNA Ser discrimination. Arginine is a large, stiff amino acid with fairly unique hydrogen-bonding potential, so ArgRS-ID active site specificity for arginine and other tRNA Arg determinants largely describes the specificity of tRNA Arg charging. ArgRS-ID does not edit. Also, ArgRS-ID does recognize the tRNA Arg anticodon 2nd position. In Archaea, only SerRS-IIA on the right half of the code (column 4) edits, and SerRS-IIA is also found, and probably initially resided, in the left half of the code (column 2). Other than SerRS-IIA, only column 1 and 2 aaRS enzymes edit in Archaea.

Pre-life to LUCA
The pathway of the pre-life to life transition on Earth is largely unknown [4,94]. Our contention has been that the key advance in evolving to cellular life was evolution of tRNA, leading to evolution of tRNAomes, the genetic code and translation systems [23,35,36]. As a guesstimate, we consider the evolution of the code to be a "frozen accident" [34,74,95] that might have taken place over about 200-300 million years. To be more accurate, the code was established systematically rather than accidentally. The code was "frozen" by translational fidelity mechanisms. According to our view, evolution of the genetic code was the dominant pathway to enable life, making other metabolic, energy and motor pathways [96][97][98] of potentially secondary importance. We have published detailed models that we consider to be highly informative and reliable for evolution of the genetic code [23,34].
The genetic code evolved around the tRNA anticodon ( Figure 7B), and tRNA evolved from a highly patterned primordial sequence that is known almost to the last nucleotide (Figures 2-4) [45,47]. The patterning includes sequence repeats and inverted repeats (stem-loop-stems). The pre-life world, therefore, was capable of accurately producing repeating RNA sequences. At a minimum, GCG repeats (5'-acceptor stems), CGC repeats (3'-acceptor stems) and UAGCC repeats (D loop microhelix) were generated. Because the pre-life world generated 31-nt minihelices ( Figure 2C), a capacity to "measure" the truncations of repeat units must also have existed. The pre-life world must have been capable of complementary RNA replication, otherwise inverted repeats found in tRNA would not be notable. So, before evolution of the first tRNA, ribozymes must have existed to generate RNA repeats, to accomplish complementary replication and to excise functional RNAs from longer RNAs.
According to our view, the code in mRNA evolved from the code in tRNA, as we have described [23,34]. This conclusion follows from the hypothesis that the genetic code evolved initially to synthesize polyglycine [23,34,35,36,37,46]. Adoption of this model yielded the following insight. The genetic code appeared to have evolved by filling in large sectors of the code, which were then invaded by incoming amino acids. Because the code initially encoded only polyglycine, tRNA Gly was the first tRNA. Essentially, all anticodons must then have mutated from tRNA Pri, which is a primitive tRNA Gly, to all possible sequences. This is easy to imagine. The anticodon loop is exposed in tRNA, so the anticodon could mutate without affecting overall tRNA structure. Mutations in other positions of the anticodon loop (i.e. loop positions 1, 2, 6 and 7), by contrast, may disrupt the 7-nt anticodon loop conformation, which has a characteristic U-turn between loop positions 2 and 3 [99]. If all anticodons encoded glycine, all mRNA encoded polyglycine. Remarkably, in ancient Archaea, the tRNA Pri (the primordial tRNA) is most closely related to tRNA Gly (Figure 3). As newly added amino acids invaded, displaced amino acids retreated, retaining the most favored anticodons and surrendering less favorable anticodons to the invader. According to this model, the entire genetic code can be populated, as the standard genetic code was populated and subsequently maintained for ~4 Ga. The rules for anticodon preference are as follows. In the 2nd and 3rd anticodon positions, the preferences are C>G>U>>A. These preferences are most apparent in the 3rd anticodon position, rather than the 2nd position, which was easier to read on the primitive ribosome. In the 1st anticodon position (the wobble position), the preference is G>(U~C)>>>>>A. In Archaea, A is strongly disfavored in the anticodon wobble position, and A is rarely or never encoded.
In Bacteria and Eukarya, wobble A can be modified by deamination to inosine [36,62]. In the wobble position, only purine/pyrimidine resolution was achieved [23,34]. Evidence for this model includes the following. In Archaea, tRNA Gly is closest in sequence to tRNA Pri (a primordial tRNA) (Figure 3). Archaeal GlyRS-IIA is the primordial aaRS from which all aaRS enzymes radiated (Figure 7). Glycine, which is the first amino acid encoded, retains the best anticodons (2nd and 3rd anticodon position C). Glycine, alanine, aspartic acid and valine appear to be the first four encoded amino acids [2,3,21,23,34], and they occupy the most favored row 4 (3rd anticodon position C). We posit that aspartic acid entered the code before glutamic acid, and Asp retained the preferred anticodon (1st anticodon position G (Asp) appears to be preferred over 1st anticodon U/C (Glu)). Some of the last amino acids to enter the code occupy disfavored 3rd position A (Phe, Tyr, Cys, Trp). Stop codons, which are read in mRNA by protein release factors [65], occupy disfavored row 1 (3rd position A). Other arguments for the model and its detailed description have been published elsewhere [23,34,35,36,100,101,102].
So, evolution of tRNA, mRNA and the genetic code can reasonably be understood. Much of the genetic code evolution can be inferred from the relatedness of aaRS enzymes, which is a largely solved problem (Figure 7). Because our analyses are all based on existing sequences, this is a top-down analysis that penetrates deep into the pre-life world. Because the standard genetic code dominates life on Earth, the top-down approach clearly identifies a winning evolutionary strategy. By contrast, bottom-up approaches may identify reasonable pathways that never became dominant or that may have gone extinct.
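The anticodon preference rules stated above can be expressed as a simple ranking. This is a minimal sketch under stated assumptions: the ordinal ranks are illustrative placeholders rather than measured values, and sorting first on the 3rd position, then the 2nd, then the wobble position is an assumption made here because the text says the preferences are most apparent in the 3rd position. The name `preference_key` is hypothetical.

```python
# Lower rank means more favored. Ranks are ordinal placeholders, not measured values.
RANK_2ND_3RD = {"C": 0, "G": 1, "U": 2, "A": 3}  # C > G > U >> A
RANK_WOBBLE = {"G": 0, "U": 1, "C": 1, "A": 2}   # G > (U ~ C) >>>>> A

def preference_key(anticodon):
    """Sort key for an anticodon written 5'->3' (wobble, 2nd, 3rd)."""
    wobble, second, third = anticodon
    # Assumed ordering: 3rd position first, then 2nd, then wobble.
    return (RANK_2ND_3RD[third], RANK_2ND_3RD[second], RANK_WOBBLE[wobble])

# Representative anticodons: Phe (GAA), Glu (UUC), Asp (GUC), Gly (GCC).
anticodons = ["GAA", "UUC", "GUC", "GCC"]
ranked = sorted(anticodons, key=preference_key)
print(ranked)
```

With these assumed ranks the glycine anticodon sorts first, the aspartate anticodon sorts ahead of the glutamate anticodon, and the phenylalanine anticodon (3rd position A) sorts last, in line with the evidence listed in the text.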

Polyglycine World
We posit that the genetic code initially evolved to encode polyglycine. First of all, tRNA Pri is almost a tRNA Gly sequence in ancient Archaea (Figure 3). An archaeal typical tRNA sequence (similar to a consensus sequence) is essentially a tRNA Gly , indicating that other archaeal tRNAs radiated from tRNA Gly [35]. Glycine is the simplest amino acid, and glycine was present early on Earth [103]. GlyRS-IIA in Archaea is the root of both the class I and class II aaRS trees ( Figure 7A), indicating that GlyRS-IIA was the first aminoacyl-tRNA synthetase. We posit that when aaRS ribozymes were replaced by encoded protein enzymes, GlyRS-IIA was first because glycine occupied the dominant position in the code.
We further posit that the prior minihelix world that existed before tRNA world (Figure 2C) also evolved to synthesize polyglycine. In tRNA world, two 31-nt minihelix sequences were preserved. We posit that numerous other 31-nt minihelices with ligated 3'-ACCA supported polyglycine synthesis. Polyglycine appears to have been of strongly selected value in the prebiotic world. Interestingly, hemolithin (a polyglycine/hydroxyglycine polymer with coordinated metals) has been identified in meteorite samples. Hemolithin appears to be a pre-biotic modified polyglycine transported from outer space that was not genetically encoded [103]. We consider identification of hemolithin to be evidence of a polyglycine world before evolution of biotic systems.
In the ancient world, polyglycine could have functioned as a hydrogel to enhance protocell chemistry. In this regard, we note that some human transcription factors include long polyglycine tracts. Human transcription is highly dependent on hydrogel (LLPS) compartments [13,14,15,18]. One example is the human androgen receptor, which includes a Gly23 tract. Forkhead Box Protein F1 has a Gly11 tract. Zinc finger homeobox protein 3 has a long polyglycine tract. Alpha-fetoprotein enhancer binding protein, AT-rich interactive domain-containing protein and SWI/SNF chromatin remodeling complex subunit OSA2 are other examples. UNC-80 (ion transport) and phosphatidylinositol 4-kinase (signaling) also have polyglycine tracts and may rely on hydrogels for functional compartmentalization. We posit that these human factors could be models for the functions of polyglycine in ancient systems. In studying ancient evolution, we posit that polyglycine included in protocell systems will improve their coacervate properties and structural stability. Silk fibroin is glycine-rich and includes polyalanine tracts [111]. Fibroin forms β-sheet amyloid-like assemblies and can form hydrogels. We posit that protocell systems packed with tRNAs, polyglycine, short peptides, large RNA assemblies and other early metabolites will be shown to have enhanced activities. Advancing to a GADV (Gly, Ala, Asp, Val) world, of course, would enhance the potential for forming hydrogels and related compartments (see below).
Figure 8 shows a proposed order for the entry of amino acids into the genetic code. Figure 9 gives a highly detailed model for evolution of the code, considering anticodons, codons and aaRS enzymes [23,34,36]. We posit that glycine was the first encoded amino acid. Then, the code sectored on the 2nd anticodon position to encode Gly, Ala, Asp and Val [2,3,21,112,113,114,115,116,117,118]. The 8-aa code may have encoded Gly, Arg, Asp, Glu, Ala, Ser, Val and Leu.
The 8-aa code represents a bottleneck in evolution because the EF-Tu GTPase latch was necessary to push the code beyond 2x4 complexity (1 wobble (1st or 3rd anticodon position) + 2nd anticodon position). At the ~16 amino acid stage, we posit that the code may have included Gly, Arg, Asp, Glu, Asn, Gln, His, Lys, Ala, Thr, Pro, Ser, Val, Ile and Leu. From this stage, the standard genetic code evolved, mostly by filling row 1 (disfavored 3rd anticodon position A), which was the most difficult row to fill. Our proposed order for amino acid entry is very similar to models proposed by others [2,3,21,114,115,116,117,118]. According to our model, the code transitions from the simplest amino acids to more complex amino acids, indicating that amino acid metabolism and the genetic code co-evolved. Our model incorporates negatively charged and positively charged amino acids relatively early, in part, to evolve more complex proteins. Our model incorporates aromatic amino acids last, consistent with their late evolution, as proposed by others [21,93].

Evolution of Stop Codons
The model shown in Figures 8 and 9 was designed in part to better model the evolution of stop codons and to better describe the evolution of the 1st row (disfavored 3rd anticodon position A). Also, the model provides potential insight into evolution of 6-codon sectors for leucine, serine and arginine.
The genetic code appears to have filled from the 4th row (3rd anticodon position C) to the 2nd row (3rd anticodon position G) to the 3rd row (3rd anticodon position U) to the 1st row (3rd anticodon position A). The 1st row was therefore the most difficult to fill, and A was strongly disfavored in the 3rd anticodon position (Figure 8). We posit the following explanation. Before evolution of the EF-Tu GTPase latch, the 1st row may have been filled with tRNAs that were utilized inefficiently, often resulting in termination of translation and release of relatively short peptides. In Archaea, A is very inefficiently utilized in the 1st anticodon (wobble) position. We posit that A was likewise very inefficiently utilized in the 3rd anticodon position when that position was still a wobble position, before evolution of the EF-Tu latch. We guess that row 1 tRNAs were charged with amino acids but were infrequently utilized. We posit, therefore, that stop codons recognized by protein release factors evolved after evolution of the EF-Tu latch. Because row 1 anticodons (3rd anticodon position A) are strongly disfavored and because stop codons are recognized in mRNA by release factors rather than by tRNAs, the location of stop codons in row 1 is telling. Within the standard code, no tRNAs correspond to stop codons.
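The row assignment of stop codons can be checked directly against the standard code. The following is a minimal sketch, assuming the row numbering used in the text (3rd anticodon position A, G, U, C corresponds to rows 1, 2, 3, 4); the helper name `code_row` is introduced here for illustration only.

```python
# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Row numbering as used in the text: 3rd anticodon base A, G, U, C -> rows 1-4.
ROW_OF_THIRD_ANTICODON_BASE = {"A": 1, "G": 2, "U": 3, "C": 4}

def code_row(codon):
    """Row of a codon, set by the anticodon base that pairs with codon position 1."""
    third_anticodon_base = COMPLEMENT[codon[0]]
    return ROW_OF_THIRD_ANTICODON_BASE[third_anticodon_base]

# Rows occupied by the three stop codons ('*' in the table).
stop_rows = {codon: code_row(codon) for codon, aa in CODE.items() if aa == "*"}
# Everything encoded in row 1 of the standard code.
row1_occupants = {aa for codon, aa in CODE.items() if code_row(codon) == 1}

print(stop_rows)
print(sorted(row1_occupants))
```

Under these assumptions, all three stop codons fall in row 1, alongside phenylalanine, leucine, serine, tyrosine, cysteine and tryptophan.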

A Working Model
In Figure 9, we show a highly detailed model for evolution of the genetic code. This is a variation of models previously published by our laboratory [23,34]. Possible advantages of this model are: 1) an improved description of the evolution of stop codons; 2) an explanation for serine jumping from column 2 to column 4 of the code; 3) possible new insight into evolution of 6-codon sectors; and 4) disfavoring of A in the anticodon 3 rd position. When we first tried to formulate these models, we thought that such detailed accounts of the genetic code evolution were not reasonable. Now, we are convinced that these models are highly descriptive and likely highly accurate. We were surprised at how easily these models unfolded from a very small number of reasonable initial assumptions and, also, how readily hypotheses for tRNA, aaRS and genetic code evolution combined.

Assumptions and background
The main initial assumption that we made was that glycine was the first encoded amino acid. Furthermore, we assumed that the entire genetic code (anticodons and codons) initially encoded polyglycine (Figures 8 and 9A). We considered that polyglycine could have multiple purposes in the ancient world. Polyglycine could function as a hydrogel, and polyglycine could act as a cross-linking agent, as it does in bacterial cell wall peptidoglycan layers [108][109][110]. So, polyglycine can be a structural and hydrogel (LLPS) component, enhancing protocell function (i.e. membrane function and ion transport) and rigidifying protocell membranes. One reason to believe this is a reasonable assumption is that tRNA Pri (the primordial type I tRNA) is essentially tRNA Gly in ancient Archaea ( Figure 3). As noted above, GlyRS-IIA in Archaea is the primordial aaRS enzyme (Figure 7). If the entire genetic code initially encoded polyglycine, this provides a model for co-evolution of mRNA and tRNA. If all mRNA codons and tRNA anticodons initially encoded glycine, it was very simple to coevolve mRNA codons and tRNA anticodons with new invasions of amino acids from outside the code.
The concept of filling in the genetic code with glycine and then adding amino acids by invasion resulted in a simple model for additions of amino acids to the code. Glycine looked like the first encoded amino acid, and glycine utilizes anticodons GCC, UCC and CCC (2 nd and 3 rd anticodon position C). As we discuss, C is favored in the 2 nd and 3 rd anticodon positions. A is (essentially) never found in the anticodon wobble 1 st position in Archaea. Furthermore, the four simplest amino acids glycine, alanine, aspartate and valine appear to be the first four amino acids encoded, and these amino acids are all found in row 4 of the genetic code (3 rd anticodon position C) [114][115][116][117][118]. It began to look as if C was a favored base in the tRNA anticodon. Because A is so strongly disfavored in the anticodon wobble position in Archaea, we began to wonder whether A was also disfavored in the 3 rd anticodon position. Phenylalanine, tyrosine, tryptophan, cysteine and stop codons appear to be late additions to the genetic code that are found in row 1, which is 3 rd anticodon position A. We began to think that A was disfavored in both the 1 st (wobble) and 3 rd anticodon positions. Stop codons are recognized by protein release factors as codons in mRNA, with no corresponding anticodon in tRNA, consistent with A being disfavored in the anticodon 3 rd position. Protein release factors recognize stop codons to release the nascent peptide from the ribosome [65]. So, C is favored in the anticodon, in the 2 nd and 3 rd anticodon positions. A is disfavored in the anticodon and, probably, in all three anticodon positions, although the effect is not obvious in the 2 nd anticodon position, which is the easiest to read. So, glycine occupies the most favored position in the genetic code (anticodons GCC, UCC, CCC), consistent with glycine being the first encoded amino acid.

Evolution in columns
Analysis of aaRS evolution indicates that much of the evolution of the genetic code occurred within code columns, which represent the 2nd anticodon position (Figure 7). On the ancient ribosome (i.e. before LUCA), the 2nd anticodon position was the easiest to read, in part because the 1st and 3rd anticodon positions were initially wobble positions. In column 1 of the genetic code, ValRS-IA, MetRS-IA, IleRS-IA and LeuRS-IA are all structural class IA enzymes (Figure 7A). Valine, isoleucine and leucine are hydrophobic amino acids. Valine can be converted to leucine via a 5-step pathway. Methionine is a late invader that we posit attacked an isoleucine 4-codon sector to add methionine and to evolve translation start codons, but MetRS-IA is also structural class IA, indicating additional evolution in column 1 (i.e. IleRS-IA→MetRS-IA). In column 2 of the genetic code, SerRS-IIA, ProRS-IIA and ThrRS-IIA are closely related enzymes of structural class IIA. Serine and threonine are closely related amino acids. There is substantial evidence, therefore, for coevolution of aminoacyl-tRNA synthetases and amino acids in column 2. In column 3, HisRS-IIA, AspRS-IIB and AsnRS-IIB enzymes are reasonably closely related, and aspartate and asparagine are related amino acids. Also, in column 3, GlnRS-IB, LysRS-IE and GluRS-IB enzymes are closely related, and glutamate and glutamine are closely related amino acids. In column 4, ArgRS-ID and CysRS-IB enzymes are closely related. In some cases, the assigned structural class for an aaRS does not reflect its closest relatives. We take these data to overwhelmingly support evolution of the genetic code within columns (anticodon 2nd position) as indicated in the model in Figure 9.
In the model (Figures 8 and 9), the genetic code sectors from one that encodes only glycine ( Figure 9A) to one that encodes glycine, alanine, aspartate and valine ( Figure 9B). Others have supported these four simple amino acids as the first four encoded amino acids [2,21,[114][115][116][117][118]. In the standard code ( Figure 9F), glycine retreats from occupying the entire code ( Figure 9A) to occupying only column 4 ( Figure 9B) and, finally, to occupying only row 4 and anticodons GCC, UCC and CCC (2 nd and 3 rd anticodon position C) ( Figure 9F). Aspartate retreats from initially occupying all of column 3 ( Figure 9B) to occupying only anticodon GUC ( Figure 9F). Alanine retreats from occupying all of column 2 ( Figure 9B) to occupying only anticodons GGC, UGC and CGC ( Figure 9F). Valine retreats from occupying all of column 1 ( Figure 9B) to occupying only anticodons GAC, UAC and CAC ( Figure 9F). In this way, the first four encoded amino acids land in the 4th row after occupying the entire code. Newly added amino acids, therefore, invade previously occupied sectors of the code. Amino acids that enter the code first surrender less-favored anticodons to invaders but retain the most favorable anticodons according to clear and inviolable rules.
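The anticodons retained by the first four amino acids can be derived from the standard code. The sketch below is illustrative only: it assumes a simplified wobble rule in which A is not used at the 1st anticodon position, so codons ending in U or C share a G-wobble anticodon. The names `WOBBLE`, `anticodon` and `anticodons_for` are hypothetical helpers, not part of any published tool.

```python
# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Assumption for illustration: wobble A is unused, so codons ending in
# U or C are both read by an anticodon with wobble G.
WOBBLE = {"U": "G", "C": "G", "A": "U", "G": "C"}

def anticodon(codon):
    """Anticodon (5'->3') reading a codon under the simplified wobble rule."""
    return WOBBLE[codon[2]] + COMPLEMENT[codon[1]] + COMPLEMENT[codon[0]]

def anticodons_for(aa):
    """Set of anticodons assigned to a one-letter amino acid in the standard code."""
    return {anticodon(c) for c, a in CODE.items() if a == aa}

for aa in "GADV":
    print(aa, sorted(anticodons_for(aa)))
```

With this rule, glycine, alanine, aspartate and valine map to the anticodon sets listed above, and every one of those anticodons ends in C (row 4).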

An evolutionary bottleneck
On a primitive ribosome, the 1st and 3rd anticodon positions were initially wobble positions. The consequence is a bottleneck in genetic code evolution: the complexity of the code freezes at ~8 amino acids (Figures 9C and 9D). Wobble positions were read with only pyrimidine-purine discrimination, and only one wobble position could be read at one time, limiting the code complexity to 2x4=8 assignments. Columns 1, 2 and 4 initially sectored on the 2nd and 3rd (then wobble) anticodon positions. Column 3, by contrast, initially sectored on the 1st (wobble) and 2nd anticodon positions. In columns 1, 2 and 4, it appears that incoming amino acids leucine (column 1), serine (column 2) and arginine (column 4) may have first invaded the 2nd row (3rd anticodon position G) (Figure 9C). Interestingly, leucine, serine and arginine are the three amino acids that occupy 6-codon sectors in the fully evolved genetic code. Leucine, serine and arginine are posited to occupy row 2 first, because row 1 is difficult to occupy (3rd anticodon position A is disfavored). It appears that the tRNA anticodon bases were selected for small size (pyrimidine>purine) and stronger hydrogen bonding potential (C>G>U>>A). These rules are most apparent in the 3rd anticodon position (Figures 8 and 9).
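The relationship between the base preference rule and the order of row filling can be written out explicitly. This is a minimal sketch: the numeric ranks are hypothetical and encode only the ordering C>G>U>>A for the 3rd anticodon position, not any measured quantity.

```python
# Hypothetical ranks encoding only the stated ordering C > G > U >> A
# for the 3rd anticodon position (lower rank = more favored).
PREFERENCE_RANK = {"C": 0, "G": 1, "U": 2, "A": 3}

# Row numbering as used in the text: 3rd anticodon base A, G, U, C -> rows 1-4.
ROW_OF_THIRD_ANTICODON_BASE = {"A": 1, "G": 2, "U": 3, "C": 4}

# Rows listed from most favored to least favored 3rd anticodon base.
bases_by_preference = sorted(PREFERENCE_RANK, key=PREFERENCE_RANK.get)
fill_order = [ROW_OF_THIRD_ANTICODON_BASE[b] for b in bases_by_preference]

print(fill_order)
```

The resulting order, row 4, then row 2, then row 3, then row 1, is the filling order proposed in the model.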

Evolution of 6-codon sectors
Then, we posit that leucine (column 1), serine (column 2) and arginine (column 4) may have invaded row 3 (3rd anticodon position U; Figure 9D). Some reasons to think this invasion of row 3 might have occurred are as follows. First, serine jumps to row 3, column 4, from column 2 in the code. This jump is easiest to imagine if serine occupies column 2, row 3, before making its jump to column 4, row 3. According to the model, only a single base change in the tRNA anticodon (GGU→GCU) was necessary for serine to jump. Second, arginine occupies rows 2 and 3 in the code, as if arginine made the posited invasion of row 3. In this regard, it is notable that leucine, serine and arginine are the three amino acids to occupy 6-codon sectors in the final evolution of the standard code. Here, we suggest that 6-codon sectors may have arisen from the history of code sectoring.
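Both observations in this section can be verified with a few lines of code. The sketch below (helper names are illustrative) locates the anticodon position that differs between GGU and GCU and counts codons per amino acid in the standard code to find the 6-codon sectors.

```python
from collections import Counter

# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def changed_positions(seq_a, seq_b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]

# The posited serine jump from column 2 to column 4 (anticodons, 5'->3').
serine_jump = changed_positions("GGU", "GCU")

# Amino acids (one-letter codes) assigned exactly six codons.
codon_counts = Counter(CODE.values())
six_codon_sectors = sorted(aa for aa, n in codon_counts.items() if n == 6)

print(serine_jump)
print(six_codon_sectors)
```

The jump requires a change only at the 2nd anticodon position, and the three 6-codon sectors belong to leucine (L), arginine (R) and serine (S).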

Evolution of the EF-Tu latch
To proceed beyond the bottleneck of only 8 amino acids required the evolution of the EF-Tu GTPase latch. The latch sets a closed conformation of the codon-anticodon pair and the ribosome involving 16S rRNA (30S subunit) residues G530, A1492 and A1493 and 23S rRNA (50S subunit) residue A1913. Closing of the latch allows 4-base recognition at the 2nd and 3rd anticodon positions [57-60,119,120]. The latch also improves the accuracy of the wobble position, but the wobble 1st anticodon position only has purine versus pyrimidine resolution. Evolution of the EF-Tu latch allows a genetic code with up to 2x4x4=32 anticodon assignments.
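The 2x4=8 and 2x4x4=32 limits can be reproduced by counting how many anticodon classes a ribosome can distinguish under each readout scheme. The following is a sketch under the stated assumptions: a position is read either at 4-base resolution, at purine/pyrimidine resolution, or not at all. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGU"
PURINES = {"A", "G"}

def purine_or_pyrimidine(base):
    """Two-fold (R/Y) resolution of a wobble position."""
    return "R" if base in PURINES else "Y"

# All 64 possible anticodons, written 5'->3' (positions 1, 2, 3).
ANTICODONS = ["".join(p) for p in product(BASES, repeat=3)]

def distinguishable_classes(signature):
    """Set of distinct readout signatures over all 64 anticodons."""
    return {signature(ac) for ac in ANTICODONS}

# Before the latch: 2nd position at 4-base resolution, plus ONE wobble
# position (here the 3rd) at R/Y resolution; the other wobble position unread.
pre_latch = distinguishable_classes(
    lambda ac: (ac[1], purine_or_pyrimidine(ac[2])))

# After the latch: 2nd and 3rd positions at 4-base resolution, with the
# 1st (wobble) position still at R/Y resolution.
post_latch = distinguishable_classes(
    lambda ac: (purine_or_pyrimidine(ac[0]), ac[1], ac[2]))

print(len(pre_latch), len(post_latch))
```

Reading the 1st rather than the 3rd position as the single pre-latch wobble position, as posited for column 3, gives the same count of 8.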

Completion of column 1
To complete sectoring of the genetic code in column 1, we posit that isoleucine invaded row 3, displacing leucine, which retained row 2 (in the 3rd anticodon position, G (row 2; Leu) is favored over U (row 3; Ile)). Isoleucine formed a 4-codon sector with anticodons GAU, UAU and CAU. When methionine invaded CAU, isoleucine retained CAU, but isoleucine codon AUA is specified by an anticodon with a wobble C→agmatidine modification, which does not read AUG methionine codons [68,69]. In Archaea, the isoleucine anticodon UAU is generally not utilized. So, methionine was a late invader of an isoleucine 4-codon sector. Methionine is utilized at start codons. In column 1, row 1, phenylalanine was a late addition to the code.

Completion of column 2
To complete sectoring of column 2, serine first jumped to column 4 (Figures 9D and 9E). Then, after evolution of the EF-Tu latch, the serine sector expanded to disfavored row 1 (disfavored 3rd anticodon position A). Then proline invaded row 2, displacing serine, and threonine invaded row 3, displacing serine. Because serine jumped to column 4, row 3, serine occupied a favorable anticodon (GCU), and serine, therefore, could give up otherwise favored anticodon positions in column 2 to proline and threonine. Serine is the only amino acid to have jumped in evolution of the genetic code, indicating the overall orderly evolution of the code. Other models are possible for evolution of column 2 [23,34].

Evolution of column 3
Because of sectoring on the 1 st anticodon (wobble) rather than the 3 rd position, genetic code column 3 is the most innovated column. We posit that sectoring on the wobble position caused this innovation and the pattern of sectoring. We posit that initially, aspartate filled column 3 ( Figure 9B). Aspartate was displaced by the related amino acid glutamate in rows 4B, 3B and 2B. Aspartate retained rows 4A, 3A and 2A and surrendered rows 4B, 3B and 2B to glutamate, because, in the anticodon, wobble G is favored over wobble U and C, and aspartate entered the code first. Histidine displaced aspartate in row 2A. We posit that HisRS-IIA evolved from a primitive AspRS-IIA. AspRS-IIA then evolved to AspRS-IIB to suppress translation errors. In row 3A, first an amidotransferase evolved to convert aspartate bound to an emerging tRNA Asn to asparagine. Subsequently, AsnRS-IIB evolved from AspRS-IIB. Note that the order of invasions and modifications are indicated by the structural classes of the aaRS enzymes (Figures 7 and 9).
Glutamate occupied column 3, sectors 4B, 3B and 2B. GluRS-IA evolved to GluRS-IB. In sector 2B, glutamate bound to tRNA Gln was modified to glutamine by an amidotransferase [87,88,121-123]. Subsequently, this system evolved a GlnRS-IB to substitute for the amidotransferase. We note that, metabolically, lysine can be derived from a pathway that utilizes glutamate, although the lysine carbon skeleton is derived from α-aminoadipic acid. It appears, therefore, that lysine invaded sector 3B from outside the code, displacing glutamate. Despite its different stated structural class, LysRS-IE in Archaea is very closely related to GluRS-IB and GlnRS-IB (Figure 7A). Probably, lysine invasion of the code occurred before full establishment of the GlnRS-IB system, which is incompletely evolved in some Archaea. In Bacteria, LysRS-IIB is found. We posit, therefore, that LysRS-IIB was a bacterial innovation that replaced the archaeal LysRS-IE. Bacterial LysRS-IIB is closely related to AspRS-IIB and AsnRS-IIB (Figure 7A). We posit that, in Bacteria, LysRS-IIB evolved within column 3 to better specify accurate tRNA Lys charging. LysRS-IIB in Bacteria evolved an editing active site to remove improperly attached amino acids. It appears that LysRS-IIB mostly uses editing to discriminate against amino acids invading from outside the genetic code [53].

Column 4
Metabolically, arginine can be derived from ornithine, which may have been a more primitive positively charged amino acid utilized in pre-life [124]. It is possible, therefore, that arginine replaced encoded ornithine during genetic code evolution. We posit that arginine (or ornithine) occupied column 4, rows 2 and 3, displacing glycine, which, as the first encoded amino acid, retained the most favored anticodons, GCC, UCC and CCC (2nd and 3rd anticodon position C) (Figures 9C and 9D). After evolution of the EF-Tu GTPase latch, row 1 could be occupied, and additional amino acids could be encoded. We posit that serine invasion from column 2 to column 4 occurred early in code evolution, for instance, before proline and threonine invasion of column 2. We note that serine invasion of column 4 could be initiated by a single base change in the 2nd position of the tRNA anticodon (GGU→GCU). Also, SerRS-IIA is a very different enzyme from ArgRS-ID, facilitating the invasion of column 4 by limiting tRNA charging errors. Although ArgRS-ID is classified as a structural subclass ID enzyme, ArgRS-ID is closely related to subclass IA and CysRS-IB enzymes (Figure 7A).

Late evolution of row 1
We posit that phenylalanine, tyrosine, cysteine, tryptophan and stop codons, all located on disfavored row 1, were late additions to the code. Here, we suggest that these amino acids and stop codons could not be added before evolution of the EF-Tu latch. Essentially, the disfavored first row (3 rd anticodon position A) could not be efficiently occupied before evolution of the latch. Prior to evolution of the latch, the stop signal is posited to have been inefficiently functioning tRNAs with 3 rd position A. After evolution of the latch, row 1 tRNAs could be efficiently utilized, and leucine and serine could effectively invade row 1. Phenylalanine then invaded column 1, row 1A, displacing leucine. Leucine retained favored row 2 anticodon positions with phenylalanine invasion. Uncharacteristically, within column 2, serine surrendered more favorable anticodons to proline and threonine, but serine retained a favorable anticodon in column 4, row 3A. Also, serine utilizes a type II tRNA Ser . Type II tRNA Ser has an expanded variable loop that functions as a positive determinant for accurate SerRS-IIA serine addition. Because tRNA Ser is a type II tRNA, in which the expanded variable loop is a SerRS-IIA-contacted determinant for accurate charging, this facilitated serine jumping in the code, from column 2 to column 4, and allowed serine to maintain a favored anticodon in column 4 (GCU) and to surrender otherwise favored anticodons in column 2 to invading proline and threonine. At about the time of the evolution of the EF-Tu latch, we posit that protein release factors evolved to take over stop codon functions, allowing proteins to become longer and more complex with accurate starts and stops [65].

The genetic code model and perspectives
The model offered here is a variation of models published previously. The genetic code evolved around the tRNA anticodon. Taking a tRNA-centric view, therefore, simplifies the understanding of code evolution. The model presented here provides clear Darwinian selections for the locations of all amino acids in the genetic code. Amino acids enter the code by two identifiable mechanisms: 1) invasion from outside the code; and 2) enzymatic modifications of amino acids bound to tRNAs (i.e. Asp→Asn, Glu→Gln and pSer→Cys) followed by subsequent evolution of aaRS enzymes (AsnRS-IIB, GlnRS-IB and CysRS-IB). Evolution occurred first in code columns because the 2nd anticodon position is most important and easiest to read on a primitive ribosome. Evolution also occurred by rows according to clear anticodon preference rules (Figure 8). Column 3 sectored differently than columns 1, 2 and 4. Column 3 sectored early on the 1st (wobble) and 2nd anticodon positions, between aspartate and glutamate. As a result of this sectoring strategy, column 3 became the most innovated column in the code, encoding the most amino acids. Columns 1, 2 and 4, which sectored on the 2nd and 3rd anticodon positions, are characterized by larger blocks of anticodons (i.e. 4- and 6-codon sectors). Columns 1 and 2 are characterized by aaRS enzymes with editing active sites. In Archaea, ProRS-IIA is an exception. Because editing is a fidelity mechanism and because amino acids invade the code through tRNA charging errors, editing probably protects larger blocks of anticodons (i.e. 4- and 6-codon sectors). Because arginine and glycine have unique characteristics, ArgRS-ID and GlyRS-IIA were under little selection pressure to evolve editing. Arginine is a stiff and bulky, positively charged amino acid with unique hydrogen-bonding capacity. By contrast, positively charged ornithine and lysine are very flexible. Arginine, therefore, forms more structured ion pairs, particularly with aspartate (ion pairs with glutamate are more flexible). Glycine is the smallest amino acid, so a compact active site in GlyRS-IIA limits mischarging of tRNA Gly with larger amino acids. We posit that aaRS editing protected 4- and 6-codon sectors in columns 1 and 2 by limiting mischarging of tRNAs. Editing was not necessary for the aaRS enzymes in column 4 because of glycine and arginine properties. Column 3 broke into 2-codon sectors because of early sectoring on the 1st anticodon (wobble) position.
Once the genetic code evolved and protein enzymes came to dominate, the potential to enrich metabolism and energy utilization exploded. Whatever systems predated current systems, therefore, were replaced by enzymatic and protein motor pathways. For these reasons, we do not favor models for genetic code evolution based primarily on metabolism. Of course, an amino acid must have been available in order to have been added to the code. On the other hand, the expanding code helped drive the evolution of metabolism to provide more amino acids because, with the advent of coding, amino acids were of enhanced selective value. With regard to energy transduction, it is clear that primitive energetic systems were sufficient to support the evolution of the standard genetic code. After code evolution, an explosion in energy transduction systems occurred, leading to modern systems. We note that multiple pathways are identified in which tRNA-bound amino acids are substrates for metabolic reactions. In the pre-life world, we posit that RNA-bound peptides and amino acids were substrates for many reactions [22,54]. One effect of covalent RNA-amino acid binding was to shield a potentially reactive group on the amino acid from unproductive side reactions.
The mutually reinforcing tRNA, tRNAome, aaRS and genetic code evolution models presented here make many testable predictions. The model for genetic code evolution follows strongly from the aaRS evolution model (Figure 7). Essentially, evolution of aaRS enzymes directs the genetic code model. Detailed hypotheses were generated for polyglycine world and GADV world, and these predictions can be tested experimentally (see below). The sequence analyses underlying the tRNA, tRNAome and aaRS evolution models described above can be further challenged using additional sequence data and more sophisticated bioinformatics and computation. We make suggestions about RNA-linked reactions in the ancient pre-life world that can be pursued. For instance, if Val-tRNA Val was converted to Leu-tRNA Leu through a series of tRNA-linked reactions and evolutionary steps, as we suggest, then the history of column 1 evolution becomes significantly richer and more interesting. Such a model for column 1 evolution reinforces the interaction between our views and those that support a metabolic coevolution theory. Also, such a view enriches the possibilities for RNA-linked and tRNA-linked reactions in the ancient world [22,54]. We posit the existence of diverse ribozymes in the ancient world, some of which have not yet been generated by researchers. As an example, we imagine a telomerase-like ribozyme with a guide RNA template that accurately generated RNA repeats from an RNA 3'-end to synthesize tRNA precursor sequences ( Figure 2C). We also posit that diverse ribozyme aaRS enzymes with reasonable accuracy could be generated to initiate the genetic code before enough amino acids have joined the code to encode aaRS protein enzymes. We posit diverse RNA-linked reactions in the ancient world with many yet to be discovered.

The Great Divergence
How did Archaea and Bacteria diverge? Which domain is most similar to LUCA? Despite their many similarities, how are Archaea and Bacteria distinct? After LUCA, the great divergence occurred, which we posit resulted in the splitting of Archaea and Bacteria (Figure 1). Although this point has been argued, we identify LUCA as most similar to Archaea [12,19]. Interestingly, a recent paper placed LUCA in the midst of the archaeal domain. We noticed that tRNAomes (all of the tRNAs of an organism) were much more compact in ancient Archaea and much more diverged in Bacteria. Radiating tRNAomes from tRNA Pri , archaeal tRNA Gly is most similar to tRNA Pri (Figure 3) [35]. Archaeal tRNAomes are much more similar to tRNA Pri than bacterial tRNAomes. Archaeal tRNAomes are also more highly structured than bacterial tRNAomes, as if convergent and divergent evolution have scrambled bacterial tRNAomes. We posit that ancient archaeal tRNAomes are structured similarly to LUCA tRNAomes. A similar argument could be made for archaeal aaRS trees and genetic code structures. These analyses support the hypothesis that Archaea are more closely rooted to LUCA than Bacteria. Archaea and Bacteria have distinct membrane lipids and replication systems that may in some way be linked [27]. Eukaryotes made an evolutionary choice of bacterial membrane systems over archaeal systems, indicating a potential selective advantage to the adopted bacterial membrane system at least in the evolving eukaryotic system.

Models to Describe Genetic Code Evolution
We suggest that the dominant models for description of genetic code evolution be re-evaluated. We found these models confusing and largely unhelpful. Formerly, views of genetic code evolution broke primarily into three main categories: 1) the stereochemical theory; 2) the coevolution (metabolism) hypothesis; and 3) the error-minimization theory [2,3,74]. Genetic code evolution has also been described as a "frozen accident" that occurred very rapidly and was fixed as the standard code because too many deviations from the code were lethal [74]. The stereochemical theory posits that originally nucleic acids and amino acids interacted chemically, resulting in the code between tRNA anticodons and amino acids. We find the stereochemical theory to have little predictive power for evolution of the code, although the stereochemical theory appears to reasonably describe evolution of riboswitches [125]. The coevolution hypothesis has some value. Of course, amino acid metabolism and the genetic code coevolved [126]. How could it be otherwise? If an amino acid could not be generated by primitive metabolism, it could not be added to the code. Amino acids, therefore, were added to the code from simple to complex, with glycine, alanine, aspartate and valine the first and simplest amino acids [114][115][116][117][118] and phenylalanine, tyrosine and tryptophan among the last and most complex additions [93]. We do not see metabolism of amino acids, however, as a strong driving force in selection of new amino acids added into the code. We also see the error minimization theory as a limiting idea. The error minimization theory indicates that the genetic code was structured to minimize translation errors. We do not think that idea is correct. Our opinion is that translational fidelity mechanisms drove the freezing of the code. 
Interestingly, the EF-Tu GTPase latch drove first the expansion of the code from an 8 amino acid bottleneck and later the freezing of the code at 20 amino acids plus stops (Figure 9).
Our view is that the standard genetic code evolved around tRNA and the tRNA anticodon [23,34-36]. The identification of 31-nt minihelices and 17-nt microhelices that can attach ACCA via a ribozyme ligase indicates a rich prebiotic chemistry involving covalent RNA-amino acid linkages and diverse ribozyme activities (Figure 2) [22,45-47,54]. The capacity for doing chemistry on tRNA-amino acid linkages persists, as we describe (i.e. pSer→Cys, Asp→Asn, Glu→Gln and possibly Val→Leu) [84,86-88,90,91,122]. We posit that ACCA bound to amino acids at its 3'-end as a substrate for chemistry before the advent of 31-nt minihelices and tRNA. We strongly advocate for the tRNA-centric view, that evolution of tRNA from ligation of three 31-nt minihelices drove the evolution of the code, mRNA and rRNA. Therefore, tRNA was the central advance in biological intellectual property that enabled evolution of the code. We describe powerful Darwinian selections driving evolution mostly within code columns but ultimately with amino acids distributing in an ordered manner in code rows, according to clear selection rules. We strongly argue that the history of amino acid additions to the code follows these interpretable patterns. For instance, column 3 of the genetic code becomes the most innovated column because of sectoring between Asp and Glu, initially utilizing the 1st and 2nd anticodon positions rather than the 2nd and 3rd anticodon positions, as for columns 1, 2 and 4 (Figure 9). Similarly, the history of evolution in columns 1, 2 and 4 appears to result in 6-codon sectors that encode leucine, serine and arginine. Of course, the failure to subdivide 6- and 4-codon sectors results in a code with fewer amino acids than could potentially be encoded.
With regard to the freezing of the code, the genetic code was built by modifications of amino acids bound to tRNAs and by tRNA charging errors (invasions of amino acids from outside the code). tRNA charging errors and modifications, therefore, drove innovations of the code, and translational fidelity mechanisms froze the code. The EF-Tu latch is a major translational fidelity mechanism. The EF-Tu latch evolved to expand the code from an ~8 amino acid bottleneck to a richer code (Figure 9). Fidelity mechanisms such as the EF-Tu latch, aaRS editing, aaRS active site specificity, tRNA modifications and tRNA specialization drove the freezing of the code at 20 amino acids + stops. In this regard, we note that aaRS editing on the left half of the genetic code appears to protect 4- and 6-codon sectors from further divisions to encode additional amino acids. Anticodons specifying specific amino acids evolved as we describe through the coevolution of tRNAomes, aaRS enzymes and translational fidelity mechanisms. We show clearly that aaRS enzymes and the genetic code were powerfully coevolved (Figure 7).

Polyglycine World (a working model)
This paper reveals significant detail about the ancient pre-life and protocell worlds. We describe fully the evolution of tRNAs and the genetic fragments and sequences from which tRNA was derived. We describe evolution and radiation of aaRS enzymes and the relationship between evolution of aaRS enzymes and sectoring of the genetic code. We describe how tRNA and hydrogels/LLPS could have contributed to the earliest aaRS folding. In this review, we attempt to link these ancient events to the activities of hydrogels, LLPS, membraneless compartments and amyloids. We believe the mechanisms and descriptions will lead to advances in analyses of hydrogels in pre-life reactions, protocell functions and prokaryotic systems. We consider hydrogels and related assemblies to be a formerly largely missing consideration in analyses of ancient evolution. For some future studies, we recommend increasing the system complexity and inclusion of hydrogels to help identify reactions of interest that may lead to an understanding of earlier events. Specifically, experimental probing of the richer chemistry of a GADV world ( Figure 9B) would be expected to lead to insights into a prior polyglycine world ( Figure 9A).
Significant work has been done with coacervate systems to enhance prebiotic chemistry. Examples of coacervates include clays, polymers and mica [20,21,127-129]. Such materials can concentrate reactants, control the access and activity of water, participate in wet-dry cycles and provide polar surfaces to help with enantiomer fractionations. Here we propose that polyglycine was a potent prebiotic hydrogel component that drove the earliest evolution of the genetic code. Above, we describe a number of human proteins that may rely on hydrogels and that have polyglycine tracts. Shorter polyglycine tracts (i.e. length 6-8) can be found in archaeal, bacterial and phage proteins. We do not know the extent to which such short tracts can function to generate localized hydrogels. Based on our model for evolution of the genetic code, we further propose that polymers of Gly, Ala, Asp and Val may enhance hydrogel functions in prokaryotic systems, as indicated above. In prebiotic systems, short peptide linkers can cross-link polymers to make more complex networks. Some human proteins (e.g. transcription factors) include polyalanine, polyhistidine and polyglutamine tracts.
Hydrogels (LLPS) appear to be incompletely characterized and somewhat difficult to analyze in prokaryotic systems [124,130-133]. We speculate that hydrogel compartments are important for many bacterial processes including cell division, coupling of transcription and translation, nucleoid body maintenance and rearrangements, ion transport and signaling. LLPS affects septation in Bacteria. In eukaryotic systems, LLPS has been more aggressively analyzed and is perhaps better understood. For instance, LLPS compartments tend to be larger and easier to visualize in eukaryotic systems. As a critical phase of their evolution, Eukaryotes appear to have powerfully enriched LLPS systems by increasing the use of proteins that include intrinsically disordered regions with the potential for covalent modification and non-covalent bonding of diverse hydrogel components [13-15,17,18,32,33,132]. Examples of intrinsically disordered regions with these properties include histone tails and the carboxy-terminal domain of RNA polymerase II. Covalent modifications of these disordered regions alter activities and factor binding. For instance, as RNA polymerase II traverses the transcription cycle, the polymerase can move between LLPS compartments that support different phases of the cycle. Transcriptional super-enhancers that direct cell-specific gene regulation programs organize hydrogel compartments, and, in some cancers, super-enhancers become disorganized and sometimes fuse together [15,17]. The nucleolus is a hydrogel compartment for RNA polymerase I transcription and ribosome assembly [134]. RNA polymerase III also separates into hydrogel compartments. In eukaryotic cells, hydrogel compartments appear to segregate and regulate many other activities including translational control, translational delay via microRNA, signaling and ion transport.
Within cells, hydrogel compartmentalization is a mechanism to support diverse biological processes and to regulate the activity of water and the potency of acids and bases that support chemistry.
We try to imagine a polyglycine world. In Figure 10, we show a working model for a protocell that utilized polyglycine as a hydrogel, a cross-linking agent, and a component and stabilizer of protocell walls. Many of the predicted components of the protocell matrix discussed in this paper are indicated. We posit that the interior of the protocell was packed with polyglycine hydrogels, membraneless compartments, polyglycine amyloid accretions, primitive metabolites, tRNAs, diverse ribozymes and pre-ribosomes. Our idea of a pre-ribosome is a pre-16S scaffold, on which to mount an mRNA (of any sequence), and a mobile and independent PTC [70]. We envision that tRNAs with essentially all anticodons represented were charged (essentially) only with glycine by a GlyRS ribozyme. The idea is that the anticodon is the tRNA sequence that can mutate most rapidly and successfully, because most other changes cause structural defects in the L-shaped tRNA structure. We imagine that ACCA was ligated to early tRNAs, so the tRNA could be charged with glycine. ACCA is the most common 3'-end in archaeal tRNAs [49], indicating a pre-life function of ACCA ligated to RNAs. In such a system, essentially all encoded protein products were polyglycine of varying lengths. We posit that translation termination initially tended to occur at NNA anticodons, because the 3rd anticodon position was a wobble position, and 3rd position A had difficulty pairing to U in mRNA. Because we imagine a minihelix world before a tRNA world (Figure 2C), we posit that 31-nt minihelices with attached ACCA (i.e. via ligation) were also directed to synthesize polyglycine, and a tRNA-based polyglycine world would, therefore, have inherited many features of the prior minihelix world. Polyglycine hydrogels and RNAs are expected to powerfully regulate the activity of water in protocells. Dehydration stimulates polymerization reactions including RNA synthesis, DNA synthesis, polypeptide synthesis and polysaccharide (e.g. cell wall) synthesis. RNAs bind water and, therefore, include dehydrating pockets to support chemistry of nearby reactants. For instance, the PTC of the ribosome has been considered a molecular crowding and dehydration chamber to support peptide bond formation utilizing chemically diverse amino acid substrates [43]. The "trigger loop" in RNA polymerase II closes over the active site, expelling water to support RNA polymerization [135,136]. When the activity of water is decreased, acid-base reactions and polymerization reactions become more potent.
We imagine that polyglycine can be cross-linked to short polypeptide chains similar to those found in peptidoglycan bacterial cell walls and using similar chemistry. The protocell could have been encapsulated by peptidoglycan protocell walls. Capping polyglycine with other amino acids (e.g. lysine) and reduction of polyglycine C-termini to aldehyde groups would allow Schiff base cross-linking, so a protocell matrix mostly of polyglycine with short peptide linkers could be constructed. Some reactions could be supported by the reducing environment and others by ribozymes. If protocells had cell walls, they also had long polysaccharides in addition to long RNA polymers. We hope that the model we outline can provide some utility in developing new approaches to enrich studies of prebiotic chemistry. We see value in a top-down experimental approach starting with more complex systems (i.e. GADV world), in which more diverse chemistry might be detected, and then moving to simpler systems (i.e. polyglycine world).

Conclusions
Chaotic processes initiated evolution of the genetic code by providing a Darwinian selection. Specifically, polyglycine and poly-GADV (e.g. polyalanine) hydrogels, LLPS and amyloids provided compartments and coacervates to stimulate polymerization reactions and novel chemistry within protocells. By interesting contrast, tRNAs and the genetic code evolved by highly ordered and systematic processes. Based on sequences in ancient Archaea, tRNA evolution appears to be a solved and highly systematic problem. tRNA-linked amino acid reactions in Archaea demonstrate the importance of diverse RNA-linked chemistries in early evolution of life and the genetic code. Divergence of aaRS enzymes strongly indicates the pathway for evolution of the genetic code. This result was somewhat unexpected: how could protein enzymes evolve before they could be successfully encoded? We posit that the answer lies in coevolution of interacting systems, as we describe here. The EF-Tu GTPase latch (closing of the 30S ribosomal subunit), which suppressed wobbling at the 3rd anticodon position, appears to have been a major driving force required for code expansion beyond the first 8 amino acids. Wobbling, therefore, was of fundamental importance in code evolution. The standard code expanded to 20 amino acids, but the maximum theoretical complexity of the code in tRNA is 32 anticodon assignments. To make additional anticodon assignments would require division of 4- and 6-codon sectors. aaRS editing and other fidelity mechanisms protected 4- and 6-codon sectors from further divisions.
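The figure of 32 assignments can be checked by brute force. The following minimal Python sketch assumes that the anticodon wobble position distinguishes only purine from pyrimidine in the 3rd codon position, consistent with the wobble rules summarized in these conclusions. The function and variable names are ours and purely illustrative.

```python
from itertools import product

BASES = "UCAG"          # conventional ordering of the standard code table
PURINES = {"A", "G"}

def wobble_class(base):
    """Collapse a 3rd-position codon base to purine (R) or pyrimidine (Y)."""
    return "R" if base in PURINES else "Y"

def anticodon_sectors():
    """Group the 64 mRNA codons into sectors that a wobbling anticodon
    cannot tell apart (assumption: only R/Y is read at codon position 3)."""
    sectors = {}
    for codon in map("".join, product(BASES, repeat=3)):
        key = codon[:2] + wobble_class(codon[2])
        sectors.setdefault(key, []).append(codon)
    return sectors

sectors = anticodon_sectors()
print(len(sectors))                      # number of distinguishable assignments
print(sectors["GGY"], sectors["GGR"])    # the two halves of the glycine box
```

Under this assumption each 4-codon box splits into at most two 2-codon sectors, so a 4-codon sector such as glycine (GGN) could in principle be divided once more. This is the kind of division that, in the model, fidelity mechanisms prevented.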
Ancient evolution of the pre-lifelife transition from about 4 Ga ago is becoming increasingly well understood. Evolution of the genetic code is a simpler problem in archaeal systems, which are closest to LUCA (Figure 1) [12,19]. Coevolution of tRNA, mRNA, rRNA, aaRS enzymes, the genetic code and ribosomes appears to be a largely outlined problem [23,34]. The pathway of evolution of translation systems was driven via the evolution of tRNA, which is a solved problem [45][46][47]. Because of wobbling, the genetic code has a maximum complexity of 32 assignments, as in tRNA anticodons (2x4x4), not 64 assignments, as in mRNA codons (4x4x4). The standard genetic code froze at 20 amino acids + stops because of translational fidelity mechanisms [23,34,36]. Column 3 of the code is the most innovated column, encoding the most amino acids, because column 3 evolution was driven at an early stage by the tRNA anticodon 1 st wobble position, rather than the 3 rd anticodon position, as in columns 1, 2 and 4 ( Figure 9). aaRS enzyme structural classes and some chemically similar amino acids align with genetic code columns demonstrating the importance and centrality of the 2 nd anticodon position. We posit that glycine, the first encoded amino acid, initially filled the genetic code. According to the model, after glycine, other amino acids enter the code by invading previously occupied sectors and, therefore, displacing previously encoded amino acids. Amino acids that first entered the code retained the most favored anticodons according to clear rules. Glycine lands in code column 4, row 4, indicating that C is preferred in the anticodon 2 nd and 3 rd positions. The genetic code, however, appears to fill via rows: row 4row 2row 3row 1, indicating that the 3 rd anticodon position has the following preference rules: C>G>U>>A. 
Filling row 1 (disfavored 3rd anticodon position A) appears to have required evolution of the EF-Tu latch, a major feature of translational fidelity that allowed expansion of the code to 20 amino acids + stops from an ~8 amino acid bottleneck. The preference rules for the anticodon wobble position are G>(U~C)>>>>>>A. Only purine-pyrimidine discrimination was initially achieved at the anticodon wobble position. Using these simple rules, the entire genetic code was populated, as observed in the standard code (Figure 9F). Other features of translation systems evolved around tRNAs. It appears that polyglycine and GADV may have constituted potent hydrogels, LLPS compartments and amyloid accretions driving strong Darwinian selection of the standard code, particularly during early stages before protein enzymes were sufficiently encoded. All features of the hypotheses arise from models for tRNA, tRNAome, aaRS and genetic code evolution and related literature.
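The row-filling order in the model (row 4, then row 2, row 3 and row 1) follows directly from the preference rule C>G>U>>A at the 3rd anticodon position. As a minimal sketch, assuming the conventional U, C, A, G row order of the standard code table and Watson-Crick pairing between the 1st codon base and the 3rd anticodon base, the mapping can be written out as follows. All names are illustrative.

```python
ROW_BASES = "UCAG"   # 1st codon base for rows 1-4 of the standard table
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PREFERENCE = "CGUA"  # 3rd anticodon position, most to least favored

def row_fill_order(preference=PREFERENCE):
    """Return code-table rows in the order their 3rd anticodon base is favored."""
    order = []
    for anticodon_base in preference:
        codon_base = COMPLEMENT[anticodon_base]        # pairs with codon position 1
        order.append(ROW_BASES.index(codon_base) + 1)  # rows are 1-indexed
    return order

print(row_fill_order())  # [4, 2, 3, 1]
```

The last row to fill is row 1, whose anticodons carry the disfavored A at the 3rd position, which is the row the model associates with evolution of the EF-Tu latch.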