Part I. Synthetases and Amidotransferases
Most of the compiled data are too complex to be presented conveniently in tabular form within the main body of this report; they are relegated to the Supplementary Information section, available at the reader’s request. Supplementary Table 1 lists the Archaea studied. It gives the currently accepted name found within GTDB, including strain designation, arranged alphabetically by genus. It also provides names according to NCBI, which is not authoritative but whose nomenclature is more familiar to many nonspecialists. GTDB and NCBI names are linked by a fixed NCBI ID descriptor, even as newer versions of GTDB record taxonomic changes. Assignment to the phyla Halobacteriota, Methanobacteriota, and Thermoproteota is indicated; an asterisk (*) marks species used in the sequence comparisons for which both aminoacyl-tRNA synthetase and tRNA data are available.
Supplementary Table 2 presents cell culture growth conditions (optimum plus range, where available), organized alphabetically by genus and species within each phylum. Information has been compiled for (1) temperature (℃); (2) pH (standard log units); (3) NaCl concentration (molar units); (4) O2 requirement. In a few cases where the literature provides specific values, pressure (atm) is included with the temperature information. Oxygen requirement turns out to be more complicated than a simple aerobic versus anaerobic dichotomy. Some organisms grow about equally well under either condition, depending upon the nutrients supplied in the culture medium (designated ‘both’), while others are essentially anaerobic but able to tolerate 1−5% O2 by volume; these are termed facultative anaerobes.
In the early stages of information collection, individual articles were consulted; midway through the process, Google introduced an experimental AI search engine that summarizes values from the literature in response to a generic query: “Name (genus plus species) cell culture growth conditions.” When such a summary was presented, it was used without consulting the linked articles. Where more than one reference was uncovered (as occurs when multiple species are associated with a single genus) or, in rare cases, where sources disagreed quantitatively, both sets of data were included in the tabulation. At the other extreme, perusal of Table S2 shows that some parameter values were not recorded. This was frequently the case for the effect of salt on growth of Methanobacteriota and Thermoproteota, for which values were found for fewer than 50% of the organisms.
Terms classifying organisms by their preference for a given parameter, such as thermophile, acidophile, and halophile, vary widely in scope when quantified. For example, the temperature(s) distinguishing a hyperthermophilic archaeon from a thermophilic one are not universally agreed upon. Because agreed-upon standards are lacking, these descriptors are avoided; in their place, species are parsed into categories based on arbitrarily chosen value ranges. The exception is the requirement for O2 in culture to support cell growth. Aerobic and anaerobic are unambiguous categories because they are mutually exclusive, and facultative has been defined above, as has the term both in this context.
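The range-based categorization that replaces the descriptor terms, and the per-category counting that produces Table 1, can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names are hypothetical, and the handling of values falling exactly on or between the stated bin edges (for example 80 ℃ or 79.5 ℃) is an assumption, since Table 1 gives only the range labels.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# Hypothetical binning helpers mirroring the category labels of Table 1.
# Boundary handling (>= vs >) is an assumption; the table only names ranges.


def temperature_bin(t_celsius: float) -> str:
    if t_celsius >= 80:
        return ">80"
    if t_celsius >= 50:
        return "50-79"
    if t_celsius >= 25:
        return "25-49"
    return "<25"


def ph_bin(ph: float) -> str:
    if ph > 9.0:
        return ">9.0"
    if ph >= 6.0:
        return "6.0-9.0"
    return "<6.0"


def nacl_bin(molar: float) -> str:
    if molar > 3.0:
        return ">3.0 M"
    if molar >= 1.0:
        return "1.0-3.0 M"
    return "<1.0 M"


def tally(values: Iterable[Optional[float]], binner: Callable[[float], str]) -> Counter:
    """Count species per category; None marks a value missing from Table S2."""
    return Counter(binner(v) for v in values if v is not None)


if __name__ == "__main__":
    # Hypothetical optimum temperatures for five species, one unrecorded.
    print(tally([85, 37, 65, None, 42], temperature_bin))
```

Because missing values are skipped, a column total can fall below the species count N, as happens for the NaCl rows of Table 1.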
Table 1 summarizes Table S2 by counting the number of species in each category. When phylum distinctions are used in conjunction with the values for each variable, it is seen that most Archaea are properly assigned:
39 of 43 Thermoproteota satisfy the hyperthermophile or thermophile designation usually employed for them;
34 of 38 Methanobacteriota are anaerobic, the natural consequence of being methane producers by reducing oxygen-containing substrates such as acetate or formate;
36 of 60 Halobacteriota demand sodium chloride at concentrations exceeding 1.0 M, and, as shown in Table S2 (not Table 1), seven of the remaining twenty-four species for which values were compiled prefer salt concentrations not exceeding 0.1 M (5.8 g/L);
24 of 43 Archaea from phylum Thermoproteota would also be construed as acidophiles;
18 of 69 halophiles (Halobacteriota) and 25 of 38 methanogens (Methanobacteriota) would also be defined as thermophiles;
92 of 150 Archaea, at minimum, are classified as anaerobic.
Table 1.
Cell Culture Growth Conditions Summary.
| Category | All Archaea | Halobacteriota | Methanobacteriota | Thermoproteota |
| --- | --- | --- | --- | --- |
| Species, N | 150 | 69 | 38 | 43 |
| Temperature | | | | |
| > 80℃ | 52 | 5 | 16 | 31 |
| 50−79℃ | 30 | 13 | 9 | 8 |
| 25−49℃ | 66 | 50 | 13 | 3 |
| < 25℃ | 2 | 1 | | 1 |
| pH | | | | |
| > 9.0 | 7 | 7 | | |
| 6.0−9.0 | 99 | 54 | 26 | 19 |
| < 6.0 | 37 | 5 | 8 | 24 |
| NaCl | | | | |
| > 3.0 M | 31 | 31 | | |
| 1.0−3.0 M | 5 | 5 | | |
| < 1.0 M | 59 | 24 | 17 | 18 |
| Oxygen | | | | |
| Aerobic | 47 | 33 | 2 | 12 |
| Anaerobic | 92 | 35 | 34 | 23 |
| Both | 4 | | 2 | 2 |
| Facultative | 7 | 1 | | 6 |
The inevitable existence of taxonomic violators, however few or numerous, is due solely to the fact that systematic classification relies on multiple criteria (Gong et al., 2020). First and foremost is genotype, grounded on primary sequence alignment of 16S rRNA, despite claims that it is an unreliable indicator for halophilic Archaea, and possibly more generally (Boucher et al., 2004). Second is phenotype, usually determined by cell shape, judged by width and diameter, together with the ether-lipid composition of the membrane. In third place, at best, are growth conditions in culture. The assertion in the Introduction that no pristine taxonomy is possible seems well supported.
Supplementary Table 3 presents the raw data upon which all subsequent information is grounded: sequences of the encoded synthetase and amidotransferase genes, arranged alphabetically by genus + species, with a separate table for each associated amino acid. They are written in constant-width Courier font to enable convenient residue numbering in the conventional arrangement, with the N-terminus on the extreme left and the C-terminus on the extreme right. Where multiple genes are found for a single aaRS or amidotransferase, they are recorded on successive rows labeled ‘1’, ‘2’, ‘3’. Multiple genes for any enzyme may be: (1) copies differing by a limited number of mutations (never more than eight); (2) shorter fragments of full-length sequences; (3) alternative, apparently unrelated full-length strings annotated by UniProtKB as that synthetase. The alternative SerRS2 version (Kim et al., 1998) is placed immediately after SerRS, and amidotransferase subunits A−E are presented after ValRS.
Supplementary Table 4 summarizes the Table S3 data by concisely supplying, for each sequence, the species (strain name, phylum), UniProtKB ID, type of synthetase or amidotransferase, and sequence length in amino acids. In total, 3626 sequences are encoded by the 150 Archaea.
Four synthetase sequences are missing from UniProtKB’s archive, and their absence was confirmed by unsuccessful searches of NCBI’s database: GluRS for Thermoproteus tenax; IleRS for Saccharolobus islandicus; SerRS for Ignisphaera aggregans; ValRS for Natrinema limicola. In addition, a sequence for amidotransferase subunit A from Hyperthermus butylicus was not found. Possible reasons for absence are that the gene: (1) is not encoded in the genome; (2) was not sequenced by those generating that organism’s other enzyme strings; (3) was not annotated, or was annotated incorrectly such that search requests failed to find it (Bastian et al., 2015).
Nine IleRS strings for Saccharolobus islandicus strains are available from UniProtKB and NCBI, but for unknown reasons the selected strain (Lassen #1) is not among them. Although the missing sequence would very likely match at least one of the others, this is not certain; caution dictated that no substitute be used. There is no ready explanation for the unavailability of the other enzymes beyond the general possibilities listed above.
A synthetase initially thought absent from UniProtKB, HisRS from Caldisphaera lagunensis, led to the discovery of a disturbing feature. Several months after the first download of HisRS genes, a new search was performed to confirm its exclusion, because the database is updated continuously. On this occasion, an appropriate string was found, and its existence was supported by one from the NCBI repository. The two sequences differed even though their amino acid compositions were identical: a thirty-six-residue segment (LIVDSIKSLGFESGFSLRLNDRRLLSGIFEQELNIK) was located immediately after a longer block of residues (SPLAVYRIIDKLDKIGIDNVKKELLEQINNEEIVNKIIEVISLSGKPEEILENLYSKYGR) in the UniProtKB version, but immediately preceded it in the NCBI version (Supplementary Table 5).
Rather than consult yet another source in an attempt to resolve the discrepancy, Clustal Omega was used to align the conflicting sequences against all other Archaea included in this study. The result was unambiguous: the NCBI sequence fit with all other HisRS, whereas Clustal’s alignment algorithm added a long series of gaps (− − − −) to the UniProtKB string; Table S5 has details. As one of the three partners in INSDC, NCBI is a primary repository for proteins, whereas UniProtKB is a secondary archive receiving information from the Consortium (Goudey et al., 2022). How the sequence became rearranged is unknown; a study of three representative bacteria found numerous disagreements between NCBI’s GenBank and UniProtKB’s Swiss-Prot databases (Karp et al., 2001).
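The kind of discrepancy found for C. lagunensis HisRS, identical composition but two adjacent blocks in exchanged order, can be checked mechanically. The sketch below is an illustration rather than the procedure used here (which relied on Clustal Omega); the function name is hypothetical, and so are the flanking residues in the example, whereas the two blocks are the ones quoted above.

```python
from collections import Counter
from typing import Optional, Tuple


def find_block_swap(seq_a: str, seq_b: str) -> Optional[Tuple[int, int, int]]:
    """If seq_a = P + X + Y + S and seq_b = P + Y + X + S, return
    (len(P), len(X), len(Y)); otherwise return None."""
    # Equal length and equal amino acid composition are necessary conditions.
    if len(seq_a) != len(seq_b) or Counter(seq_a) != Counter(seq_b):
        return None
    mismatches = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    if not mismatches:
        return None  # identical sequences: nothing was swapped
    first, last = mismatches[0], mismatches[-1] + 1
    n = len(seq_a)
    # The swapped region must cover every mismatch; widen it until the region
    # in seq_b is a rotation (Y + X) of the same region in seq_a (X + Y).
    for start in range(first, -1, -1):
        for stop in range(last, n + 1):
            region_a, region_b = seq_a[start:stop], seq_b[start:stop]
            k = (region_a + region_a).find(region_b)
            if 0 < k < len(region_a):
                return start, k, len(region_a) - k
    return None


if __name__ == "__main__":
    long_block = "SPLAVYRIIDKLDKIGIDNVKKELLEQINNEEIVNKIIEVISLSGKPEEILENLYSKYGR"
    short_block = "LIVDSIKSLGFESGFSLRLNDRRLLSGIFEQELNIK"
    prefix, suffix = "MSEQ", "AEHW"  # hypothetical flanking residues
    uniprot_like = prefix + long_block + short_block + suffix
    ncbi_like = prefix + short_block + long_block + suffix
    print(find_block_swap(uniprot_like, ncbi_like))  # (4, 60, 36)
```

A composition check alone would pass both versions as equivalent; it is the positional comparison that exposes the rearrangement, which is also why the alignment against other HisRS sequences was decisive.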
According to UniProtKB, LeuRS for Cenarchaeum symbiosum is encoded in two pieces of 373 and 578 amino acids. When these strings are included individually in a multiple sequence alignment with LeuRS from other species, their relative shortness (40% and 62%, respectively, of the average length of 933 amino acids) causes Clustal Omega to insert many gaps to compensate. It was observed that their combined length (951 residues) was near the average, and inspection made it obvious that they really were one sequence: one piece contained the HIGH tetrapeptide and the other the KMSKS pentapeptide. A repeat multiple sequence alignment with a united C. symbiosum sequence markedly improved the result: fewer gaps led to forty-five invariant amino acids. Whether the C. symbiosum genome contains LeuRS pieces in different genes, or this partitioning was an inadvertent error introduced somewhere in the chain of transmission between sequencing and archiving, is unclear. No other split enzymes were encountered.
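A simple screen for fragment pairs like the two C. symbiosum LeuRS pieces combines the two observations in the paragraph above: a combined length close to the average for the enzyme, and complementary signature motifs. The sketch is hypothetical; in particular, the 10% length tolerance is an illustrative choice, not a value used in this study, and only the exact HIGH and KMSKS strings are tested.

```python
from itertools import combinations
from typing import List, Tuple


def merge_candidates(lengths: List[int], mean_length: float,
                     tolerance: float = 0.10) -> List[Tuple[int, int, int]]:
    """Pairs (i, j, combined_length) whose summed length lies within
    `tolerance` (an assumed fraction of the mean) of the mean full length."""
    hits = []
    for i, j in combinations(range(len(lengths)), 2):
        combined = lengths[i] + lengths[j]
        if abs(combined - mean_length) <= tolerance * mean_length:
            hits.append((i, j, combined))
    return hits


def complementary_motifs(piece_1: str, piece_2: str) -> bool:
    """True if one piece carries HIGH and the other KMSKS (either order)."""
    return (("HIGH" in piece_1 and "KMSKS" in piece_2)
            or ("KMSKS" in piece_1 and "HIGH" in piece_2))


if __name__ == "__main__":
    lengths, mean = [373, 578], 933  # C. symbiosum LeuRS pieces, LeuRS average
    print([round(100 * n / mean) for n in lengths])  # [40, 62]
    print(merge_candidates(lengths, mean))            # [(0, 1, 951)]
```

Requiring both conditions guards against merging two unrelated short strings that merely happen to sum to a plausible length.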
Synthetases have one fundamental function: bind ATP, an amino acid, and a cognate tRNA (not necessarily simultaneously or in that order) in order to join the latter two by ester formation between the amino acid’s carboxylate and the 2’-OH (or 3’-OH) of the terminal ribose of the tRNA. Some synthetases display large length variation among their archaeal members, while others fall within a narrow length range, as indicated by the standard deviation from the average.
Table 2 displays length ranges by enzyme type as well as the total number of sequences encoded for each. In Table 2, ± values must be divided by three to obtain one standard deviation. If only the aaRS are arranged by decreasing average length, the long-length side of the scale is occupied by the seven enzymes whose standard deviation is ≥ 40 residues, with two exceptions: IleRS, at the high end, has a small standard deviation, whereas TrpRS, at the low end, is one of the seven. Longer length thus tends to imply greater length variability. Of the seven, LeuRS, ValRS, MetRS, and ArgRS are Class Ia synthetases, TrpRS is Class Ic, and AlaRS and ThrRS are Class IIa enzymes (Chaliotis et al., 2017). Aside from arginine and threonine, the cognate amino acids of these enzymes are hydrophobic.
While there is no obvious explanation for the extreme length range within a synthetase type, some of the variation in residue number between aaRS may be accounted for by the addition of an editing domain to a core region, which reduces the chance of misacylation (Beebe et al., 2004). This does not fully account for the observation: IleRS has an editing domain (Nureki et al., 1998; Fukunaga & Yokoyama, 2006), yet a small standard deviation in amino acid number.
Unlike synthetases, the five amidotransferase subunits show little variation in residue number. This is the outcome anticipated, yet not attained, for synthetases, since both have single biochemical functions (Sheppard & Söll, 2008). The subunits: (1) act as kinases that phosphorylate aspartyl-tRNAAsn or glutamyl-tRNAGln, yielding O-phosphoaspartyl-tRNAAsn (B subunit) and O-phosphoglutamyl-tRNAGln (E subunit); (2) hydrolyze the side-chain amides of asparagine (A subunit) or glutamine (D subunit) and deliver the released ammonia to the activated carboxylates for substitution on their respective tRNA; (3) assist subunit A in its function (C subunit).
In contrast to the missing enzyme sequences, some found in UniProtKB were subsequently removed. Two amidotransferase subunit C strings are annotated as derived from computer prediction rather than laboratory experiment. It is unknown why authentic information could not be obtained, since there was no difficulty for other related enzymes encoded in the Methanocaldococcus jannaschii or Methanothermobacter marburgensis genomes. PheRS−β from Halalkalicoccus jeotgali is encoded by two genes that are otherwise identical over their shared length; the second carries nine C-terminal amino acids that must be a contaminant from another organism. The first, designated H. jeotgali1, continues to full length (567 residues), but H. jeotgali2 terminates at 418 residues (including the foreign component).
Beyond these three near-mandatory exclusions, another justification for removing sequences before finalizing multiple sequence alignments rests on Clustal Omega’s liberal insertion of gaps to maximize invariance in the identities of amino acids at fixed locations. String length becomes a critical factor because especially short or long sequences distort that evaluation. The average length and the standard deviation from that average were calculated to assess which strings might be excluded from alignments (Table 2). Statistically, 99.7% of normally (Gaussian) distributed data points lie within three standard deviations of the average. It is reasonable that enzyme length should follow such a distribution, as there is no a priori reason to expect lengths to be distributed asymmetrically about the average. Therefore, lengths not within three standard deviations were considered for exclusion. Clearly, this approach lends itself to gross manipulation of results if not applied judiciously; a statistical justification alone is insufficient. Alignments were therefore conducted with and without the ‘deviant’ strings, and at least a 100% increase in the number of invariant amino acids in the ‘without’ run was required before permanent removal from further analysis was deemed methodologically proper.
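The length screen and the acceptance rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the population standard deviation is used (the text does not say which estimator was applied), and a flagged length is only a candidate; removal still hinges on the invariant-residue count at least doubling.

```python
import statistics
from typing import List, Sequence


def sd_from_table2(plus_minus: float) -> float:
    """Table 2 reports ± values equal to three standard deviations."""
    return plus_minus / 3.0


def length_outliers(lengths: Sequence[int], n_sd: float = 3.0) -> List[int]:
    """Indices of sequences whose length lies outside mean ± n_sd * SD."""
    mean = statistics.mean(lengths)
    sd = statistics.pstdev(lengths)  # population SD; the estimator is an assumption
    return [i for i, n in enumerate(lengths) if abs(n - mean) > n_sd * sd]


def removal_justified(invariant_with: int, invariant_without: int) -> bool:
    """At least a 100% gain in invariant residues once the string is dropped."""
    return invariant_without >= 2 * invariant_with


if __name__ == "__main__":
    lengths = [440, 450, 460, 450] * 5 + [900]  # hypothetical lengths
    print(length_outliers(lengths))     # [20]
    print(removal_justified(23, 61))    # True  (the SerRS case, 2.7-fold)
    print(removal_justified(70, 77))    # False (a 10% gain)
```

One property worth keeping in mind: with the population standard deviation, no single value can lie more than √(n − 1) standard deviations from the mean, so a three-SD screen cannot flag anything in a set of ten or fewer sequences. With roughly 40 to 70 sequences per phylum this is not a practical limitation here.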
There was a third factor to weigh in decisions to remove sequences from an alignment without undermining the validity of the output. The phylogenetic tree created by Clustal Omega is complemented by a distance matrix calculation. This value runs from 0 to 1, is reported to five decimal places, and purportedly signifies how close nearest-neighbor strings in each tree are to each other from an evolutionary point of view, based on shared residues at each position. Two decimal places were deemed sufficient for choosing candidates to delete from a phylum-based set, so a tree distance above 0.30 was initially (and arbitrarily) selected as a potential cutoff. After tentative removal of outliers, alignments were run again to observe changes in the number of invariant amino acids. It became clear, after trials with different enzymes and organisms, that 0.30 was too low a bar: the change in invariant residue number was too small to justify so drastic a step. Empirically, tree outliers ≥ 0.43 caused a dramatic change in invariant amino acid number, again defined as a minimum 100% gain; this became the smallest acceptable criterion for permanent removal of the offending string.
In illustration: the original SerRS alignment covered 126 species because the remaining twenty-four encode an alternative version of the synthetase designated SerRS2 (Kim et al., 1998). Three species, all from genus Methanosarcina, possess genes for both SerRS and SerRS2, so there were 153 strings in total. In the initial comparison, twenty-three invariant amino acids were found when all SerRS species strings were aligned; the accompanying phylogenetic tree showed Methanosphaerula palustris with a distance value of 0.50, signifying that its sequence was very different from the 125 others. A second comparison of these 125 increased the number of invariant residues 2.7-fold, to sixty-one; consequently, M. palustris was removed from the set.
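The two-step decision applied to M. palustris, flagging a tree outlier and then requiring at least a doubling of invariant residues on realignment, might be expressed as below. This is a sketch only: the realignment itself is done in Clustal Omega and is represented here just by its invariant-residue count, the species names other than M. palustris are hypothetical, and distances are rounded to two decimals as described above.

```python
from typing import Dict, List, Optional, Tuple


def distance_outliers(distances: Dict[str, float], cutoff: float = 0.43) -> List[str]:
    """Sequences whose tree distance, rounded to two decimals, is >= cutoff."""
    return sorted(name for name, d in distances.items() if round(d, 2) >= cutoff)


def decide_removal(distances: Dict[str, float], invariant_all: int,
                   invariant_without: int,
                   cutoff: float = 0.43) -> Tuple[List[str], Optional[float], bool]:
    """Return (candidates, fold change or None, whether the 100% gain rule is met)."""
    candidates = distance_outliers(distances, cutoff)
    fold = invariant_without / invariant_all if invariant_all else None
    return candidates, fold, invariant_without >= 2 * invariant_all


if __name__ == "__main__":
    tree = {
        "Methanosphaerula palustris": 0.50,
        "hypothetical_species_1": 0.12,
        "hypothetical_species_2": 0.31,  # above the discarded 0.30 bar, below 0.43
    }
    candidates, fold, remove = decide_removal(tree, invariant_all=23, invariant_without=61)
    print(candidates, round(fold, 1), remove)  # ['Methanosphaerula palustris'] 2.7 True
```

Keeping the cutoff as a parameter makes it easy to rerun the comparison at 0.30 and 0.43, which is the kind of trial described above.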
It is recognized that sequence removal decisions based on outlier lengths or evolutionary distance measures could legitimately be questioned, with an associated charge of ‘cherry picking’ outcomes. There must be convincing reasons to eliminate encoded sequences from alignments, and it is hoped that the 100% improvement criterion provides a satisfactory rationale. Nonetheless, this procedural fix has an unsettling implication: researchers are able to manipulate the data they uncover. Such control may alter conclusions, especially if an effort is later made to generalize results by asserting universal conformity beyond the organisms studied. Even if this kind of extrapolation is not pursued (and it is not proposed here), statements about invariant amino acids in a set of proteins alleged to provide a given biochemical function have consequences for assessing the mechanism used by that protein or enzyme among the collection of lifeforms investigated.
Rejected enzymes are shown in Table 3, along with the justification for removal and the consequences of doing so. Thirteen of 3626 sequences (0.36%) were excluded from further analysis.
The first-pass effort to determine invariant amino acid numbers often yielded fewer than ten such residues when alignments were performed in a one-batch operation on all acceptable downloaded strings for each synthetase type. Sheppard & Söll (2008) demonstrated that Crenarchaeota and Euryarchaeota formed isolated clusters in the resulting phylogenetic trees of archaeal amidotransferase sequences, and it was thought synthetases might behave similarly. Because the two-phylum division used at the time of their work has since been superseded, the newer taxonomic categorization was applied. Table 1 and Table S2 data strongly support their analysis. Acidity was not used as a classifier because its inclusion would overdetermine the implied boundaries: acidophiles, neutrophiles, and alkaliphiles are scattered throughout the three phyla. The first run of sequence comparisons on all archaeal sequences using Clustal Omega became a baseline reference for subsequent trial runs, which were always performed separately on Halobacteriota, Methanobacteriota, and Thermoproteota.
In Table 2, n > 150 indicates multiple genes. Multiplicity arises from: (1) mutations of one or more residues when sequences are internally compared; (2) one sequence being a fragment of another owing to deletion of residues at either the N- or C-terminus; (3) UniProtKB-annotated alternative sequences. The third option dominates; here the distinction lies in differing amino acid composition and order. The allocation of multiple genes, and their distribution by enzyme type, sometimes follow patterns related to genera:
two Nitrososphaera encode two of four CysRS multiples, and possess single genes for all others;
three Acidianus encode three of twenty-eight LeuRS multiples, and possess single genes for all others;
eight Methanosarcina encode eight of ten LysRS multiples, and possess single genes for all others;
two Methanocella encode two of fourteen ThrRS multiples, and possess single genes for all others;
six Pyrobaculum encode six of eleven TyrRS multiples, and possess single genes for all others.
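The three categories of multiplicity listed above could be assigned pairwise by a rule like the one sketched here. It is hypothetical and simplified: mutation copies are assumed to be equal-length substitution variants (the eight-mutation ceiling comes from the description of Supplementary Table 3), fragments are assumed to be exact N- or C-terminal truncations, and everything else is treated as an alternative sequence.

```python
def classify_multiple(seq1: str, seq2: str, max_mutations: int = 8) -> str:
    """Classify two genes for the same enzyme from one species (simplified)."""
    if len(seq1) == len(seq2):
        differences = sum(a != b for a, b in zip(seq1, seq2))
        if differences == 0:
            return "identical"
        if differences <= max_mutations:
            return "mutation copy"
    shorter, longer = sorted((seq1, seq2), key=len)
    if len(shorter) < len(longer) and (
        longer.startswith(shorter)    # C-terminal residues deleted
        or longer.endswith(shorter)   # N-terminal residues deleted
    ):
        return "fragment"
    return "alternative"


if __name__ == "__main__":
    print(classify_multiple("MKHIGHAKMSKS", "MKHLGHAKMSKS"))  # mutation copy
    print(classify_multiple("MKHIGHAKMSKS", "MKHIGHA"))       # fragment
    print(classify_multiple("MKHIGHAKMSKS", "PQRSTVWY"))      # alternative
```

A fragment that also carries point mutations, or a copy with an insertion or deletion, would fall through to "alternative" under these assumptions; a real classification would need an alignment step rather than exact string tests.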
Some species (not genera) are prone to multiple gene production for more than one enzyme type. Picrophilus oshimae is unique, thriving comfortably in hyperacidic (pH ≤ 2) surroundings (Schleper et al., 1996). It seems plausible that its multiple genes for eighteen synthetases plus amidotransferase subunit E tie directly to this environmental preference. Each repeated gene is a full-length copy differing by 1−8 residue mutations but otherwise identical. Apart from these paired mutation strings, Methanobacteriota encode multiple genes only once: Pyrococcus furiosus produced a mutation duplicate for AlaRS. The other two phyla are more prone to generate multiple genes, either as fragments of encoded full-length genes or as strings of different composition, depending upon enzyme type. Chaliotis et al. (2017) claimed that synthetase duplicates and fragments are involved in tRNA-dependent amino acid biosynthesis; it would be a worthwhile project to test whether this proposal holds across the variety of species examined here. A complete accounting (enzyme type, species, phylum, multiplicity category) is recorded in Supplementary Table 6.
Multiplicity in synthetases, together with a similar phenomenon noticed in archaeal tRNA (Laibelman, 2022), prompted an inquiry into whether the same species were involved in generating these two molecular types crucial for translation. Comparing tables of the relevant synthetase enzymes with the equally pertinent tRNA isodecoders showed no correlation. Aeropyrum camini encodes dual tRNAArg(GCG), but no multiple synthetases. Caldisphaera lagunensis contains nonidentical ThrRS, but no more than one tRNA for any isoacceptor. Halalkalicoccus jeotgali has two tRNAArg(UCG) and tRNAThr(UGU), but its multiple synthetase copy is for the PheRS−β subunit and exists as a fragment, not a full-length molecule.
Illustration could continue, but the implication is clear: whatever causes gene duplication in Archaea, whether the copies are identically sequenced or not, is not an organism’s global response to environmental stimuli. Causality for tRNA does not carry over to aaRS, or vice versa. Perhaps this outcome is obvious from the simple fact that multiplicity in any given species does not extend to every amino acid-connected molecule meaningful for translation. The exception is the synthetases of P. oshimae, yet this archaeon encodes nonidentical multiple copies of tRNAAsn(GUU) alone. With respect to production of nonidentical multiple gene copies in Archaea, tRNA and their correlative synthetases did not co-evolve. A lack of co-evolution was also the conclusion drawn from the absence of correlation between phylogenetic trees for LeuRS and tRNALeu sequences (Andam et al., 2012).
Regardless of whether attention is focused on single genes, multiple genes, or combinations of both, quantitative results from sequence alignments of synthetase and amidotransferase enzymes encoded by Halobacteriota, Methanobacteriota, and Thermoproteota are distinct in both the number and the identity of invariant amino acids. Table 4 is unambiguous about quantity. Cells with arrows (→) indicate changes incurred when certain species are removed from an alignment, mirroring the information in Table 3.
Alignments of the ‘All Archaea’ collections yield fewer invariant residues than are obtained by aligning the Halobacteriota, Methanobacteriota, and Thermoproteota sets individually, and, crucially, these quantities differ among the three phyla. This outcome is consistent with the phylum distinctions seen when cell culture growth factors are compared in Table 1 and Table S2. It implies that diverse environmental living conditions affect gene pools, which is hardly a novel conclusion. More emphatically, as ancient in evolutionary development as these enzymes are said to be (Woese et al., 2000; O’Donoghue & Luthey-Schulten, 2003), nurture was instrumental in enhancing natural genetic divergence beyond the conventional random mutation channel.
Segregation by phylum did not lead to large numbers of invariant amino acids for LysRS, unlike what was found for all other types of synthetases and amidotransferases. This result is informative. LysRS is the only aaRS reported to exist in Class I and Class II versions (Ibba et al., 1997a), which would certainly influence the outcome if both classes were mixed indiscriminately in attempts to maximize alignment of individual species. According to Levengood et al. (2004), almost all Archaea encode the Class I enzyme, with a limited number of species encoding Class II.
At the time those statements were made, only small numbers of archaeal species of either type were known. Given the intervening years and the discovery of numerous organisms in remote or unconventional geographic areas, it has been difficult to determine which additional Archaea should join the minority category beyond its original characterization as “few”. Class II definitely includes: (1) Saccharolobus solfataricus (Ibba et al., 1997b); (2) Pyrobaculum aerophilum (Woese et al., 2000); (3−5) Aciduliprofundum boonei, Archaeoglobus A sulfaticallidus, Thermogladius calderae (https://www.aars.online); (6−13) all members of genus Methanosarcina (Mahapatra et al., 2007). It cannot be assumed that others within these genera share this property; absent proof, they could just as plausibly encode the Class I type. Archaeoglobus fulgidus, for example, is specifically included within phylogenetic trees for Class I LysRS (Ambrogelly et al., 2002), in contrast to Archaeoglobus A sulfaticallidus.
Table 4.
Numbers of Invariant Amino Acids in Synthetases and Amidotransferases.
| Synthetase | All Archaea, invariant AA | Halobacteriota, invariant AA | Methanobacteriota, invariant AA | Thermoproteota, invariant AA |
| --- | --- | --- | --- | --- |
| AlaRS | 7 | 29 → 127 | 49 → 115 | 92 |
| ArgRS | 17 | 43 | 49 | 36 |
| AsnRS | 83 | 3 encoded | 137 | 98 |
| AspRS | 46 | 0 encoded | 101 | 52 |
| Asn_AspRS | 17 | 75 | 64 → 135 | 42 → 85 |
| CysRS | 24 | 36 | 71 | 47 |
| GluRS | 22 | 75 | 63 | 45 |
| GlyRS | 44 | 107 | 91 | 66 |
| HisRS | 22 | 38 | 43 | 36 |
| IleRS | 60 | 182 | 140 | 89 |
| LeuRS | 0 | 17 → 45 | 82 | 45 |
| LysRS | 1 | 4 | 4 | 2 |
| LysRS, Class I | 0 | | | |
| LysRS, Class II | 93 | | | |
| MetRS | 22 | 100 | 64 | 37 |
| PheRS−α subunit | 6 | 70 → 77 | 53 | 8 → 34 |
| PheRS−β subunit | 1 | 43 → 53 | 38 | 8 → 30 |
| ProRS | 19 | 30 | 78 | 45 |
| SerRS | 12 | 23 → 61 | 24 | 63 |
| SerRS2 | 59 | 208 | 141 | 0 encoded |
| ThrRS | 3 | 30 | 32 | 3 |
| TrpRS | 6 | 29 | 37 | 18 |
| TyrRS | 21 | 44 | 37 | 34 |
| ValRS | 46 | 119 | 110 | 85 |
| Amidotransferase | | | | |
| Subunit A | 38 | 99 | 73 | 78 |
| Subunit B | 38 | 86 | 87 | 69 |
| Subunit C | 0 | 4 | 3 encoded | 2 |
| Subunit D | 42 | 76 | 59 | 57 |
| Subunit E | 57 | 125 | 84 | 89 |
| Direct Comparison | | | | |
| AspRS + Asn_AspRS | 24 | | 86 | 46 |
| SerRS + SerRS2 | | 18 | 16 | |
Alignment of these thirteen Class II LysRS found ninety-three invariant amino acids. However, alignment of the remaining 137 Archaea as if they were Class I members produced zero consensus residues, which proves the membership list for Class II is incomplete. At least some, perhaps many, of the Archaea isolated and sequenced over the last twenty years must be transferred from Class I to Class II. Such reassignment should raise the invariant residue number for the former while not greatly diminishing that of the latter. Discovering which species are properly designated Class II, possibly including other Pyrobaculum and/or Saccharolobus, will require detailed research.
An issue of far more importance than the numbers in Table 4 is the identity of those invariant residues. Because Clustal Omega adds gap spaces as often as its algorithm deems necessary to maximize conserved amino acids at each position, relative location within a sequence is lost: residues adjacent in one sequence may become separated after the algorithm aligns multiple strings. A partial solution eliminates mutable amino acids as well as gap spaces, leaving only invariant residues in their original order as a way to express the consensus outcome for a single phylum. It is desirable, however, to be able to compare consensus results across phyla in order to perceive a more generalized invariance. This approach is not equivalent to the ‘All Archaea’ alignment, because the number of gap spaces introduced by Clustal Omega depends directly on the number of sequences it is trying to align simultaneously. It is for this reason that Table 4 shows higher totals for the separate phyla than for ‘All Archaea’.
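Reducing a phylum alignment to its invariant residues, as described above, can be sketched as follows. The sketch assumes the aligned strings are already available (for example, read from Clustal Omega output) and uses '-' as the gap character; marking each break in adjacency with a single space is an assumption chosen to resemble the condensed HisRS consensus rows, not a documented rule. Function names and the toy alignment are hypothetical.

```python
from typing import List, Sequence, Tuple


def invariant_columns(aligned: Sequence[str]) -> List[Tuple[int, str]]:
    """(column, residue) for each column where every sequence has the same
    residue and none has a gap."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for col in range(width):
        residues = {s[col] for s in aligned}
        if len(residues) == 1 and "-" not in residues:
            result.append((col, aligned[0][col]))
    return result


def condensed_consensus(aligned: Sequence[str]) -> str:
    """Keep only invariant residues in original order; a space marks each
    point where two kept residues were not adjacent in the alignment."""
    pieces = []
    previous = None
    for col, residue in invariant_columns(aligned):
        if previous is not None and col != previous + 1:
            pieces.append(" ")
        pieces.append(residue)
        previous = col
    return "".join(pieces)


if __name__ == "__main__":
    toy_alignment = ["MKHIGHA-TKMSKS", "MRHIGHAGTKMSKS", "MKHLGHA-TKMSKS"]
    print(len(invariant_columns(toy_alignment)))  # 11
    print(condensed_consensus(toy_alignment))     # M H GHA TKMSKS
```

The space-separated runs of the returned string correspond to the small adjacent clusters used for manual cross-phylum alignment in the next step; the count of kept columns is the per-phylum "invariant AA" figure of the kind tabulated in Table 4.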
Using small clusters of the remaining amino acids that are still adjacent (as few as two, as many as seven) in the consensus sequences for each phylum permits alignment across all phyla to be performed manually. The resulting condensed format still eliminates all absolute location information (which is why this approach is only a partial solution), but makes the identities of the conserved residues easier to determine when large numbers of strings are compared at one time. Moreover, the loss of absolute position data is not as great a liability as might initially appear: folding frequently brings regions that are separated in the one-dimensional sequence into close contact in the functional three-dimensional molecule. In illustration, 152 HisRS sequences containing 303−470 amino acids each (Table 2) generate the condensed consensus versions for the three phyla shown below (Halobacteriota on the first line, Methanobacteriota on the second, Thermoproteota on the third):
| HisRS H/M/T |
| 000000000111111111122222222223333333333444444444455555555 |
| 123456789012345678901234567890123456789012345678901234567 |
| EPE SG FDRPETRPR E QGRRFQGEDKRGYYGVF GGG Y G |
| GRD PEL G FDRP TRPRYEEPQGRR QGED RGYY VFEGGG Y GGRGQ |
| D E K GDTR P R EPQ RRFQG DKRGYYG EGGGRYDLGGR |
Some residues are invariant in only one or two phyla, highlighting the fact that the enzymes of the three phyla cannot be treated as a single large all-archaeal set. Where all three phyla display invariance at a position, that universality is functionally significant. For HisRS, these residues are: EcH7, GcH11, PcH18, RcH23, QcH28, RcH30, RcH31, QcH33, GcH34, DcH36, RcH38, GcH39, YcH40, YcH41, GcH46, GcH47, GcH48, YcH50, GcH53. Each label is the one-letter symbol for the amino acid followed, in superscript, by a lowercase ‘c’ for consensus, a capital ‘H’ signifying HisRS (in this case), and the condensed-version position number.
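Once the three condensed rows have been placed on a common column grid by manual alignment, picking out the positions shared by all three phyla and labeling them in the notation just defined is mechanical. A hypothetical sketch, using toy rows rather than the HisRS rows above, and writing the superscript part of each label as plain text:

```python
from typing import List, Sequence


def shared_invariants(rows: Sequence[str], enzyme_letter: str = "H") -> List[str]:
    """Labels such as 'EcH7' for grid columns where every phylum row has the
    same residue; a space marks a column with no invariant residue in a row.
    Positions are 1-based, matching the ruler lines above consensus rows."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]  # treat short rows as trailing blanks
    labels = []
    for col in range(width):
        residues = {r[col] for r in padded}
        if len(residues) == 1 and " " not in residues:
            labels.append(f"{padded[0][col]}c{enzyme_letter}{col + 1}")
    return labels


if __name__ == "__main__":
    toy_rows = ["EPE SG", "EPD SGK", "E E SG"]  # hypothetical H/M/T rows
    print(shared_invariants(toy_rows))  # ['EcH1', 'ScH5', 'GcH6']
```

The result is only as good as the manual placement of clusters on the shared grid; shifting a cluster by one column changes which labels are reported.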
The compact consensus format can readily be extended to compare invariant residues across multiple enzyme types. For example, ArgRS and CysRS are Class I synthetases, and the literature alleges that all synthetases in this group display invariant HIGH and KMSKS peptides within their full sequences (Eriani et al., 1990). Ten years later, this statement was repeated in the review by Woese et al. (2000), and twenty years after that, it was still adhered to in an updated review by Gomez & Ibba (2020). If this thesis is intended literally, it is patently false; if allowance is made for mutation at one or more positions, then the interpretive expansion weakens any impact the proposition possesses. The condensed consensus versions for the archaeal phyla prove the belief invalid.
| ArgRS (H/M/T) |
| 00000000011111111112222222222333333333344444444445555555555666666666677777777778888 |
| 12345678901234567890123456789012345678901234567890123456789012345678901234567890123 |
| PN N PHGRN D GDGQ Y E EY RD A LP G M TRG A F QYRILA NFYVRLGM |
| D NEHTSNPPHGRNGDR DGQ YE E R DGT YDYHK G G S RG RKFWLF YR F Y R G |
| NE SNP HGRNG D QKD Y ES R G YDY QQHYV M RGD YR N Y G |
| CysRS (H/M/T) |
| 000000000111111111122222222223333333333444444444455555555556666666666777777777788888 |
| 123456789012345678901234567890123456789012345678901234567890123456789012345678901234 |
| CG HG R DLG NTD Y VYGL DFL P PW D GG D HH WH KMS SNR |
| CGT DHGHRFD VNTDDDKII Y YF GLKDF WKWSPGGRPWHIEY DHGGGDLFPHHEQ WHGKMSKSNR FRDIRLD |
| CGTYDHGH D NTDD K LAYGY G KD W G PWH G HGG L PHHE AWH KM KSNRYD |
Concentrating on ArgRS and CysRS, HxGy appears in all six consensus sequences (HcR12, GcR13, HcC6, GcC7), only two possess a second unchanging histidine (HcC8), and none contains a conserved isoleucine. When individual rather than consensus sequences are examined, the HxGy tetrapeptide can be expanded to show that x is chosen from I / L / M / V and y from A / H / R / S. These permutations probably represent distinctions without biological significance for enzyme function, but that does not rescue the dogma of HIGH invariance. KMSKS is recognizable within CysRS (condensed residues 69−73), but ArgRS only hints at its existence (McR56, ScR57). An unexpected invariant amino acid (RcR59) in place of the second serine should be noted, because it does not match expectations from E. coli, on which the original claim was staked. Unlike HIGH, the KMSKS pentapeptide is part of a loop region claimed to be critical for stabilizing the transition state in aaRS/aa/ATP complexes leading to aminoacyl adenylate formation (First & Fersht, 1995). Its absence from ArgRS is therefore noteworthy.
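The expanded HxGy tetrapeptide and the KMSKS pentapeptide lend themselves to regular-expression scanning of individual sequences. The following is a minimal sketch using Python's standard `re` module; the toy fragment is invented for illustration and the function name `signature_hits` is hypothetical.

```python
import re

# Expanded tetrapeptide from the text: H, x in {I,L,M,V}, G, y in {A,H,R,S}.
HXGY = re.compile(r"H[ILMV]G[AHRS]")
KMSKS = re.compile(r"KMSKS")

def signature_hits(seq: str) -> dict:
    """Return 1-based start positions and matched text for each signature."""
    return {
        "HxGy": [(m.start() + 1, m.group()) for m in HXGY.finditer(seq)],
        "KMSKS": [(m.start() + 1, m.group()) for m in KMSKS.finditer(seq)],
    }

toy = "PHIGHAGRLKMSKSNRDHMGR"  # invented fragment, not a real aaRS
print(signature_hits(toy))
```

Because the character classes are explicit, a strict HIGH-only search is just the special case `HIGH`, which makes the difference between the literal and relaxed readings of the dogma easy to test.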
The remaining Class I synthetases, except for LysRS, for which no consensus residues were found (Table 4), also deviate from the ‘universal’ pattern:
no consensus sequence encodes HIGH in full, and
no consensus sequence except the CysRS Methanobacteriota set just shown encodes KMSKS in full.
This failure encompasses 3000 archaeal aaRS sequences (3626 total − 626 amidotransferases; Table 2, column ‘n’). A full account by phylum is presented in Supplementary Table 7. That these Archaea differ in this regard not only from E. coli but also amongst themselves is a prime example of adaptive radiation, substantiated by the breadth of physical characteristics displayed in Table 1.
With respect to Class II synthetases, Carter’s review (Carter Jr, 1993) mentioned three patterns extrapolated from a limited amount of data, yet declared characteristic for these enzymes. All information derives from E. coli primary sequences and/or crystal structures of bacterial complexes for aaRS plus translation-relevant ligands. The shorthand used below comes from Carter: + means charged (Arg, Asp, Glu, His, Lys), φ represents hydrophobic (Ile, Leu, Met, Phe, Trp, Tyr, Val), x stands for any other (Ala, Asn, Cys, Gln, Gly, Pro, Ser, Thr).
motif 1: +G(F/Y)xx(V/L/I)xxPφφ → Permutations = 5×1×2×8×8×3×8×8×1×7×7 = 6,021,120
motif 2: +φφxφxxxFRxE → Permutations = 5×7×7×8×7×8×8×8×1×1×8×1 = 56,197,120
motif 3: φGφGφGφφERφφφφ → Permutations = 7×1×7×1×7×1×7×7×1×1×7×7×7×7 = 40,353,607
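The permutation counts above follow from multiplying, position by position, the number of residues each symbol allows (5 for +, 7 for φ, 8 for x, 1 for a fixed residue, and the number of listed alternatives for a parenthesized set). A small Python sketch reproduces the arithmetic directly from the motif strings; the tokenizer is an illustrative assumption about how to parse Carter's notation, not part of the original analysis.

```python
import re
from math import prod

# Carter's classes as defined in the text: + (5), φ (7), x (8).
CLASS_SIZE = {"+": 5, "φ": 7, "x": 8}
# A token is either a parenthesized set such as (F/Y) or a single symbol.
TOKEN = re.compile(r"\(([A-Z](?:/[A-Z])+)\)|(.)")

def count_permutations(motif: str) -> int:
    sizes = []
    for alternatives, symbol in TOKEN.findall(motif):
        if alternatives:
            sizes.append(len(alternatives.split("/")))
        else:
            sizes.append(CLASS_SIZE.get(symbol, 1))  # fixed residue: 1 choice
    return prod(sizes)

for motif in ("+G(F/Y)xx(V/L/I)xxPφφ", "+φφxφxxxFRxE", "φGφGφGφφERφφφφ"):
    print(f"{motif}: {count_permutations(motif):,}")
```

Running it prints 6,021,120, 56,197,120 and 40,353,607, matching the figures above.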
To give some perspective: according to Internet sources, the most common full name in the world is Zhang Wei, borne by ~290,000 people in China; some version of Muhammad (variable spelling) is said to be the most widespread given name, used by an estimated 150 million people; and those of Chinese heritage also claim the most common surnames, with Li/Lee or Zhang borne by over 100 million individuals. If all 290,000 people named Zhang Wei (much less all named Muhammad or Li/Lee) displayed identical signatures, would they be presumed to possess the same behavioral traits?
The key question, of course, is: how do differences in sequence affect translation? With respect to the supposed Class II synthetase motifs, could the possible permutations of each pattern be alleged to produce aminoacylation kinetic values (kcat and/or KM) sufficiently close to one another to make both overall functional and detailed mechanistic interchangeability plausible? The question, though intended seriously, is actually rhetorical, since Table S7 makes the issue moot: consensus sequences from the three archaeal phyla do not adhere to these suggested patterns. In many cases, no motif can be discerned at all. If motif 2 is abbreviated to FRxE, it makes the most convincing case, because it or some version of it definitively appears nineteen times out of thirty-three (eleven aaRS × three phyla). Motif 3 appears seventeen times, if license is taken on how variably GxGxGxxER might be realized.
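One way to make the ‘license’ in counting motif occurrences explicit is to allow a fixed number of mismatches at a motif's defined positions. The sketch below is hypothetical: it treats x as a wildcard (a simplification of Carter's stricter x class), the sequences and enzyme labels are invented, and the mismatch threshold is an arbitrary parameter rather than the criterion used for Table S7.

```python
from typing import Dict, Tuple

def has_motif(seq: str, pattern: str, max_mismatch: int = 0) -> bool:
    """True if any window of seq matches pattern, where 'x' is a wildcard,
    allowing up to max_mismatch disagreements at the fixed positions."""
    n = len(pattern)
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        mismatches = sum(
            1 for p, s in zip(pattern, window) if p != "x" and p != s
        )
        if mismatches <= max_mismatch:
            return True
    return False

def tally(seqs: Dict[Tuple[str, str], str], pattern: str, max_mismatch: int) -> int:
    """Number of (enzyme, phylum) strings containing the motif."""
    return sum(has_motif(s, pattern, max_mismatch) for s in seqs.values())

# Hypothetical (enzyme, phylum) -> sequence data, for illustration only.
toy = {
    ("aaRS-1", "Halobacteriota"): "AAFRQEAAGDGKGAAERA",
    ("aaRS-1", "Methanobacteriota"): "AAFKQEAAGDGKSAAERA",
    ("aaRS-1", "Thermoproteota"): "AAAAAAAAAA",
}
for motif in ("FRxE", "GxGxGxxER"):
    print(motif, [tally(toy, motif, k) for k in (0, 1)])
```

Reporting counts at several thresholds side by side shows how much of an apparent ‘motif’ depends on the tolerance chosen.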
Amino acids in proteins are invariant across many species for one or more reasons: (1) to induce secondary or tertiary structure in the isolated molecule; (2) to stabilize a bioactive 3D conformation; (3) to facilitate function by direct interaction with other agents in vivo. Phylum-based differences in the identity of conserved residues might be insignificant artifacts. Alternatively, they may cause variation in binding affinity for tRNA, for the amino acid charged to its cognate tRNA, or for the ATP needed to drive aminoacylation, thereby directly affecting the catalytic rates an organism needs to survive and prosper in its natural environment. It is logical to hypothesize that amino acids invariant in all three phyla are candidates for all three roles, with particular emphasis on the third, direct participation in translation-related processes. In contrast, residues unchanging in ≤ 2 phyla may be relegated to the first two roles, developed specifically for adaptation to each group's internal biochemistry and native habitat.
Variation in conserved amino acids among Halobacteriota, Methanobacteriota, and Thermoproteota is detailed in Supplementary Table 8 using the condensed format. Counting the residues in synthetases and amidotransferases that are invariant in all three phyla, as opposed to just one or two, yields the data in Supplementary Table 9: 838 residues in total. The numbers in parentheses indicate their distribution:
high frequency → Gly (149), Arg (114), Pro (85), Glu (77), Asp (68)
intermediate frequency → Tyr (47), His (38), Gln (33)
low frequency → Phe (29), Ser (28), Trp (27), Thr (26), Lys (24), Leu (21), Asn (20)
rarely → Cys (18), Ala (13), Val (10), Met (8), Ile (3)
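As a quick consistency check, the counts listed above can be summed and expressed as shares of the total. The short Python sketch below uses only the numbers given in the text.

```python
counts = {
    "Gly": 149, "Arg": 114, "Pro": 85, "Glu": 77, "Asp": 68,          # high
    "Tyr": 47, "His": 38, "Gln": 33,                                  # intermediate
    "Phe": 29, "Ser": 28, "Trp": 27, "Thr": 26, "Lys": 24, "Leu": 21, "Asn": 20,  # low
    "Cys": 18, "Ala": 13, "Val": 10, "Met": 8, "Ile": 3,              # rare
}
total = sum(counts.values())
high = ("Gly", "Arg", "Pro", "Glu", "Asp")
high_share = sum(counts[a] for a in high) / total

print(f"total invariant residues: {total}")       # 838
print(f"high-frequency share: {high_share:.1%}")  # 58.8%
```

The five high-frequency residues thus account for well over half of all cross-phylum invariants.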
Although any residue can interact with added substrates by hydrogen bonding through backbone NH or C=O groups (Kaiser et al., 2018), glycine has a unique attribute: it can extend a chain to improve the positioning of other residues without the spatial cost of fitting a sidechain into an otherwise crowded region. Arginine, as a source of positive charge to counter the negative charge on ATP phosphates or an incoming amino acid carboxylate, has a crucial advantage over the similarly basic lysine, namely the multiple hydrogen-bonding capability of its guanidinium group. Proline, like glycine, is unique: it forces a protein chain to change direction, steering it toward its preferred final 3D shape. The two acids, Glu and Asp, play the opposite role to arginine: they offset otherwise unbalanced positive charge introduced by externally sourced metal cations, and they form salt bridges with indigenous Arg, His, and Lys to stabilize bioactive conformations.
Salt bridge formation is critically important for sodium chloride-loving halophiles, which must cope with an overabundance of cations, and for thermophiles, because thermodynamic stabilization via ionic bonds is much stronger energetically than hydrogen bonding, making their proteins more resistant to temperature stress. However, when halophiles and thermophiles are also acidophiles (Tables 1, S2), salt bridge formation is substantially diminished, if not eliminated altogether, because the sidechain carboxylates of aspartic and glutamic acids are protonated to their neutral conjugate-acid forms. Such competing effects complicate aminoacylation kinetics and are not usually addressed in conventional experiments (Francklyn et al., 2008).
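The protonation argument can be made quantitative with the Henderson-Hasselbalch relation, f = 1 / (1 + 10^(pKa − pH)), for the fraction of a side chain present as carboxylate. In the sketch below, the pKa values (about 3.9 for Asp and 4.1 for Glu) are approximate free-amino-acid figures and an assumption here; in folded proteins the effective values can shift considerably, and the pH values looped over are illustrative points only.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch estimate of the fraction of a side chain
    present as the charged carboxylate at a given pH."""
    return 1.0 / (1.0 + 10 ** (pKa - pH))

# Approximate free-amino-acid side-chain pKa values (an assumption);
# values inside folded proteins can differ substantially.
PKA = {"Asp": 3.9, "Glu": 4.1}

for pH in (2.0, 4.0, 7.0):  # illustrative pH values only
    shares = ", ".join(
        f"{aa} {fraction_deprotonated(pH, pka):.2f}" for aa, pka in PKA.items()
    )
    print(f"pH {pH}: {shares}")
```

Roughly two pH units below the pKa, only about 1% of the carboxylates remain charged and available for salt bridges, whereas near neutral pH essentially all are.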
Among the rare invariant amino acids common to these Archaea, cysteine is the most intriguing: like proline, it can potentially have a major impact on overall protein shape, but it requires a suitably positioned cysteine partner to form an internal disulfide bond. Using Table S8, one can evaluate whether an even number of cysteines is present in the consensus sequence for any studied enzyme. This, of course, does not prove the existence of a disulfide linkage; it simply shows that the first criterion (an even number) has been met.
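The even-number check described above is easy to automate over condensed consensus strings. A minimal sketch follows, with hypothetical strings and a hypothetical function name; as the text notes, an even count is only a first criterion, not evidence of a disulfide bond.

```python
from typing import Tuple

def cysteine_parity(consensus: str) -> Tuple[int, bool]:
    """Count invariant cysteines in a condensed consensus string and report
    whether the count is even (the text's first criterion, not proof of
    a disulfide bond)."""
    n = consensus.count("C")
    return n, n % 2 == 0

toy = {"enzyme-1": "CGH DCK", "enzyme-2": "CGT DHGH KMSKS"}  # hypothetical
for name, seq in toy.items():
    n, even = cysteine_parity(seq)
    print(f"{name}: {n} Cys, {'even' if even else 'odd'}")
```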
In principle, Table S8 can be used to generate a Table S9 analog for invariant residues encoded in only one or two of the three phyla, but this has not been done. Experimental structure/function investigations are needed to determine the exact roles of these residues, how those roles differ from those of their all-phyla counterparts, and why the residues are conserved in some, but not all, Archaea. One speculative idea: identity differences among synthetases might enable other reagents to bind and produce pre-translationally modified amino acids that are subsequently incorporated into proteins during translation (Laibelman, 2024).
Taken together, Tables 4, S8, and S9 reveal a problem: some aaRS, despite yielding reasonable numbers of invariant amino acids within each separate phylum (Table 4), yield unacceptably low consensus numbers when these results are integrated (Tables S8, S9). The issue was raised earlier in the discussion of LysRS, but single-digit values are also obtained for LeuRS (eight consensus invariants across all three phyla), ThrRS (two), and TrpRS (seven). Table S6 shows that these four account for the most multiple-gene encodings, with compositions and sequences differing from those of single-gene Archaea within the same phylum. This connection is not a coincidence: heterogeneity among strings necessarily implies fewer invariant residues upon alignment.
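Why heterogeneity necessarily lowers the number of invariant residues can be seen directly: a column counts as invariant only if every sequence agrees, so adding a divergent sequence can keep or reduce that count but never raise it. The toy alignment below is hypothetical and assumes equal-length, pre-aligned strings; `invariant_columns` is an illustrative name.

```python
from typing import List

def invariant_columns(alignment: List[str]) -> int:
    """Count columns in which every sequence carries the same residue;
    all-gap columns ('-') are not counted as invariant."""
    return sum(
        1
        for column in zip(*alignment)
        if len(set(column)) == 1 and column[0] != "-"
    )

# Hypothetical pre-aligned strings of equal length.
single_gene = ["MKRGHIGDE", "MKRGHLGDE", "MKRGHVGDE"]
divergent_paralog = "MSRAHIGQE"

print(invariant_columns(single_gene))                        # 8
print(invariant_columns(single_gene + [divergent_paralog]))  # 5
```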
There are reports of divergent gene forms for LeuRS within Halobacteriota (Andam et al., 2012; Fang et al., 2014; Weitzel et al., 2020), and of wider-ranging taxonomic scope for archaeal ThrRS (Beebe et al., 2004; Korencic et al., 2004). Weitzel et al. stated that one version lacked tRNA aminoacylation activity, yet could still bind it and produce leucyl adenylate. They speculated that the activated tRNA might be useful in the chemical modification of other amino acids. As a possible precedent for this suggestion, Wan et al. (2014) showed that pyrrolysine is biosynthesized by joining two lysine molecules. Crystal structures of ThrRS from several archaeal organisms that encode two versions show that one form catalyzes aminoacylation while the other is used exclusively to edit tRNA erroneously charged with noncognate amino acids (Shimizu et al., 2009).
Organisms identified in these publications as having multiple forms of LeuRS and ThrRS were subjected to alignment tests, yielding high numbers of unchanging amino acids for the former but fewer for the latter, to the point that practically no invariant residues were produced for Thermoproteota ThrRS strings (Table 4). Regardless of alignment results, the problem is the same as for lysyl-synthetase: the species highlighted in these reports come nowhere close to covering the full complement of Archaea known and characterized in culture, for which sequences are now available from database archives. Without new research, analysis of invariant residues within and between phyla for these aaRS cannot provide a complete understanding of the effects of gene multiplicity on proofreading function or aminoacylation mechanism(s).
Perusal of Table 4 also reveals a paucity of unchanging residues for amidotransferase subunit C (≤ 4). Unlike the situation for synthetases, this is not a troublesome outcome, given the subunit's short length (71−111 amino acids) and its functional role of assisting subunit A in producing and/or transporting ammonia (Nakamura et al., 2006). As long as a subunit C sequence binds its partner subunit A well enough for the two to accomplish their joint goal, a sequence match to other C-type subunits is irrelevant. The same could in theory be said of synthetases, i.e., that only a match to the cognate tRNA is vital for function, but because all aaRS must also bind identical ATP molecules, they must hold more sequence and 3D structure in common across all types.
Table 4 contains a category called ‘direct comparison’, which covers two sets of alignment trials. The first establishes commonalities and differences between the prototypical selective AspRS and its nondiscriminatory counterpart Asn_AspRS; the second contrasts conventional seryl-synthetase with the alternate SerRS2. Although SerRS2 has just twenty-seven confirmed member species (Table 2), it may serve as a representative analog of what is sought for the diverse leucyl, lysyl, threonyl, and tryptophanyl synthetase types, since none of these yet has twenty-seven known archaeal organisms in its membership list.
There are noteworthy observations (Table S3) to make before the direct comparison between AspRS and Asn_AspRS is undertaken:
no Halobacteriota encode AspRS; all use the nondiscriminatory gene. The reason for this is not obvious: why should a preference for high salt concentration compromise binding selectivity between aspartic acid and asparagine when charging cognate tRNA?
if species possess genes for both AsnRS and AspRS, they do not need the nondiscriminatory Asn_AspRS enzyme. Korarchaeum cryptofilum, Nitrososphaera gargensis, and Nitrososphaera viennensis apparently overcompensate by encoding the nonselective type as well.
the logical corollary to the second note should hold: if genomes lack both AsnRS and AspRS, they must utilize Asn_AspRS. This thesis does in fact apply to all Archaea studied.
genomes lacking AsnRS but possessing AspRS should also include a gene for the nondiscriminatory Asn_AspRS. This holds for Cenarchaeum symbiosum, Methanomassiliicoccus A intestinalis, Methanomethylophilus alvus, and Nitrosopumilus maritimus, but not for Aeropyrum camini, Hyperthermus butylicus, or Pyrolobus fumarii. How they are able to charge tRNAAsn is a mystery.
genomes lacking AspRS but possessing AsnRS should also include a gene for the nondiscriminatory Asn_AspRS. This holds for Methanocella conradii, Methanocella paludicola, and Methanocella A arvoryzae.
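The expectations in the notes above reduce to a simple rule: a genome lacking either selective enzyme (AsnRS or AspRS) is expected to encode Asn_AspRS. The sketch below encodes that rule; the category labels and the function name `asn_asp_status` are illustrative, while the example gene inventories are those stated in the notes.

```python
def asn_asp_status(has_asnrs: bool, has_asprs: bool, has_nd: bool) -> str:
    """Compare a genome's inventory with the expectation that lacking
    either selective synthetase requires nondiscriminatory Asn_AspRS.
    has_nd: the genome encodes Asn_AspRS. Labels are illustrative."""
    needs_nd = not (has_asnrs and has_asprs)
    if needs_nd:
        return "consistent (uses Asn_AspRS)" if has_nd else "unexplained"
    return "redundant Asn_AspRS" if has_nd else "consistent (selective only)"

# (AsnRS, AspRS, Asn_AspRS) as stated in the notes above.
examples = {
    "Korarchaeum cryptofilum": (True, True, True),
    "Cenarchaeum symbiosum": (False, True, True),
    "Aeropyrum camini": (False, True, False),
    "Methanocella conradii": (True, False, True),
}
for species, genes in examples.items():
    print(f"{species}: {asn_asp_status(*genes)}")
```

Applied to a full gene-presence table, the ‘unexplained’ label would flag exactly the cases, such as A. camini, for which tRNAAsn charging remains a mystery.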
Directly comparing all archaeal sequences from AspRS and Asn_AspRS required several steps. First, separate alignments were conducted for each synthetase. Strings from M. A intestinalis, M. alvus, and K. cryptofilum were removed from Asn_AspRS for the reasons given in Table 3, and, per the first note, only two alignments (Methanobacteriota, Thermoproteota) are possible for AspRS. Second, all gaps inserted by Clustal Omega were eliminated, as were nonconserved residues; the resulting condensed format left five strings of invariant residues. Third, the pair from AspRS (Methanobacteriota, Thermoproteota) was aligned manually, as were the three phylum strings for Asn_AspRS; these representations can be viewed in Table S8. Fourth, this information was reorganized by relating the conserved Methanobacteriota sequences in condensed format for the two aaRS to each other, and doing the same for the corresponding Thermoproteota strings. Fifth, these sets were manually realigned again to identify common invariant amino acids in each phylum. Sixth, these sets of consensus residues were restructured into the condensed format for easy depiction, as shown below. All stages of the direct comparison process are contained in Supplementary Table 10.
| AspRS + Asn_AspRS consensus M/T |
| 00000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999 |
| 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012 |
| GWFRDGQ EGELPVTRRDRF PKEGGL FYFALQSPQLYKGEFRAETRHESDEY GDPP FYFDLESGRHLFYAFGM PHGGRNREFPRDRP |
| GW G GE P RFTP E G A Y ALQ PQ R ET HE E D D FFY DLE GR L G PPHGGR RD P |
As anticipated from their shared function, many invariants are found, more in methanogens than in (hyper)thermophiles. The final condensed format reveals that forty of the ninety-two positions are conserved in both types of synthetase. Class II motif 2 (FRxE) is visible as Rc44 and Ec46, while motif 3 (GxGxGxxER) can be seen as Gc81, Gc82, Rc83. How invariant residues found in only one phylum relate to the kinetics, thermodynamics, or mechanism of aminoacylation has not been explored, but would make an informative study.
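The first two steps of the direct comparison procedure (alignment, then removal of gaps and nonconserved residues) can be sketched as follows for one phylum's aligned strings. Assumptions: the input is a Clustal-style alignment with '-' as the gap character, a column is kept only if every sequence shows the same residue, and any dropped column breaks a cluster, marked with a single space; the function name `condense` and the toy data are hypothetical.

```python
from typing import List

def condense(alignment: List[str], gap: str = "-") -> str:
    """Keep columns where every sequence shows the same non-gap residue.
    Consecutive kept columns form a cluster; clusters are separated by a
    single space wherever one or more columns were dropped."""
    clusters: List[str] = []
    current: List[str] = []
    for column in zip(*alignment):
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            current.append(column[0])
        elif current:
            clusters.append("".join(current))
            current = []
    if current:
        clusters.append("".join(current))
    return " ".join(clusters)

# Hypothetical aligned strings for a single phylum.
aligned = ["MKV-GDRPA", "MKLAGDRPS", "MKV-GDRPA"]
print(condense(aligned))  # MK GDRP
```

The later steps (relating phyla and enzymes to each other) were performed manually in the text and are not automated here.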
From a different perspective, one would expect all organisms using Asn_AspRS to encode amidotransferase subunits A−C in order to produce asparaginyl-tRNAAsn. Table 2's ‘n’ column gives the numbers of sequences as 113, 115, 117, and 92 for Asn_AspRS, subunit A, subunit B, and subunit C, respectively. It is vital to recognize that encodings do not equal species: multiple nonselective synthetase genes exist for two species, and a duplicate gene for subunit B is present in one species (Table S6). Accounting for this, the actual numbers of species involved are 111, 115, 116, and 92, respectively.
The notes for the direct comparison between AspRS and Asn_AspRS mentioned that K. cryptofilum, N. gargensis, and N. viennensis contain copies of the extraneous nondiscriminatory synthetase even though it is not needed, because all three also possess the selective versions (AsnRS, AspRS). It turns out that the two Nitrososphaera species have also retained genes for amidotransferase subunits A−C, presumably fully functional, but K. cryptofilum no longer contains the associated subunits, so its Asn_AspRS is presumably nonfunctional. The last point is moot, however, because Table 3 shows that this same Asn_AspRS gene product was removed from alignment analysis: the phylogenetic tree generated when it was included indicated a high level of compositional difference from other synthetase sequences of that type. The count is now 110, 115, 116, 92, respectively.
According to UniProtKB, Hyperthermus butylicus is missing a subunit A gene. This species, along with A. camini and P. fumarii, also lacks both selective AsnRS and indiscriminate Asn_AspRS, and it was noted earlier that “how they are able to charge tRNAAsn is a mystery”. That riddle can now be both deepened and somewhat clarified: H. butylicus possesses genes for subunits B and C, while subunits A−C are all found in A. camini and P. fumarii. Thus, they have the ingredients to form asparaginyl-tRNAAsn without the requisite synthetase to catalyze it, or so it seems. The count is now 110, 113, 113, 89, respectively.
Just as K. cryptofilum, N. gargensis, and N. viennensis do not need Asn_AspRS or amidotransferases because they possess selective AsnRS and AspRS, the same holds for Acidolobus saccharovorans, Caldisphaera lagunensis, and Fervidicoccus fontis. Genes for subunits A and B are in their genomes, but not those for Asn_AspRS or subunit C. Perhaps these sequences are remnants of an earlier evolutionary period when they were necessary and, though still present, are no longer functionally expressed. The count is now 110, 110, 110, 89.
An appeal to gene loss also reconciles the remaining gap of twenty-one species lacking subunit C while possessing the other components. The claimed purpose of subunit C is to assist subunit A in transferring NH3 from asparagine to aspartyl-tRNAAsn (Nakamura et al., 2006; Sheppard & Söll, 2008). Subunit C attaches to subunit A and has no direct interaction with subunit B or tRNA (Curnow et al., 1997). Except for the Thermoproteota Aeropyrum pernix, Cenarchaeum symbiosum, and Ignisphaera aggregans, the remaining eighteen species devoid of subunit C all belong to the Methanobacteriota. Such a large number suggests that some Archaea have evolved to a state in which subunit C is not needed. Mechanistic details of ammonia transfer in Methanobacteriota may have diverged from those in Halobacteriota.
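The successive species counts in the preceding paragraphs can be reproduced as a running subtraction. In the sketch below, each adjustment is taken from the text; the step labels are paraphrases, and the split of the duplicates (two extra Asn_AspRS copies, one extra subunit B copy) follows from the stated totals.

```python
# Order: Asn_AspRS, subunit A, subunit B, subunit C (Table 2, column 'n').
counts = [113, 115, 117, 92]

# Adjustments described in the text; labels are paraphrases.
steps = [
    ("duplicate genes (2 extra Asn_AspRS, 1 extra subunit B)", [2, 0, 1, 0]),
    ("K. cryptofilum Asn_AspRS excluded (Table 3)", [1, 0, 0, 0]),
    ("H. butylicus, A. camini, P. fumarii lack a synthetase", [0, 2, 3, 3]),
    ("A. saccharovorans, C. lagunensis, F. fontis", [0, 3, 3, 0]),
]

for label, deltas in steps:
    counts = [c - d for c, d in zip(counts, deltas)]
    print(f"{label}: {counts}")

print("species lacking subunit C:", counts[0] - counts[3])  # 21
```

The intermediate lists match the counts quoted after each step (111, 115, 116, 92; then 110, 115, 116, 92; 110, 113, 113, 89; 110, 110, 110, 89).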
With respect to amidotransferase subunits alone, without regard for synthetase participation, attempts to uncover invariant residues in alignments between subunits A and D, or between subunits B and E, met with failure. Despite near-identical biochemistry (amide hydrolysis of asparagine and glutamine, respectively, for the first pair; phosphorylation of sidechain carboxylates in aspartate and glutamate, respectively, for the second), there appeared to be no common conserved amino acids in the same relative chain position. This held true for all three phyla. It seems unlikely that the difference in substrate (Asp versus Glu) on which these chemical transformations act is sufficient to cause thoroughgoing alteration of subunit sequences along their entire length.
Crystal structures validate this hypothesis of similarity in some respect, but depict a situation very different from that suggested by the sequences. Subunits D and E from Pyrococcus abyssi co-crystallized as a heterotetramer of composition α2β2 (Schmitt et al., 2005). When the enzyme is folded into its active state, the residues surrounding the region seen to bind ATP and the terminus of Glu-tRNAGln (subunit E) or Asp-tRNAAsn (subunit B) occupy equivalent positions in 3D space, even though they lie far apart in their respective primary sequences: in other words, structural conservation without sequence homology.
In contrast to this lengthy exposition, exploring the relationship between SerRS and SerRS2 holds no surprises (Table S10). Seryl-synthetase sequences, minus Methanosphaerula palustris (Table 3), and SerRS2 sequences were aligned. Only the Halobacteriota and Methanobacteriota phyla are in play, since no species belonging to Thermoproteota encodes SerRS2. As shown in Table 4, eighteen residues in the first phylum and sixteen in the second are invariant. This outcome was supported by the accompanying phylogenetic tree, which displayed separate, nonoverlapping clusters for the two enzymes. Species from the genus Methanosarcina lack uniformity in their genomes:
genes for SerRS and SerRS2 → M. barkeri B, M. horonobensis, M. vacuolata;
gene for SerRS only → M. acetivorans, M. lacustris, M. mazei, M. siciliae;
gene for SerRS2 only → M. thermophila.
Why this distribution exists within a single genus is unknown; the only distinctive cell culture growth parameter is a ten-degree-higher optimal temperature for M. thermophila (Table S2). Multi-sequence comparison using Clustal Omega yields, in condensed format notation, nine amino acids in common between the phyla: Pc4, Pc7, Gc12, Rc13, Fc15, Ec16, Sc23, Rc24, Pc25. This outcome contrasts sharply with that of the direct comparison between selective AspRS and nondiscriminatory Asn_AspRS, where numerous invariant amino acids are recorded across archaeal phyla.
| SerRS + SerRS2 consensus H/M |
| 0000000001111111111222222 |
| 1234567890123456789012345 |
| P PTEP HGR FEG DEPESRP |
| GLP PREG GRVFE Y SRP |