Homochirality may not be a prerequisite for protein structure and function: a computational model investigation of prebiotic peptides

On the primitive Earth, both L- and D-amino acids would have been present. However, only L-amino acids serve as the building blocks of proteins in modern life. To study the relative stability of homochiral and heterochiral peptides, a variety of computational methods were employed. Ten prebiotic amino acids (Gly, Ala, Asp, Glu, Ile, Leu, Pro, Ser, Thr, and Val) had been identified by multiple previous meteorite, spark discharge, and hydrothermal vent studies. We focused on four polypeptides that have been reported as primary early Earth polypeptide analogs: 1ARK, 1PPT, 1ZFI, and 2LZE. Tripeptides composed of only Asp, Ser, and Val showed that residue position (i.e., N-terminus, C-terminus, or middle) affected the minimal folding energy of peptides, whereas the class of amino acid (hydrophobic, acidic, or hydroxylic) made no significant difference. Hierarchical cluster analysis of dipeptides with all possible combinations of the proposed 10 prebiotic amino acids and their D-amino acid substituted derivatives generated five clusters. Prebiotic polypeptides were built to test the significance of molecular fluctuations, secondary structure occupancies, and folding energy differences based on these clusters. Most interestingly, among 129 residues, mutation sensitivity profiles showed that the ratio of more stable to less stable to equally stable D-amino acids was about 1:1:1. In conclusion, some combinations of L- and D-amino acids can act as essential building blocks of life. Peptides with α-helices, long β-sheets, and long loops are usually less sensitive to D-amino acid replacements than those with short β-sheets.


INTRODUCTION
The scientific question we address here is whether homochirality is a natural biosignature of life. Homochirality means that the enantiomeric excess is approximately 100% in an enantiomer mixture. In living systems, all protein-derived amino acids are in the L-configuration and all monomers of polysaccharides are in the D-configuration, with the exception of a few prokaryotes. 1 The opposite configuration of a biomolecule is usually harmful to the organism. For instance, although L-carnitine is useful in the human body, D-carnitine interrupts the normal transport of long-chain acyl groups of fatty acids from the mitochondrial intermembrane space into the mitochondrial matrix. 2 It is crucial to investigate whether homochirality was necessary for the emergence of life on early Earth, and how primitive chemical building blocks were assembled into macromolecules and, further, into complex organisms. The evolution of biomolecules tends to minimize folding energy to stabilize their structures, a reflection of the second law of thermodynamics. However, the chemical self-replicative reactions in biological systems, 3 such as homochirality, are far from thermodynamic equilibrium, since thermodynamic equilibrium favors the racemic mixture. 4 The origin of homochirality is not well understood and may be linked to early symmetry breaking at the beginning of the universe. According to parity violation, longitudinally polarized β-particles and the parity-violating energy difference affect the packing energies of peptides at the nuclear and electroweak force levels. 5,6 The intrinsic chirality of atoms contributes to differences in enantiomeric energy and potentially to autocatalysis. 7 A slight energy fluctuation could cause a "butterfly effect" during enantiomer interactions, and L-amino acids eventually won the natural selection in life systems.
In addition, Carroll (2009) applied Shannon's information entropy, $H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$, and tetrahedral conformation to explain the information capacity of biomolecules, and concluded that homochirality was a more efficient way to store and retrieve spatial information. 8 In the modern world, most natural peptides containing D-amino acids are shorter than 20 amino acids, 9 and D-amino acids play little role in biological systems, but they are sometimes beneficial to health. Evidence shows that D-serine and D-aspartate can modulate and mitigate neurodegeneration, schizophrenic symptoms, epilepsy, ischemia, and amyotrophic lateral sclerosis. [10][11][12] D-arginine protects the central nervous system from neurotoxicity and prevents the onset of mental disorders without noticeable side effects. 13 Although the function of D-amino acid oxidase is to prevent the harmful accumulation of D-amino acids, 14 knockout of D-amino acid oxidase, which prevents the degradation of D-amino acids, could improve spatial cognition, odor cognition, and short-term spontaneous recognition memory performance for several days. 15 Moreover, various amino acid racemases that convert L-amino acids to D-amino acids exist in nature. 16 Taken together, it is reasonable to propose that peptides containing D-amino acids could potentially substitute for homochiral peptides in living systems.
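As a minimal illustration of the entropy formula quoted above (and not a reproduction of Carroll's analysis), the sketch below evaluates Shannon entropy for a single stereocenter whose configuration is either fixed or equally likely to be L or D. The function name and the two example distributions are assumptions made for illustration only.

```python
import math

def shannon_entropy(probabilities):
    """Return H = -sum(p * log2(p)) in bits, skipping zero-probability states."""
    total = sum(probabilities)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical distributions over the two configurations (L, D) of one site.
racemic = shannon_entropy([0.5, 0.5])     # either configuration equally likely
homochiral = shannon_entropy([1.0, 0.0])  # configuration known in advance
```

Under these assumptions the racemic site carries one bit of configurational uncertainty and the homochiral site carries none, which is one simple way to read the claim that homochirality makes spatial information cheaper to specify.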
Curiosity about the first prebiotic amino acid chain has grown since the 1952 Miller-Urey experiment. These early Earth spark discharge simulations consistently yielded 10 abundant racemic amino acids: glycine (Gly), alanine (Ala), aspartic acid (Asp), glutamic acid (Glu), isoleucine (Ile), leucine (Leu), proline (Pro), serine (Ser), threonine (Thr), and valine (Val). [17][18][19][20][21] Among them, Gly is the only achiral amino acid, and the other 9 are chiral. A meta-analysis identified the same 10 prebiotic amino acids based on ten previous studies of extraterrestrial comets or meteorites, hydrothermal vents, and spark discharge experiments. 22 All 9 chiral prebiotic amino acids have been observed in natural peptides. These natural amino acids were mostly discovered in short antibiotic or immunosuppressive peptides, such as gramicidin, actinomycin, valinomycin, tyrocidine, and cyclosporine. 1,2,4,9,16 Minervini et al. (2015) 23 proposed ten potential prebiotic polypeptides from the Protein Data Bank (PDB) in accordance with the proteinoid theory. 24,25 Among them, the SH3 domain in human nebulin (PDB ID: 1ARK), 26 avian pancreatic polypeptide (PDB ID: 1PPT), 27 and leech carboxypeptidase inhibitor (PDB ID: 1ZFI) 28 were chosen for this study because they cover all primary secondary structures of interest: 1ARK is a component of a fibrous protein characterized by short β-sheets; 1PPT is a globular peptide characterized by a long α-helix; and 1ZFI is characterized by long β-sheets and a short α-helix. Additionally, to investigate the random loop structure of primordial peptides, an RNA ligase, ligase 10C (PDB ID: 2LZE), derived from in vitro evolution was chosen as a representative. 29 In this study, we remodeled the prebiotic L-amino acids in the targeted peptides into the D-form and minimized the energy of the new peptides.
We analyzed and compared the structure and stability of each peptide and its respective "D-mutated" derivatives in terms of several aspects, including secondary structure types, packing energy, B-factor (the fluctuation of root mean square deviation, RMSD), attractive force, repulsive force, solvation energy, electrostatic potential, hydrogen bonds (H-bonds), and disulfide geometry potential. Statistical analyses were conducted to explore the relationships between these variables. Our results imply that some combinations of L- and D-amino acids can act as essential building blocks for life, and that heterochiral peptides are feasible in biological systems beyond Earth.

Prebiotic amino acids and peptides
The 10 prebiotic amino acids considered in this study were 6 amino acids with hydrophobic side chains (glycine, Gly, G; alanine, Ala, A; valine, Val, V; isoleucine, Ile, I; leucine, Leu, L; and proline, Pro, P), 2 acidic amino acids (aspartic acid, Asp, D; and glutamic acid, Glu, E), and 2 amino acids with a hydroxyl group on their side chain (serine, Ser, S; and threonine, Thr, T). 22 The 4 primordial polypeptides that cover all 10 prebiotic amino acids were 1ARK (60 residues), 1PPT (36 residues), 1ZFI (67 residues), and 2LZE (87 residues) (Table 1 and Table 2). 23,29 Dipeptides with all possible combinations (in total 279 dipeptides) of the 10 prebiotic L- and D-amino acids were modeled in PyMOL. 30 Lastly, 48 tripeptides with all combinations of Asp (the abundant prebiotic acidic amino acid), Ser (the abundant prebiotic amino acid with a hydroxyl group), and Val (whose side chain represents the average effect of a hydrophobic prebiotic amino acid) 22 were modeled.

Computational modeling
Maestro 11 31 was used for the chirality transformations of the 4 polypeptides and for the energy minimization of dipeptides only. Energy minimization and root mean square deviation (RMSD) calculations were performed using InteractiveROSETTA. 32 The occupancies of different secondary structures were measured in YASARA (Yet Another Scientific Artificial Reality Application). 33 The force field of the FoldX plugin 34 of YASARA was used to calculate the stability energy (ΔΔG), that is, the change in Gibbs free energy (ΔG) of the product after mutation relative to the starting material during peptide folding. 35,36 Mutation sensitivity profiles of peptides were generated in the MAESTROweb (Multi AgEnt STability pRedictiOn) tool. 37

Statistical analyses
Kruskal-Wallis independent sample tests, bivariate correlation analyses, and regression analyses were performed in IBM SPSS Statistics 25. 38 Hierarchical cluster analysis (using the unweighted pair group method with arithmetic mean algorithm and the Bray-Curtis similarity index) was performed in PAST 3.25 to group the 9 chiral prebiotic amino acids. 39
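The clustering step can be sketched in plain Python as follows. This is a minimal illustration of average-linkage (UPGMA) agglomeration on Bray-Curtis dissimilarity (one minus the similarity index), not the PAST implementation; the feature vectors are hypothetical values chosen only to make the example run, and the function names are assumptions.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den

def average_linkage(members_a, members_b, profiles):
    """Mean pairwise dissimilarity between two clusters (the UPGMA rule)."""
    pairs = [(x, y) for x in members_a for y in members_b]
    return sum(bray_curtis(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

def upgma(profiles):
    """Repeatedly merge the two closest clusters; return the merge order."""
    clusters = [(name,) for name in sorted(profiles)]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        history.append(a + b)
    return history

# Hypothetical non-negative descriptors per residue (illustration only).
profiles = {
    "Ala": [1.0, 2.0, 0.5],
    "Pro": [1.1, 2.1, 0.4],
    "Asp": [5.0, 0.5, 3.0],
    "Glu": [4.8, 0.6, 3.2],
}
merges = upgma(profiles)
```

With these made-up descriptors the two acidic residues merge first and Ala joins Pro next, mirroring the kind of grouping a dendrogram would display.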
Herein, polypeptides substituted with one or more D-amino acids were designated using lowercase letters. For example, 1ark_dA is the D-mutant of the all L-amino acid parent polypeptide 1ARK that switched all L-Ala residues to D-Ala.
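The naming convention can be made concrete with a small helper. The sketch below is an assumption-laden illustration: the function name, the chirality-mask output, and the short toy sequence are all hypothetical, and the real sequences of 1ARK and the other polypeptides are not reproduced here.

```python
def d_mutant(pdb_id, sequence, residues):
    """Build a mutant label such as '1ark_dA' and a per-position chirality mask.

    The mask uses 'D' for switched residues, 'L' for untouched chiral
    residues, and '-' for glycine, which is achiral.
    """
    targets = set(residues.upper())
    if "G" in targets:
        raise ValueError("glycine is achiral and has no D-form")
    name = f"{pdb_id.lower()}_d{residues.upper()}"
    mask = "".join(
        "-" if aa == "G" else ("D" if aa in targets else "L")
        for aa in sequence.upper()
    )
    return name, mask

# Toy sequence for illustration only, not a real PDB entry.
name, mask = d_mutant("1ARK", "GASAVA", "A")
```

Here every alanine in the toy sequence is marked as switched to the D-form, while serine and valine stay in the L-form.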
The strategy for stabilizing D-amino acids in a sequence composed primarily of L-amino acids is to adjust the χ angles and the placement of the amino acid backbone so that steric hindrance is minimal. A Monte Carlo algorithm in the structure prediction program InteractiveROSETTA was used to sample rotational conformations of the peptide chain in search of the lowest energy. The 1ARK peptide was relatively unstable: only the D-mutants with three D-Glu (Figure 1Aiv), one D-Leu (Figure 1Avi), or one D-Ser (Figure 1Aviii) maintained the original packing structure, whereas the other 6 D-mutations all disrupted the packed conformation (Figure 1Aii, Aiii, Av, Avii, Aix, and Ax). This instability might be caused by the dominance of short β-sheets and the absence of helical secondary structure in the original all L-configuration structure.
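The idea of Monte Carlo sampling over rotational conformations can be sketched with a toy example. The code below is not Rosetta's scoring function or move set; it applies the standard Metropolis acceptance rule to a single hypothetical torsion angle with an invented cosine energy term, and every name and parameter in it is an assumption made for illustration.

```python
import math
import random

def toy_energy(angle_deg):
    """Hypothetical one-torsion energy with its minimum at -60 degrees."""
    return 1.0 - math.cos(math.radians(angle_deg + 60.0))

def metropolis_minimize(start_deg, steps=2000, max_step=15.0, kT=0.1, seed=0):
    """Random-walk over one angle, keeping the lowest-energy state visited."""
    rng = random.Random(seed)
    current = best = start_deg
    e_current = e_best = toy_energy(start_deg)
    for _ in range(steps):
        trial = current + rng.uniform(-max_step, max_step)
        e_trial = toy_energy(trial)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-dE / kT).
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

best_angle, best_energy = metropolis_minimize(120.0)
```

Occasionally accepting uphill moves is what lets this kind of search escape a poor starting conformation, which is the property that matters when a D-residue forces the backbone to rearrange.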
For the 1PPT peptide, no matter which amino acid species was replaced, the dominant α-helix segment did not change shape dramatically; only a small angular twist appeared at the end of the helix or in its central part (e.g., the bent α-helix barrel in 1ppt_dA, Figure 1Bii). The mutated residues did not necessarily affect the shape of their proximal region, as with the D-mutations of D, I, L, T, and V (Figure 1Biii, Bv, Bvi, Bix, and Bx) within the long α-helical structure, whereas the D-mutation of A affected the original α-helical structure more visibly (Figure 1Bii). Overall, the α-helix was a stable secondary structure that maintained the higher levels of structure.
1ZFI has five long anti-parallel β-sheets and one stable helix. However, compared to 1PPT, the helical structure within 1ZFI is much shorter. Long β-sheets in D-amino acid substituted 1ZFI could be divided into several short sheets to stabilize the functional structure (e.g., Figure 1Cix).
2LZE is a special primordial polypeptide because of the high flexibility of its two long loops: 20 different structural models can be found in the PDB, and their RMSD values differ significantly from each other. All D-amino acid substituted 2LZEs shared a similar loop-turn-helix-loop structure with two obvious zinc arms (Figure 1Dii-Dx), indicating that function is maintained in these heterochiral amino acid sequences.
In all of the polypeptides examined, proline made a large difference to the original structure because of its rigid ring structure. Taken together, peptides with α-helices, long β-sheets, and long loops were less sensitive to D-amino acid replacements than those with short β-sheets.
In this research, the effect of D-mutations on stability was normalized by the following equation:

\[
\text{Normalized stability difference} = \frac{\text{Stability of D-mutants} - \text{Stability without mutations}}{\text{Number of mutations}}
\]
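The normalization is a per-mutation average of the stability change. A minimal sketch, assuming energies in kcal/mol and using hypothetical numbers that do not come from the tables:

```python
def normalized_stability_difference(mutant_stability, parent_stability, n_mutations):
    """Stability change of a D-mutant relative to its all-L parent, per mutation."""
    if n_mutations <= 0:
        raise ValueError("n_mutations must be a positive integer")
    return (mutant_stability - parent_stability) / n_mutations

# Hypothetical values in kcal/mol, for illustration only.
example = normalized_stability_difference(-106.0, -100.0, 3)
```

A negative result means the D-mutant has a lower energy than its parent, averaged over the number of residues switched.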
The minimization approach that optimizes the torsion energy predicts folded structures without regard to the functional domains in the original polypeptides. Here, we preserved the functional structure of the polypeptides after D-mutation and determined the stability energy difference from their respective original all L-amino acid polypeptides. The minimization approach that optimizes the placement of the original Cartesian coordinates can achieve this goal. Cartesian energy minimization is less flexible with respect to angle constraints and torsions, but it predicts the minimal folding energy while retaining the original tertiary structure. Under Cartesian energy minimization, several D-amino acid substituted polypeptides had folding energies equivalent to those of their corresponding original all L-amino acid polypeptides: 1ark_dE, 1ark_dS, 1ppt_dA, 1ppt_dI, 1ppt_dS, 1zfi_dE, 1zfi_dS, 1zfi_dT, and 2lze_dD (Table 4).

Hydrogen bonding and hydrophobic interactions
To further investigate how the predicted structures of 1ARK, 1PPT, 1ZFI, and 2LZE were formed, the hydrogen bonding and hydrophobic interactions that are essential to peptide conformation were analyzed in YASARA. Hydrophobic interactions from hydrocarbon chains universally participated in the intramolecular interactions between residues (Figure 2).
The 1ARK polypeptide has 8 anti-parallel β-sheets connected by H-bonds. The tertiary structure of 1ARK is packed by multiple H-bonds between anti-parallel β-sheets and multiple hydrophobic interactions throughout its primary sequence (Figure 2Ai). Polypeptides 1ark_dD, 1ark_dE, 1ark_dL, and 1ark_dS had expanded β-sheets compared to 1ARK (Figure 2Aiii, Aiv, Avi, and Aviii). This finding suggests that D-mutations can produce the long β-sheet motif. However, in 1ark_dA, 1ark_dI, 1ark_dP, 1ark_dT, and 1ark_dV, H-bonds occurred only at some of the turning points of the loop structure because of their less packed tertiary structure (Figure 2Aii, Av, Avii, Aix, and Ax).
The α-helix of 1PPT is stabilized by abundant H-bonds. 1PPT is packed by hydrophobic interactions between its long α-helix and its N-terminal and C-terminal coils. The α-helix was not as easy to expand as the β-sheet, because the number of turns in every D-amino acid substituted 1ppt mutant was always less than or equal to that in 1PPT. Readjustments of the dominant α-helix, N-terminal loop, and C-terminal pseudo-helix of 1ppt were driven by new positions of hydrophobic interactions (Figure 2Biv, Bvi, and Bx).
1ZFI is another stable polypeptide because its 4 disulfide bonds can stabilize the relative locations of its five anti-parallel β-sheets and one α-helix, and thus its tertiary structure. Except for 1zfi_dD and 1zfi_dP (Figure 2Ciii and Cvii), the 1zfi D-mutants were still rich in H-bonds between β-sheets and α-helix (Figure 2Cii, Civ, Cv, Cvi, Cviii, Cix, and Cx).
In the relaxed structures of the 2lze D-mutants, the short 3-10 helix and short β-turns were consistently preserved by H-bonds (Figure 2Dii-Dx). 2LZE has a relatively loose tertiary structure with a linker segment and two arm-like coils, so it is always easy to twist the necessary peptide bonds and rebuild H-bonds between proximal residues.
These results indicate that α-helix, β-sheet, and β-turn were capable of stabilizing polypeptide folding, while random loop and 3-10 helix could destabilize the folded structure, which is consistent with our previous structure analyses.

Prebiotic polypeptides substituted with D-amino acid clusters
To investigate which D-amino acid cluster can stabilize the folding energy level or maintain the functional motif structure, we substituted each L-amino acid cluster with its counterpart cluster in the D-configuration in 1ARK, 1PPT, and 1ZFI. 2LZE was excluded because 2LZE and its D-amino acid substituted mutants are too variable for their stability and secondary structure composition to be determined reliably.
Results demonstrated that D-amino acid mutations of the E cluster lowered 1ark's energy level by 11.4 kcal/mol per residue, D-mutations of the LS cluster lowered it by 7.74 kcal/mol per residue, and the combination of E and LS clusters lowered it by 2.10 kcal/mol per residue, but all other clusters or combinations of clusters raised the folding energy level of 1ark.
As for 1ppt, no D-mutation of any combination of amino acid clusters could decrease the stability energy of this polypeptide.
D-mutations of the D cluster were able to decrease the folding energy level of 1zfi by 2.34 kcal/mol per residue. The LS cluster was able to stabilize 1zfi by 1.920 kcal/mol per residue, and the DE cluster combination was able to stabilize 1zfi by 1.335 kcal/mol per residue (Table 6). This result indicated that acidic amino acids (i.e., Asp and Glu) and the Leu/Ser cluster were able to lower the folding energy level of polypeptides.
Besides the folding energy-based stability, the overall folded conformation is also important, because it reflects the functional structure of proteins. The folded conformation can be quantified by RMSD. Given RMSD values as displayed in Table 6, only the E cluster mutant was similar to the original 1ARK conformation.
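For reference, RMSD between two conformations is the square root of the mean squared distance between matched atoms. The sketch below assumes the two coordinate sets are already superimposed and listed in the same atom order; it does not perform the optimal alignment that structure software normally applies first, and the coordinates are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates in angstroms: every atom is shifted by 1 along z.
parent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(parent, mutant)
```

A small value indicates that the D-mutant superimposes closely on its all L-amino acid parent.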
Mutants of the D cluster, E cluster, IVT cluster, the combination of D and IVT clusters, the combination of E and IVT clusters, and the combination D, E, and IVT clusters were all close to the original 1PPT conformation.
Mutants of the D cluster, E cluster, LS cluster, IVT cluster, the combination of D and E clusters, the combination of D and LS clusters, the combination of D and IVT clusters, the combination of E and LS clusters, the combination of E and IVT clusters, the combination of D, E, and LS clusters, and the combination of D, LS, and IVT clusters were all similar to the original 1ZFI conformation.
So far, the stability energy analyses determined how thermodynamically favorable the folded structure is, and RMSD values quantified how superimposable a D-mutant is with its functional parent all L-amino acid polypeptide. We then applied secondary structure occupancy analyses to identify which functional motifs can be preserved, extended, or produced. The 1ark mutants with D-mutations of the D, E, and IVT single clusters, the APDE and APEIVT three cluster combinations, and the APDEIVT and APELSIVT four cluster combinations contained noticeable occupancies of β-sheet and β-turn.
All 1ppt D-mutants had a considerable amount of functional secondary structure, including helical structure and β-turn, regardless of which mutations they were subjected to.
The 1zfi mutants with D-mutations of the AP, D, E, and IVT single clusters, the APLS, DIVT, ELS, EIVT, and LSIVT two cluster combinations, the APDLS, APELS, APEIVT, DLSIVT, and ELSIVT three cluster combinations, and the APDELS and APDEIVT four cluster combinations were occupied remarkably by β-sheet and helical structure. Overall, D-enantiomers of APDEIVT, APEIVT, D, E, and IVT were potential replacements of their respective L-enantiomers.

Mutation sensitivity profiles of prebiotic polypeptides
The mutation sensitivity profile reports the change in Gibbs free energy (ΔΔG) when a D-amino acid is replaced by each of the 20 proteinogenic L-amino acids; the higher the ΔΔG in the profile, the more stable the D-amino acid substituted polypeptide. For the 1ark polypeptide, 9 (26.5%) of the D-residues had higher ΔΔG than their counterpart L-residues, 13 (38.2%) had lower ΔΔG than the L-residues, and 12 (35.3%) were unchanged (Figure 4A). For the 1ppt polypeptide, 3 (15.0%) of the D-residues had higher ΔΔG than their corresponding L-residues, 10 (50.0%) had lower ΔΔG than the L-residues, and 7 (35.0%) were unchanged (Figure 4B). D-amino acids that decrease ΔΔG dominate the primary sequences of 1ark and 1ppt, the polypeptides with short β-sheets, a long α-helix, and short loops.
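The shares quoted for each polypeptide follow directly from the residue counts. A minimal sketch that recomputes them, rounded to one decimal place (the function name is an assumption):

```python
def sensitivity_shares(higher, lower, unchanged):
    """Percentage of D-residues with higher, lower, or unchanged ΔΔG."""
    total = higher + lower + unchanged
    if total == 0:
        raise ValueError("at least one residue is required")
    return tuple(round(100 * n / total, 1) for n in (higher, lower, unchanged))

shares_1ark = sensitivity_shares(9, 13, 12)  # counts reported for 1ark
shares_1ppt = sensitivity_shares(3, 10, 7)   # counts reported for 1ppt
```

For 1ark the counts sum to 34 residues, giving 26.5%, 38.2%, and 35.3%; for 1ppt they sum to 20 residues, giving 15.0%, 50.0%, and 35.0%.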

Synthetic D-peptides
Insights into prebiotic peptide foldability and D-amino acids are also provided by previous laboratory-based experiments. Pritsker et al. (2013) showed that an artificially synthesized HIV-1 gp41 protein composed of only D-amino acids can fold properly 40 under the salty conditions proposed for early Earth. 41 Tugyi et al. (2005) converted two N-terminal and three C-terminal amino acids of a mucin glycoprotein (MUC2) epitope into their counterpart D-enantiomers, and found that the biotic function of MUC2 was maintained. 42 Furthermore, inspired by a natural polymerase X from the African swine fever virus that could synthesize peptide chains using racemic amino acids as raw materials, Wang et al. (2016) successfully synthesized a D-amino acid polymerase that was functional on L-DNA strands. 43 Hence, the potential of D-amino acids in life systems is further supported.

CONCLUSION
D-Amino acids can construct secondary structure together with L-amino acids. D-mutations of Glu and Ser in 1ARK; Ala, Ile, and Ser in 1PPT; Glu, Ser, and Thr in 1ZFI; and Asp in 2LZE did not change the stability markedly in comparison to the corresponding original all L-amino acid polypeptides under the Cartesian energy minimization method. This result indicated that acidic amino acids (Glu and Asp), amino acids with a hydroxyl group (Ser and Thr), and small hydrophobic amino acids (Ala and Ile) could maintain the same structure and function as current natural proteins. In addition, Asp, Glu, Leu, and Ser residues could also expand the short β-sheets in 1ark. Peptides with an α-helix and long β-sheets, especially those stabilized by abundant H-bonds within a parallel or antiparallel β-sheet cluster, are usually less sensitive to D-amino acid replacements. D-mutations could produce the long β-sheet motif, while the α-helix was not as easy to expand as the β-sheet. Tertiary structure could be stabilized by abundant disulfide bonds, such as the four disulfide bonds in 1ZFI. Functional polypeptides characterized by dominant loop occupancy, such as 2LZE, despite their less stable and more mobile backbone, were much more flexible than polypeptides with observable α-helix or β-sheet, which made it easier for them to preserve their functional domains.
Some combinations of D-amino acid clusters could generate functional polypeptides together with L-amino acids.
Based on the analyses of stability energy level, secondary structure occupancy, and RMSD of relative position, the LS cluster was able to lower the folding energy level of polypeptides in general; D-amino acid substitutions by the four clusters APDEIVT, the three clusters APEIVT, or the single cluster D, E, or IVT were potential alternatives to their counterpart L-amino acid clusters for retaining secondary structure; and the D, E, LS, and IVT clusters could each generally retain the overall folded structure, even under the torsion energy minimization method.
Additionally, the tripeptide trial showed that the position of residues played a role in determining stability, whereas the class of residue did not make a remarkable difference in stability. Dipeptide modeling revealed that, in short peptides, D-alanine and D-proline could decrease the energy level after energy minimization. This computational modeling study implies that homochirality may not be a prerequisite for functional protein structure. Namely, the origin of life could have occurred prior to the origin of homochirality, and L-amino acids may not be the only option for complex organism biosignatures. This finding extends the definition of life and points to the possibility of heterochirality in extraterrestrial life systems.

*, D-residue mutation increases the stability of the whole polypeptide by more than 1 kilocalorie/mole (kcal/mol) but less than 2 kcal/mol. **, D-residue mutation increases the stability of the whole polypeptide by less than 1 kcal/mol. ***, D-residue mutation decreases the stability of the whole polypeptide.

Table 5. Secondary structure occupancies, number of mutations, B-factors, and stability of 4 prebiotic polypeptides. Polypeptides substituted with one or more D-amino acids were designated using lowercase letters.

Table 6. Effects of D-mutations to residue clusters on normalized stability difference and RMSD by using the torsion energy minimization approach, compared with the original all L-amino acid polypeptides (prebiotic polypeptides were 1ARK, 1PPT, and 1ZFI).

Table 7. Secondary structure occupancies of combined residue cluster D-mutants which were analogous to their corresponding original all L-amino acid polypeptides. Numbers and amino acid single letter codes after the underscore signify the number of the following residue that was transformed from L- to D-configuration.