In Silico Phylogenetic and Structural Analysis of Plant Damps

In plants and animals, endogenous biological molecules, termed damage-associated molecular patterns (DAMPs) or alarmins, are released by damaged, stressed or dying cells following abiotic stress such as radiation and drought stress. In turn, a cascade of downstream signaling events is initiated leading to the up-regulation of defense-related genes. In the present study, in an effort to investigate the conservation status of the molecular mechanisms implicated in the danger signaling, thorough in silico phylogenetic and structural analyses of the effector biomolecules were performed in taxonomically diverse plant species. On the basis of our results, the defense mechanisms appear to be largely conserved within the plant kingdom. Of note is our finding that the sequence and/or function of several components of these mechanisms were found to be conserved in animals, as well.


Introduction
Death-, danger-or damage-associated molecular patterns (DAMPs) are of biological origin and are considered the major immunogenic mediators released passively by damaged, stressed or dying cells, including tumor cells targeted by radiation or chemotherapy [1].DAMPs are host cell-derived, as opposed to foreign (non-self) molecules termed Microbe-Associated Molecular Patterns (MAMPs) or Pathogen-Associated Molecular Patterns (PAMPs) [2].In a seminal study on this field by Matzinger (1994), it was suggested that the immune system does not actually distinguish between the self and non-self but rather detects "danger" through a series of positive and negative signals derived from damaged or stressed tissues mediated by DAMPs [3,4].
DAMPs are perceived by pattern recognition receptors (PRRs), such as plasma membranelocalized receptors, thereby leading to a cascade of events including cytoplasmic Ca 2+ elevation, depolarization of the cell membrane, production of reactive oxygen species (ROS), the transient phosphorylation of mitogen activated protein kinases (MAPKs), and the transcriptional upregulation of defense-related genes.
Plants have evolved mechanisms of innate immunity against detrimental pathogenic microorganisms and herbivorous animals.As mentioned above, DAMPs or alarmins i.e. biomolecules released by stressed cells, in many ways share similarities between plants and animals.For example, the recently discovered Arabidopsis HMGB3, is the counterpart of the pivotal animal DAMP HMGB1 [2].
Plants' immune response against intruders is regulated mainly by two antagonistic defense signaling pathways: i) the salicylic acid (SA)-mediated signal transduction pathway elicited by biotrophic and hemiobiotrophic pathogens and ii) the octadecanoid signaling pathway with the key hormone jasmomic acid (JA) induced by heterotrophic pathogens and herbivores [5][6][7].Of note, the oxidized lipids that participate in octadecanoid signaling are the plant functional equivalents of the mammalian oxidized phospholipids [8].
To date, there is still a knowledge gap concerning the evolutionary origin of the different DAMPs identified in the plant kingdom and how this valuable knowledge can be projected to more complex organisms, and especially animal cells.In this study, we have made an effort to assess the conservation of the defense mechanisms by conducting phylogenetic and structural analyses of the effector molecules implicated in danger signaling in 11 vascular plant species that represent diverse taxonomic divisions: Arabidopsis thaliana (thale-cress), Zea mays, Oryza sativa (rice), Hordeum vulgare (barley), Medicago truncatula (barrel medic), Nicotiana tabacum (tobacco), Populus trichocarpa (cottonwood), Solanum lycopersicum (tomato), Prunus persica (peach), Vitis vinifera (wine grape), Pisum sativum (pea) (Figure S1).

Sequence dataset & homology searching
The names and/or accession numbers of the characterized proteins reported in PubMed (https://www.ncbi.nlm.nih.gov/pubmed)articles were used to retrieve their corresponding sequences from the publicly accessible sequence databases UniProtKB [9] and NCBI's GenBank [10].To identify more orthologous protein sequences, the known sequences were used as query in reciprocal BLASTp and tBLASTn [11] searches of the genomes of 11 vascular plant species, representing diverse taxonomic divisions (Figure S1).The canonical or longest transcripts were selected.Any partial or ambiguous sequences were not included in the subsequent steps of the study.

Phylogenetic analyses
The amino acid sequences of the corresponding proteins (Table S1) were aligned with MAFFT, version 3.7 [12].Phylogenetic analyses were performed by employing a maximum likelihood (ML) method.To this end, the resulting multiple alignments were provided as input to PhyML v.3.0 [13], which optimizes a distance-based starting tree (BioNJ) [14] by employing a heuristic algorithm.The expected number of amino acid substitutions per site was estimated with the JTT model [15].Protein sequences distantly related to those under study were used as outgroups.The robustness of the reconstructed phylogenetic trees was evaluated by bootstrapping (200 bootstrap pseudo-replicates).Trees were illustrated using Dendroscope [16].

Pairwise distances estimation
Pairwise distances between protein sequences were computed using the software package MEGA, version 7.0 [17].

Protein Domain Organization
The protein sequences under study were searched against the protein signature databases SMART v.8.0 [18] and CDD v.3.16[19] in order to determine the boundaries of their constituent domains.

Tertiary structure analysis
The plant MAPK3 and MAPK6 sequences under study were aligned using the PROMALS3D [22,23] multiple sequence alignment program, which incorporates evolutionary and tertiary structural information to improve alignment accuracy.The degree of conservation of amino acid residues of the homologous plant MAPK3 and MAPK6 proteins was estimated with the usage of the ConSurf program [24].To this end, the resulting multiple sequence alignment of the MAPK3 and MAPK6 amino acid sequences was provided as input to ConSurf to map the conservation grades of the amino acid residues onto the resolved tertiary structure of the Arabidopsis thaliana MAPK6 (PDB ID: 5CI6; [25]).The protein structure was displayed using the PyMol molecular graphics program (http://www.pymol.org/).

Structural modeling of protein-protein interactions
The tertiary structure of the extracellular LRR domain of the A. thaliana PEPR2 was predicted by homology modeling.The X-ray crystal structures of the Arabidopsis PEPR1LRR (PDB ID: 5GR8, chain A; [26]) was used as template to model PEPR2LRR using the I-TASSER server [27].The quality of the final modeled protein structures was evaluated with Procheck [28].
In order to assemble the trimeric PEPR2LRR-AtPEP1-BAK1 complex, the following steps were carried out: LRR domains (bold) present in the PEPR1LRR-AtPEP1 (PDB ID: 5GR8) and FLS2LRR-FLG22-BAK1 (PDB ID: 4MN8) complexes, and the PEPR2LRR homology model were structurally superimposed by using the alignment algorithm FATCAT [29].Following this superimposition, the coordinates of FLS2LRR-FLG22-BAK1 and PEPR1LRR were deleted.As a result of this procedure, a crude model of the PEPR2LRR-AtPEP1-BAK1 complex was obtained.This crude model was refined with the water refinement step of the HADDOCK webserver [30].During refinement, the interaction depicted in Figure 7B was used as a restraint, with the following restraint definition: i) assign (resi 438 and segid A) (resi 23 and segid C) 3.00 3.00 0.05 and ii) assign (resi 392 and segid A)(resi 23 and segid C) 3.00 3.00 0.05.All of the interface statistics were carried out through CoCoMaps tool (https://www.molnac.unisa.it/BioTools/cocomaps/).

Classes of DAMPs
The diverse DAMPs that have been identified in plants are arbitrarily grouped, in this study, in the following classes.
Likewise, in animals, glycosaminoglycan hyaluronan fragments are released during damage and are implicated in wound repair and regeneration.These fragments are recognized by the Toll-like receptors TLR2 and TLR4 and activate the expression of inflammatory genes [34].The viridiplantae MAPK3 and MAPK6 proteins have orthologs in metazoa (Figure 1) which are also implicated in danger signaling [35].The plant MAPK3 and MAPK6 have rather evolved independently from their animal orthologs as they form distinct monophyletic groups with bootstrap values of 100 and they are connected to the corresponding metazoan clade with a long branch which reflects long evolutionary distance (Figure 2 and 3).The monocotyledonous plant MAPK3 and MAPK6 sequences form well-separated clades in their respective phylogenetic trees, leading to the suggestion that these sequences have probably appeared after the monocot-eudicot divergence about 200 million years ago (MYA) [36].The tomato and tobacco MAPK3 and MAPK6 sequences cluster together with high statistical confidence, indicating that these sequences emerged after the members of the order solanales branhed off from the other vascular plants (Figure 2 and 3).A cluster of five highly similar WAK (WAK1-5) genes are present in A. thaliana (thale cress) [37].The Arabidopsis WAK1-5 protein sequences form a separate highly supported clade (Figure 4).Based on sequence database searches, a single copy of WAK was found in the other plant species (Figure 1).WAK homologs were not detected in pea; this is probably due to incomplete annotation of the Pisum sativum genome.The encoded protein was arbitrarily referred to as WAKL, where "L" stands for "Like", by virtue of sequence similarity to the thale cress WAK1-5 (Figure 4).A primordial WAK gene might have undergone a series of duplications in A. thaliana, thereby giving rise to fıve WAK paralogs.However, there are not available fully sequenced and annotated genomes of other members of the order brassicales, apart from the one of the model plant organism A. thaliana, which would allow us to examine whether the WAK gene expansion is species-or order-specific.Based on extensive bibliographic searches, no experimental evidence was found about A. thaliana WAK2-5 acting as PRRs.WAK1 has apparently acquired a more specialized function during the course of evolution.WAK proteins are likely restricted to vascular plants, since orthologous WAK sequences were not found in metazoa.The role of these WAKL proteins in the perception of OGs in their host plant organisms remains to be investigated experimentally.Extracellular ATP (eATP), released from stressed or damaged cells into the extracellular milieu, constitutes a class of danger signals both in plants and animals.In plants, eATP plays an important signaling role in plant cells by participating in several processes including cell development, viability and stress responses [38,39].It is recognized by the plant-specific PRR, DORN1 (DOes not Respond to Nucleotides 1), which is a membrane lectin receptor kinase [40].DORN1 is required for the ATP-mediated intracellular influx of Ca 2+ , formation of ROS, phosphorylation of MAPK3 and MAPK6 and stimulation of the expression of defense genes [40,41].
Putative DORN1 homologous sequences were detected in all plant species under investigation (Figure 1).The cereal plants as well as the solanales DORN1 sequences form their own monophyletic group with bootstrap values of 100, suggesting class-and order-specific evolution of DORN1 (Figure 5).DORN homologous sequences were found exclusively in plants, wherein might be implicated in the recognition of eATP.
In animals, eATP is involved in inflammatory response by activating mast cells [42,43].Two types of P2 purinoceptors, the ligand-gated cation channel P2X receptors and the metabotropic (G-proteincoupled) P2Y receptors [44], are activated by eATP and are involved in the eATP-induced increase of cytosolic Ca 2+ levels [45].

Polypeptide-based DAMPs
A wide range of polypeptide-based DAMPs are derived from larger precursors [46].In Arabidopsis thaliana the 23-amino acid PEP1 (Plant Elicitor Peptide) is processed from a 92-amino acid precursor upon damage, JA, and ethylene [47].Based on homology searches, only one ortholog of AtPEP1 was detected in the fellow plants, that is, ZmPEP1 in maize which exhibited DAMP-like behavior [48].In A. thaliana, PEP1 is perceived by two PRRs, PEPR1 (PEPtide Receptor 1) and PEPR2, leading to ROS production and upregulation of defensin-like genes [26,47,49,50].Of note, the only PEP orthologs were found in two species belonging to divergent taxonomic groups, eudicots (A.thaliana) and monocots (Zea mays), and not any taxonomically related species (such as fellow eudicots).This suggest that even distantly related organisms respond to similar stimuli with similar mechanisms.
PEPs' cognate receptors, however, were found in all plants under study (except pea) (Figure 1).Thale cress harbors a total of seven PEPs (PEP1-PEP7) and two PEPR paralogs, the encoded proteins of which share 67% identity.In the inferred phylogenetic tree, the AtPEPR1 and AtPEPR12 protein sequences form a highly monophyletic branch with bootstrap support 100 (Figure 6).However, a single copy of PEPR was detected in the other plant species, the corresponding protein of which is called PEPR1/2 due to its amino acid sequence similarity to both AtPEPR1 and AtPEPR2.The cereal plants form a separate highly supported clade, leading to the suggestion that PEPR1/2 genes evolved after the monocots-dicots split (Figure 6).PEPRs are also kingdom-restricted, since PEPR orthologs were detected exclusively in vascular plants.AtPEPs exhibit functional similarity to the 18-amino acid peptide systemin, which is processed from the 200-amino acid hormone prosystemin (SYST) in tomato upon wounding [51].Prosystemin was found only in tomato, and not in tobacco which is also member of the same solanales order (Figure 1).However, two hydroxyproline-rich systemin (HSY) peptides, designated HSYA and HSYB, cleaved from a larger preprotein, were identified in tobacco (Figure 1) which do not bear any sequence similarity to systemin [52].Both SYST and HSY induce a JA-mediated signaling pathway that leads to the activation of defense-related genes [51,52].
Of particular note, as in the case of AtPEPRs, systemin cognate receptors (SR160) were detected in all plant species under investigation (Figure 1).It would be intriguing to suggest that PEPR and SR160 genes were probably propagated through vertical gene transfer in plants, and the peptides that specifically bind to them evolved later in certain species in order to carry out species-specific functions.
Protein fragments, such as fibronectin fragments (FN-fs), are the equivalents of the plant polypeptide-based DAMPs in mammals, and they are perceived by Toll-like receptors (TLRs) [53].PEPR1/2 and SR160 are transmembrane (TM) proteins comprised of LRR (Leucine Rich Repeats) motifs, one TM domain, and one catalytic kinase domain in the cytoplasmic region.PEPRs resemble the mammalian TLRs in terms that they share LRR motifs in their amino-termini, suggesting that the extracellular LRR domain is used for DAMP perception both in plants and mammals.
The interaction between PEPR2LRR-AtPEP1 (Figure 7) was also modeled by using the crystal structure of PEPR1LRR-AtPEP1 (PDB ID: 5GR8) as a template.According to 5GR8, there are 16 hydrogen bonds formed between PEPR1LRR and its AtPEP1 peptide [26].In our PEPR2LRR-AtPEP1 model, this number increases to 17. Two hydrogen bonds, among all, were found to be strictly conserved, i.e. the ones between TYR395-GLN21 and ASP441-ASN23 across the PEPR1LRR-AtPEP1 interface; TYR346-GLN21 and ASP392-ASN23 across the PEPR2LRR-AtPEP1 interface.This reflects the importance of these stabilizing interactions within the PEPR family.The extent of the PEPR1LRR-AtPEP1 and PEPR2LRR-AtPEP1 interfaces appear to be conserved, where 66 amino acids interact over an area of 1033 A2 in the case of PEPR1LRR-AtPEP1, and 65 amino acids interact through a 1046 A2 sized interface.PEPR2LRR-BAK1 and its template FLS2LRR-BAK1 (PDB ID: 4MN8) entail a similar number of amino acids across their interfaces (57 for PEPR2LRR-BAK1 and 56 for FLS2LRR-BAK1) (Figure 7).Although this is the case, PEPR2LRR-BAK1 makes more polar interactions (through 12 hydrogen bonds) compared to FLS2LRR-BAK1 (entails 5 hydrogen bonds only), indicating the polar character of the PEPR2LRR surface and a possibly higher affinity between PEPR2LRR-BAK1.
The human HMGB1 harbors two HMG boxes, BoxA and BoxB, whereas the A. thaliana HMGB proteins have only one HMG box.In the plant species under study, single HMBG homologs were detected that were found to be related to Arabidopsis HMBG1/2/3.This is because the A. thaliana proteins HMBG1, HMBG2 and HMBG3 share a high degree of sequence identity/similarity.HMBG1/2/3-related sequences were found in all plant species (Figure S1).
Genes encoding SERKs have been identified in the genomes of vascular plants, both monocots and dicots [58].Furthermore, Sasaki et al. [59] reported the presence of SERK homologs in the less evolved plant liverwort Marchantia polymorpha and even in the unicellular green alga Closterium ehrenbergii.SERKs are ancient, essential genes, conserved during speciation.Multiple SERK genes are found in dicots, monocots, and non-vascular plants.This indicates that SERK genes are at least 450 million years old, and were present before the split of non-vascular and vascular plants [60].Putative SERK-related proteins were detected in the species under investigation (except pea) (Figure 1). Figure 8. Sequence motifs identified in the kinase domain of plant proteins.Sequence logos of the conserved protein motifs identified in the catalytic domain of protein kinases.The first and last amino acid residues of each motif are numbered according to A. thaliana WAK1.The conserved/invariant amino acid residues critical for the protein kinase activity are denoted by dots.The height of each letter depicts the relative frequency of the corresponding amino at that position, and the letters are ordered in such a way that the most frequent one is on the top.

Conserved structural features of plant proteins implicated in danger signaling
In the present study, the 11 conserved catalytic sequence motifs reported by Hanks et al [61] were identified in the plant kinase domains under investigation (Figure 8).In motif 1, the consensus signature GxGxxG of phosphate binding, conserved across nucleotide binding proteins [62,63], was detected.The invariant lysine (K) in motif 2 is essential in protein kinase activity as it participates actively in the phosphate transfer reaction [64].The invariant aspartate (D) and asparagine (N) amino acids in motif 6, as well as the stretch of Asp (D), Phe (F) and Gly (G) residues in motif 7 are actively involved in ATP binding [65].In motif 8, the highly conserved Glu (E) and invariant Pro (P) were found to be flanked by a conserved alanine (A) residue in [61].However, in the plant kinase domain, at the same position, Asp (D) was found to be the most frequent residue instead.Phosphorylation of these residues was associated with enhanced catalytic activity in a number of protein kinases [66,67].Moreover, several highly conserved amino acids with an unknown role identified by Hanks et al [61] were also found to be conserved in plant kinases, including Asp (D) in motif 3, Asp (D) and Gly (G) in motif 9 and Arg (R) in motif 11.
The degree of conservation of the amino acid residues critical for kinase activity is depicted in the threedimensional structure of A. thaliana MAPK6 [25] (Figure 9).A total 13 out of the 16 amino acids found to be conserved/invariant in the primary structures of the kinase domain of plant proteins (Figure 8) are also shown to be conserved in the tertiary structure of this domain (Figure 9).The other important conserved residues can be listed as (from left to right) GLY251, ALA 231, PRO232, GLU233, ARG336, ASP246, ASP189, ASN194, GLU110, LYS92.The numbering is given as in 5CI6.

Conclusion
It has been only about two decades has the significance of DAMPs for the survival and homeostasis of multicellular organisms has surfaced.Related to the more than twenty DAMPs discovered in animals to date, relatively few have been recognized in plants [68,69].Based on our study, several of these plant DAMPs have counterparts in animals, including eATP, HMGBs and MAPKs.This finding supports the idea that all efforts to elucidate the pathway(s) through which innate immunity is triggered will likely identify additional signaling components that are shared by all kingdoms, from planta to humans.The "ancestral" mechanisms of defense are conserved across kingdoms.Several key molecular "players" of those mechanisms appear to be largely conserved (e.g.eATP, calcium, MAPKs).Some plant effector proteins were found to have sequence homologs in animals (e.g., MAPKs), whereas most of them were detected exclusively in plants.Both in plants and animals, selective pressure has been likely exerted to maintain the functional integrity of the defense mechanisms by recruiting or co-opting into the existing molecular pathways novel players.These players may have no apparent sequence / structural similarity but, through convergent evolution, have rather acquired similar functions, as in the case of DORN1 and P2 purinoreceptors, in plants and animals, respectively.These findings could provide the foundation for expanding the current knowledge on DAMP molecules in plants and their role in adaptive defense mechanisms.
Supplementary Materials: Figure S1.A cladogram extracted from NCBI's Taxonomy database [70] depicting the evolutionary relationships among plant species.Table S1.Accession numbers of the protein sequences investigated in the present study.
Author Contributions: A.P. and A.G.G. designed and supervised the study.A.P. and E.K. performed data analyses.A.P., E.K., A.B. and A.G.G. wrote the manuscript.All authors reviewed and approved of the final manuscript.
Funding: Alexander von Humboldt Foundation Return Fellowship.

Figure 2 .
Figure 2. ML-based tree of MAPK3 protein sequences.The branch lengths are proportional to the evolutiotary distance.Bootstrap support values ≥ 50% are shown at the nodes.The scale bar at the upper right denotes the length of amino acid replacements per position.The sequence ThaleCress MAPK9 was used as outgroup.

Figure 3 .
Figure 3. Tree of MAPK6 protein sequences.The sequence ThaleCress MAPK9 was used as outgroup.The conventions are the same as in Figure 2.

Figure 4 .
Figure 4. Tree of WAKL protein sequences.The sequence Pea NORK (nodulation receptor kinase) was used as outgroup.The conventions are the same as in Figure 2.

Figure 5 .
Figure 5. Tree of DORN1 protein sequences.The sequence ThaleCress LYK5 was used as outgroup.The conventions are the same as in Figure 2.

Figure 6 .
Figure 6.Tree of PEPR1/2 protein sequences.The sequence ThaleCress BAM3 was used as outgroup.The conventions are the same as in Figure 2.

Figure 7 .
Figure 7. A. Structural model of the Arabidopsis thaliana PEPR2LRR(green)-AtPEP1(gray)-BAK1(gold) complex.Important polar interactions formed across the PEPR2LRR and AtPEP1, PEPR2LRR and BAK1, AtPEP1 and BAK1 interfaces are depicted in panels B., C., D., respectively.Polar interactors are represented in ball-and-sticks, with oxygen atoms colored in red and nitrogens in blue.

Figure 9 .
Figure 9. A. Tertiary structure … blue corresponds to the most highly conserved region, where green to the least one.The conserved DFG (207-209) motif is highlighted in yellow (left panel, at the center of the molecule).The other important conserved residues can be listed as (from left to right) GLY251, ALA 231, PRO232, GLU233, ARG336, ASP246, ASP189, ASN194, GLU110, LYS92.The numbering is given as in 5CI6.