Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences

Version 1 : Received: 18 January 2021 / Approved: 19 January 2021 / Online: 19 January 2021 (16:46:25 CET)

A peer-reviewed article of this Preprint also exists.

Pandey, A.; Braun, E.L. The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences. Biophysica 2021, 1, 87-105.

Journal reference: Biophysica 2021, 1, 8
DOI: 10.3390/biophysica1020008

Abstract

Despite the long history of using protein sequences to infer the tree of life, the potential for different parts of protein structures to retain historical signal remains unclear. We propose that it might be possible to improve analyses of phylogenomic datasets by incorporating information about protein structure; we test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1,500 proteins and almost 100 taxa. The proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data when a limited number of taxa were used, but it was stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology (ctenophores sister to other animals) regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference in the support for minority topologies among sites in different structural environments: buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies, whereas solvent-exposed and helix sites had unequal numbers of sites supporting the minority topologies. Given the plausible trees, equal support for minority topologies is consistent with discordance among gene trees, making it possible that the relatively slowly evolving buried (and sheet and coil) sites give an accurate picture of both the true species tree and the amount of conflict among gene trees. Alternatively, the apparent support could reflect currently uncharacterized processes of molecular evolution. Regardless, it is clear that analyses of the deepest branches in the animal tree of life using sites in different structural environments are associated with a subtle data type effect that results in distinct phylogenetic signals.
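The decisive-site comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that per-site log-likelihoods are already available for three candidate topologies, that a site counts as "strongly decisive" when its best topology beats the runner-up by at least a threshold (the value 0.5 is an assumption, not taken from this abstract), and that equal support for the two minority topologies can be checked with an exact two-sided binomial test. The topology labels and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import comb


def classify_site(site_lnl, threshold=0.5):
    """Return the topology a site strongly favors, or None.

    site_lnl maps a topology label to that site's log-likelihood.
    The threshold is an assumed cutoff, not a value from the abstract.
    """
    ranked = sorted(site_lnl.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_lnl), (_, second_lnl) = ranked[0], ranked[1]
    return best if best_lnl - second_lnl >= threshold else None


def tally_by_environment(sites, threshold=0.5):
    """Count strongly decisive sites per structural environment.

    sites is an iterable of (environment, site_lnl) pairs, where
    environment is a label such as "buried" or "helix".
    """
    counts = defaultdict(Counter)
    for environment, site_lnl in sites:
        winner = classify_site(site_lnl, threshold)
        if winner is not None:
            counts[environment][winner] += 1
    return counts


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials, p = 0.5.

    A large p-value is compatible with equal support for the two
    minority topologies; a small one indicates unequal support.
    """
    if n == 0:
        return 1.0
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Sum all outcomes that are no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    toy_sites = [
        ("buried", {"cteno_sister": -10.0, "pori_sister": -11.0, "cteno_pori": -11.2}),
        ("buried", {"cteno_sister": -12.0, "pori_sister": -10.9, "cteno_pori": -12.5}),
        ("helix", {"cteno_sister": -9.0, "pori_sister": -9.1, "cteno_pori": -9.2}),
    ]
    counts = tally_by_environment(toy_sites)
    for environment, tally in counts.items():
        print(environment, dict(tally))
```

Under these assumptions, the two minority counts for one environment would be passed to `binomial_two_sided_p` as `k` (sites for one minority topology) and `n` (sites for both minority topologies combined).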

Keywords

protein structure; relative solvent accessibility; secondary structure; phylogeny; models of sequence evolution; gene tree-species tree discordance; incomplete lineage sorting; Ctenophora; Porifera
