README: Supporting data for Pandey & Braun (2019)

This folder contains data files that support the manuscript:

Pandey A, Braun EL (2019) Sites in Different Protein Structural Environments Support Distinct Placements of the Metazoan Root


Supplementary_File_S1.xlsx:
		File containing information about all proteins included in the Ryan et al. (2013)
		dataset. Transmembrane (TM) proteins (which were excluded from the FRG dataset) 
		are indicated. We predicted TM proteins using the TOPCONS server (Tsirigos et al. 
		2015).

Supplementary_File_S2:
		Nexus files with each protein multiple sequence alignment in the FRG dataset. 
		All files include an ASSUMPTIONS block with charsets. The relevant charsets are:
		
			CHARSET SS_HELIX = list of sites... ;
			CHARSET SS_SHEET = list of sites... ;
			CHARSET SS_COIL  = list of sites... ;
			CHARSET EXPOSED  = list of sites... ;
			CHARSET BURIED   = list of sites... ;

		Each file also includes a PAUP block that will export the relevant subsets of 
		the data if the file is executed in PAUP* (Swofford 2018).
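
		For readers without PAUP*, the charsets can also be read directly from the
		text of each file. The following is a minimal Python sketch, not part of the
		original archive. It assumes the charsets are written as space-separated
		site numbers and ranges (e.g. "1-4 10" or "2-8\3"); other NEXUS notations,
		such as "." for the last site or references to other charsets, are not
		handled. The function name parse_charsets and the example text are
		hypothetical.

```python
import re

def parse_charsets(nexus_text):
    """Return {charset_name: sorted list of 1-based site indices}."""
    charsets = {}
    # Match statements such as "CHARSET SS_HELIX = 1-5 9 12-20\3 ;"
    pattern = re.compile(r"charset\s+(\S+)\s*=\s*([^;]*);", re.IGNORECASE)
    for name, spec in pattern.findall(nexus_text):
        sites = set()
        for token in spec.split():
            # A token is "N", "N-M", or "N-M\K" (every K-th site from N to M)
            m = re.fullmatch(r"(\d+)(?:-(\d+))?(?:\\(\d+))?", token)
            if m is None:
                raise ValueError(f"unsupported charset token: {token!r}")
            start = int(m.group(1))
            end = int(m.group(2)) if m.group(2) else start
            step = int(m.group(3)) if m.group(3) else 1
            sites.update(range(start, end + 1, step))
        charsets[name] = sorted(sites)
    return charsets

# Made-up ASSUMPTIONS block for illustration; real files list many more sites.
example = """
BEGIN ASSUMPTIONS;
    CHARSET SS_HELIX = 1-4 10 ;
    CHARSET BURIED   = 2-8\\3 ;
END;
"""
charsets = parse_charsets(example)
print(charsets["SS_HELIX"])
print(charsets["BURIED"])
```

		The returned site numbers are 1-based, as in NEXUS, so subtract 1 before
		using them to index alignment columns in Python.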

Supplementary_File_S3.xlsx:
		File containing the gene-wise likelihoods of the two alternative trees (T2 and
		T3) examined by Reddy & Braun (2018). The Excel file includes a graph of the
		likelihood differences for each gene.
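
		The likelihood difference for a gene is the difference between its
		log-likelihoods on the two trees. The Python sketch below illustrates the
		arithmetic only. The sign convention (T2 minus T3), the gene names, and the
		values are assumptions made for illustration and are not taken from the
		Excel file.

```python
def likelihood_differences(lnl_t2, lnl_t3):
    """Return {gene: lnL(T2) - lnL(T3)} for genes present in both inputs."""
    return {g: lnl_t2[g] - lnl_t3[g] for g in lnl_t2 if g in lnl_t3}

# Made-up log-likelihoods for illustration only.
lnl_t2 = {"geneA": -1000.0, "geneB": -2500.5, "geneC": -800.0}
lnl_t3 = {"geneA": -1002.0, "geneB": -2499.5, "geneC": -800.0}

delta = likelihood_differences(lnl_t2, lnl_t3)

# Under this sign convention, positive values favour T2 and negative favour T3.
favours_t2 = sorted(g for g, d in delta.items() if d > 0)
favours_t3 = sorted(g for g, d in delta.items() if d < 0)
print(delta, favours_t2, favours_t3)
```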

Supplementary_File_S4:
		Rate matrices estimated using data from Ryan et al. (2013). All .dat files are
		text files with a format usable by programs such as codeml from PAML (Yang 2007)
		or IQ-TREE (Nguyen et al. 2015). The files use the standard amino acid order:
		
		Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
		
		All .dat files have the prefix PB18; this indicates the authors of the manuscript
		and the year of submission. "grandGTR" was estimated using the complete Ryan
		dataset. Other .dat files 

Supplementary_File_S5:
		This folder contains treefiles generated by RAxML analyses of structurally-
		defined subsets of protein sequences from the "filtered Ryan genomic dataset"
		(FRG, which was generated by removing a subset of proteins from a data matrix
		assembled by Ryan et al. 2013). One treefile is in nexus format (Maddison et al.
		1997) and contains comments that will be printed to the screen if the file is
		executed in PAUP* (Swofford 2018). That file can also be read in any text
		editor. The folder also contains the same trees in a simple newick format.

Supplementary_File_S6:
		This folder contains treefiles generated by RAxML analyses of structurally-
		defined subsets of protein sequences from the FRG. The formats of the treefiles
		are identical to those in 01_treefiles_set1.

Supplementary_File_S7:
		Log and output files (.log and .iqtree) from IQ-TREE (Nguyen et al. 2015)
		analyses of the exposed and buried subsets of the FRG dataset using the ML
		variants of the Bayesian CAT models (Le et al. 2008). Results are split into two
		subfolders that contain the results for buried vs. solvent-exposed residues.

Supplementary_File_S8:
		This folder contains phylip files with the recoded alignments for the exposed
		and buried subsets, as well as a nexus tree file with all of the trees from
		the recoded analyses.

REFERENCES:

Le SQ, Gascuel O, Lartillot N (2008) Empirical profile mixture models for phylogenetic 
reconstruction. Bioinformatics 24, 2317–2323.

Maddison DR, Swofford DL, Maddison WP (1997) NEXUS: An extensible file format for 
systematic information. Syst Biol 46, 590–621.

Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ (2015) IQ-TREE: A fast and effective 
stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 
268–274.

Ryan JF, Pang K, Schnitzler CE, Nguyen AD, Moreland RT, Simmons DK, Koch BJ, Francis WR, 
Havlak P, Smith SA, et al. (2013) The genome of the ctenophore Mnemiopsis leidyi and its 
implications for cell type evolution. Science 342, 1242592.

Swofford DL (2018) PAUP* (* Phylogenetic Analysis Using PAUP). Computer program available
from http://paup.phylosolutions.com

Tsirigos KD, Peters C, Shu N, Käll L, Elofsson A (2015) The TOPCONS web server for 
consensus prediction of membrane protein topology and signal peptides. Nucleic Acids 
Res 43, W401–W407.

Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 
1586–1591.
