Evolutionary Aspect of Miltefosine Transporter Proteins in Leishmania major

The transporter proteins P-glycoprotein (P-gp) and P4ATPase-CDC50 are responsible for the transport of the drug Miltefosine across the cell membrane of the protozoan parasite Leishmania major. Mutations in these proteins, or changes in their activity, may lead to the emergence of resistance in the parasite. Owing to the structural and functional importance of these transporter proteins, in this paper we have tried to decipher their evolutionary divergence across different forms of life, including Protists, Fungi, Plants and Animals. We retrieved 96, 207 and 189 sequences of the P-gp, P4ATPase and CDC50 proteins, respectively, from a diverse variety of organisms for the conservation analysis. Phylogenetic trees were constructed for these three transporter proteins based on Bayesian posterior probability inference. The evolutionary analysis indicated that these proteins remain highly conserved across species diversity, yet substantial differences between the host (Homo sapiens) and parasite (L. major) proteins were observed, which have helped in targeting these Miltefosine transporter proteins in a parasite-specific manner.


Introduction
Leishmaniasis, a neglected tropical disease, is the second largest parasitic killer in the world after malaria. It is prevalent in more than 90 countries worldwide, with an estimated 2 million new cases every year (Herwaldt 1999; India National vector Borne Diseases Control Programme 2017). The drugs available for treatment include antimonials, Miltefosine, sitamaquine, Paromomycin and amphotericin B. Of these, Miltefosine is the only one administered orally and is hence widely used. The irregular and extensive use of Miltefosine in the past few years has resulted in the development of resistance against this drug (Singh et al. 2012; Ghorbani and Farhoudi 2018). Miltefosine is transported into the parasite by the P4ATPase-CDC50 protein complex, while its efflux is carried out by P-glycoprotein (P-gp), an ATP-binding cassette (ABC) transporter protein that transports phospholipids. Structurally, P4ATPase consists of 4 domains: a transmembrane domain (TMD) with 10 transmembrane helices (TMH), an actuator domain (A-domain), a phosphorylation domain (P-domain) and a nucleotide binding domain (NBD). CDC50 has 2 TMH with a large exocytoplasmic loop. P-gp comprises 2 TMDs, each with 6 TMH, and 2 NBDs (Dawson and Locher 2006; López-Marqués et al. 2015). Alterations in the functional activity of these proteins have resulted in the development of resistance to Miltefosine in the Leishmania parasite (Croft et al. 2006; Singh and Mandlik 2015).
Transporter proteins play a pivotal role in the survival of intracellular pathogens inside their hosts. They mediate the entry of essential nutrients into the cell and the uptake and efflux of ions, maintain homeostasis by regulating the concentration of metabolites, and, in particular, support pathogen survival by extruding drugs, toxins and xenobiotics (Lam et al. 2011). Because transporter proteins play such an important role in the survival of organisms, many studies have focused on them, especially on drug transporters. Since most transporter proteins are membrane proteins, a large fraction of these reports concern the evolution of the transmembrane helices among various superfamilies. Lam et al. proposed a pathway for the evolution of transmembrane helices (TMH) in drug/metabolite transporter proteins, starting from 2 transmembrane segments (TMS), passing through 4-5 TMS and reaching up to 10 TMS (Lam et al. 2011). In another study, Saier proposed a pathway for the evolution of transporter proteins in all organisms, which starts from a simple peptide channel comprising 1-3 helices and leads to much more complex secondary carriers, primary active transporters and group translocators comprising 12-14 helices (Saier 2016).
ATP binding cassette (ABC) transporter proteins constitute one of the largest superfamilies of transporter proteins and are present in all forms of life, where gene duplication has played an important role in the evolution of new genes (Jones and George 2004). Gene duplication events have resulted in the evolution of many ABC genes in vertebrates, and most of them are associated with various diseases as well as with the development of resistance (Moitra and Dean 2011). Saurin et al. retrieved 197 sequences of ABC transporters homologous to the maltose/maltodextrin import ATP-binding protein (MalK protein), a bacterial transporter. The phylogenetic analysis of these transporter proteins showed a clear divergence between ABC importers and ABC exporters, and the authors subsequently proposed a model for studying the generation and diversification of ABC systems (Saurin et al. 1999). Multigene families such as the ABC transporters, which carry an ATP-binding cassette, are subject to point mutations or duplication events that either render a few members nonfunctional or give rise to new members, suggesting that almost all members have emerged from a common ancestor. Based on this idea, 68 members of the ABC family were identified in Dictyostelium discoideum, starting from 6 previously known genes and other representatives of each family of the ABC superfamily. This number is considerably higher than that found in either yeast or human, which are amongst the best characterized genomes (Anjard and Loomis 2002). Evolutionary analyses have helped establish that the cystic fibrosis transmembrane conductance regulator (CFTR) protein is a member of the ABC transporter superfamily in spite of being an ATP-dependent chloride channel. Despite this functional divergence, CFTR belongs to the ABCC subfamily on the basis of structural similarity and domain rearrangement (Jordan et al. 2008). This has facilitated the understanding of functional divergence, the identification of new putative functions of ABC genes and how they are related in both eukaryotes and prokaryotes (Igarashi et al. 2004).
Another large superfamily of membrane transport proteins that utilize the energy of ATP hydrolysis is the P-type ATPases. Axelsen and Palmgren analyzed 159 P-type ATPase sequences and classified them into 5 major classes based on their substrate similarity instead of sequence similarity (Axelsen and Palmgren 1998). Møller et al. studied the evolutionary divergence of the P5 subfamily of P-type ATPases in eukaryotes, which act as secretory pumps in the endoplasmic reticulum (Møller et al. 2008). Similarly, an evolutionary analysis based on substrate specificities was carried out for maltose transporters in Thermotoga maritima (Nanavati et al. 2005). A few research groups have pointed out the evolutionary relationships between different subfamilies of a superfamily on the basis of both their structure and their function, i.e., their substrate specificity.
Here, we provide insight into the sequence conservation of particular members of transporter protein superfamilies and their presence in various organisms. To the best of our knowledge, this is the first report of its kind in which the evolutionary divergence of Miltefosine transporter proteins in Leishmania species is discussed. Since Miltefosine is the only orally administered drug available for the treatment of Leishmaniasis, and the protozoan parasite is developing resistance against this drug at an alarming rate, we aim to study the evolutionary relationship of these transporter proteins to find possible solutions for combating drug resistance. In one of our previously published datasets, we carried out a conservation analysis and compared these Miltefosine transporter proteins between host (Homo sapiens) and parasite (L. major) to identify parasite-specific motifs, against which peptides were designed to allosterically modulate the activity of the transporter proteins and reverse the resistance effect (Kabra et al. 2020). Using the same data set, we performed a reanalysis aimed at the evolutionary divergence of these 3 transporter proteins across different kingdoms of life.

Sequence retrieval
Sequences for P-gp, P4ATPase and CDC50 were extracted manually from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) in FASTA format. All the annotated, complete and fully characterized sequences available in the database (as of December 2015), spanning all five kingdoms of life (Monera, Protista, Fungi, Animalia and Plantae), were retrieved. The sequences of the P-gp, P4ATPase and CDC50 proteins of L. major (parasite) and human (host) served as queries to identify homologous sequences in other organisms via BLASTp, and the hits were then retrieved from NCBI by their accession IDs. Where multiple copies or paralogs were present, we selected those variants whose substrates were phospholipids, since Miltefosine is a phosphatidylcholine analog; this substrate information was available only in annotated sequences. The sequences ranged from 181 to 1503 amino acids for P-gp, from 776 to 2061 residues for P4ATPase and from 245 to 1537 residues for CDC50. Multiple Sequence Alignment (MSA) for each protein was performed using Clustal Omega, and the output was saved in Nexus format (Sievers et al. 2011). Clustal Omega is a tool for aligning three or more nucleotide or protein sequences, available online at https://www.ebi.ac.uk/Tools/msa/clustalo/. This tool is provided and maintained by EMBL-EBI. It uses seeded guide trees and a Hidden Markov Model (HMM) profile-profile technique to generate the MSA.
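The length ranges quoted above can be checked directly from the retrieved FASTA files. The following is a minimal sketch in plain Python, assuming one FASTA file per protein; the function names and the toy records are hypothetical and are not part of the original workflow.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_range(handle):
    """Return (number of sequences, shortest length, longest length)."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    return len(lengths), min(lengths), max(lengths)


# Toy records standing in for a retrieved FASTA file (not real sequences).
toy = StringIO(">seq_a\nMKTLLV\nAAG\n>seq_b\nMKT\n")
print(length_range(toy))
```

For a real file, the same function can be called on an open file handle, e.g. `with open(path) as fh: length_range(fh)`, where `path` is whatever file name was used when saving the sequences.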

Phylogenetic tree construction
Phylogenetic Tree was constructed through MrBayes v3. and CDC50 sequences to get a converged tree with the average standard deviation of split frequency below 0.01. The constructed trees were visualized using FigTree v1.4.2 available at http://tree.bio.ed.ac.uk/software/figtree/.
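As an illustration of the convergence criterion mentioned above, the average standard deviation of split frequencies compares how often each bipartition (split) is sampled in independent runs. The sketch below is a simplified, hypothetical reimplementation for intuition only: the split labels, the toy frequencies and the 0.10 minimum-frequency filter are assumptions, and the exact computation performed by MrBayes may differ.

```python
from statistics import stdev


def average_sd_of_split_frequencies(runs, min_freq=0.10):
    """Average, over splits, of the standard deviation of split frequencies.

    runs: list of at least two dicts, each mapping a split label to the
          frequency of that split among the trees sampled in one run.
    min_freq: assumed filter; a split is only counted if it reaches this
              frequency in at least one run.
    """
    splits = set().union(*runs)
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


# Toy split frequencies from two hypothetical runs.
run_1 = {"AB|CD": 0.98, "AC|BD": 0.02}
run_2 = {"AB|CD": 0.97, "AC|BD": 0.03}
asdsf = average_sd_of_split_frequencies([run_1, run_2])
print(asdsf < 0.01)
```

Values approaching zero indicate that the independent runs are sampling very similar tree distributions, which is why a threshold such as 0.01 is used as a stopping rule.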

Identification of specificity determining positions
JDet software was used to analyze the aligned sequences and identify specificity determining positions (SDPs) with the Xdet method (Muth et al. 2012). In the graphical user interface, we used the predefined Xdet method, where the command '.\conf\Maxhom_McLachlan.metric' was run. For the Xdet analysis, the threshold cutoffs were 0.6 for Xdet and 2.5 for entropy. Using the JDet interface, the sequences were represented in logo format.
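To give an intuition for an entropy-based measure of column variability, the sketch below computes the Shannon entropy (in bits) of each alignment column. This is a generic illustration under stated assumptions: the function names and the toy alignment are hypothetical, gaps are simply ignored, and the entropy score reported by JDet may be defined or scaled differently.

```python
from collections import Counter
from math import log2


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def alignment_entropies(aligned):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*aligned)]


# Toy alignment (not real data): column 1 is invariant, column 3 is variable.
toy_alignment = ["AKS", "AKT", "ARS", "ARV"]
print(alignment_entropies(toy_alignment))
```

A fully conserved column has zero entropy, and the value grows as more residue types share the column more evenly, up to log2(20) bits for the 20 standard amino acids.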

Results
Fully annotated sequences of the P-glycoprotein (P-gp), P4ATPase and CDC50 proteins were retrieved from NCBI (as available till 2015), from archaea to mammalia as well as plantae. Sequences homologous to those of Leishmania major were considered. The sequences retrieved for the various organisms were classified and are represented in fig 1. A total of 96, 207 and 189 sequences were retrieved for the P-gp, P4ATPase and CDC50 proteins, respectively. As can be seen, P4ATPase and CDC50 have been very well characterized in Fungi, followed by Chordata, Protozoa and Nemathelminthes. In order to study the evolutionary relationship and sequence conservation amongst different organisms, a phylogenetic tree was constructed for each of these three proteins, viz. P-gp, P4ATPase and CDC50.

P-gp
Fully annotated sequences of P-glycoprotein (P-gp) were retrieved from NCBI (as available till Although P-gp is sequentially highly conserved, but it was seen to be more conserved amongst For identifying the specificity determining positions (SDPs), JDet was used with C. albicans as the master sequence. The Xdet method was applied with an Xdet cutoff of 0.6 and an entropy cutoff of 2.5. Of the various position-specific residues obtained, we were particularly interested in those that interact directly with the P-gp substrate, i.e., Miltefosine. The Miltefosine interacting residues of L. major P-gp, which we determined previously (Kabra et al. 2020), are S359, A360, M364, L397, R941, T1008, I1011, Y1012, S1041, I1044 and Q1049. Of these 11 residues, only one, L397, was obtained as an SDP after Xdet analysis of all organisms. However, when only protozoan species were considered, 8 residues (S359, A360, M364, L397, R941, S1041, I1044 and Q1049) were identified as SDPs. The Miltefosine interacting residues of P-gp (all organisms and protozoans only) before and after Xdet analysis are represented in fig 3.
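The comparison between drug-contacting residues and SDPs described above amounts to a set overlap, which can be sketched as follows. The residue labels are taken from the text; the variable and function names are hypothetical, and the two SDP sets list only those Miltefosine interacting residues reported as SDPs, not the complete Xdet output.

```python
# Miltefosine interacting residues of L. major P-gp (Kabra et al. 2020).
MILTEFOSINE_CONTACTS = [
    "S359", "A360", "M364", "L397", "R941", "T1008",
    "I1011", "Y1012", "S1041", "I1044", "Q1049",
]

# Contact residues reported as SDPs after Xdet analysis (from the text).
SDP_ALL_ORGANISMS = {"L397"}
SDP_PROTOZOA_ONLY = {
    "S359", "A360", "M364", "L397", "R941", "S1041", "I1044", "Q1049",
}


def split_by_sdp(contacts, sdps):
    """Partition contact residues into those that are SDPs and those that are not."""
    shared = [r for r in contacts if r in sdps]
    missing = [r for r in contacts if r not in sdps]
    return shared, missing


for label, sdps in [("all organisms", SDP_ALL_ORGANISMS),
                    ("protozoans only", SDP_PROTOZOA_ONLY)]:
    shared, missing = split_by_sdp(MILTEFOSINE_CONTACTS, sdps)
    print(f"{label}: {len(shared)} of {len(MILTEFOSINE_CONTACTS)} contacts are SDPs; "
          f"not SDPs: {missing}")
```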

P4ATPase
P4ATPase is a flippase belonging to the type 4 sub-family of P-type ATPases. It mainly transports phospholipids from the outer leaflet of the membrane towards the inner leaflet (cytosol). Trypanosomatids as studied by Maslov and group (Maslov et al. 1996). This also validates the consideration of genes/proteins over morphological features when discussing the evolution of these protozoan parasites. In an elaborate review, Lukeš and group pointed towards a greater possibility of gene duplication events in trypanosomes, which have approximately 11000 protein coding genes, compared with Leptomonas and Leishmania species, which have approximately 8200 protein coding genes (Lukeš et al. 2018). Of the three protists for which fully annotated P4ATPase sequences have been characterized and retrieved, Naegleria gruberi was found to share more similarity with protozoans than the other two protists, Acanthamoeba castellanii and and Macaca mulatta) were observed to be diverged from a common ancestor with Trichuris trichiura (a nematode) being an exception (fig S5). The rest of the chordates follow an independent lineage, with P4ATPase in humans sharing similarity with that of rodents. As noted above, fungal P4ATPases have been well characterized: of the sequences retrieved from 166 fungal species, 117 belong to the Ascomycota division, 39 to Basidiomycota, 3 each to Zygomycota and Chytridiomycota, and 1 each to Entomophthoromycota, Mucoromycota, Glomeromycota and Deuteromycota (fig S6). The evolutionary divergence suggests that Ascomycots and Basidiomycots have diverged from a common ancestor and are closely related to Glomeromycots. Amongst these fungal species, Rozella allomycis, a Chytridiomycot, appears to have evolved first in comparison with other members of the same division, such as Spizellomyces punctatus and Gonapodya prolifera.
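The fungal composition reported above can be tallied with a few lines of Python as a consistency check. The counts are those given in the text; the variable names are hypothetical.

```python
# Number of fungal species per division with a retrieved P4ATPase sequence.
FUNGAL_DIVISIONS = {
    "Ascomycota": 117,
    "Basidiomycota": 39,
    "Zygomycota": 3,
    "Chytridiomycota": 3,
    "Entomophthoromycota": 1,
    "Mucoromycota": 1,
    "Glomeromycota": 1,
    "Deuteromycota": 1,
}

total_fungi = sum(FUNGAL_DIVISIONS.values())

# Percentage share of each division, rounded to one decimal place.
shares = {
    division: round(100 * count / total_fungi, 1)
    for division, count in FUNGAL_DIVISIONS.items()
}

print(total_fungi)
for division, share in shares.items():
    print(f"{division}: {share}%")
```

The per-division counts add up to the 166 fungal species stated in the text, with Ascomycota and Basidiomycota together accounting for the large majority.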
For the SDPs in the P4ATPase protein, Trichomonas vaginalis served as the reference sequence. Residues K121, S365, V372 and N870 of L. major P4ATPase interact with Miltefosine (Kabra et al. 2020). All four residues were identified as SDPs after Xdet analysis, whether all 207 organisms or just the 15 protozoans were considered. These residues are shown in sequence logo format in fig 5. Residues 'K', 'S' and 'N' were highly conserved in all organisms, while 'V' shared its position with 'M' and 'L' (mainly in protozoans). This conservation of Miltefosine interacting residues was expected, since P4ATPase proteins are specifically lipid transporters and Miltefosine is an analog of phosphatidylcholine.

CDC50
CDC50 acts as the β-subunit required for the functioning as well as the membrane localization of P4ATPase (López-Marqués et al. 2015). The heterodimeric P4ATPase-CDC50 complex is responsible for the influx of its substrates towards the cytosol. As of 2015, fully annotated sequences of the CDC50 protein from 189 organisms were retrieved from NCBI. Like P4ATPase, CDC50 proteins are very well characterized in Fungi, with fully annotated sequences retrieved from 117 organisms, comprising 62% of the total organisms considered for phylogenetic analysis (fig 6). Unlike P-gp and P4ATPase, the CDC50 protein was observed to be more conserved in terms of sequence length, ranging from 245 residues (Drosophila sechellia, an arthropod) to 612 residues (Trichinella sp. T9, a nematode), with Amazona aestiva, a chordate, being an exception with a protein length of 1537 residues. As observed for P4ATPase, a protozoan, i.e., Trypanosoma grayi, was also found to act as the outgroup in the case of CDC50 proteins.  S11). Amongst them, Gonapodya prolifera, a Chytridiomycot, was perceived to be the outgroup. Of the two zygomycots, Mucor ambiguus shares more resemblance with the entomophthoromycot Conidiobolus coronatus than with its fellow member, i.e., Lichtheimia corymbifera. The sister clade of these three comprises basidiomycots.
In the case of the L. major CDC50 protein, F151, Q159, M235 and W236 interact directly with Miltefosine (Kabra et al. 2020). When all organisms were considered, none of these residues met the criteria of the Xdet analysis, but when only protozoans were considered, all of these Miltefosine interacting residues were identified as SDPs (fig 7). Unlike in P4ATPase, the Miltefosine interacting residues in CDC50 proteins are highly diverged when all organisms are considered, and hence none of them passed the Xdet analysis. In contrast, these residues were well conserved among protozoans.

Discussion
Transporter proteins hold a position of utmost importance in any cell or organism, as they not only help in the uptake of nutrients, small molecules and ions but are also responsible for the extrusion of toxic substances, xenobiotics, drugs, etc. Since different transporters are specific for their substrates, these proteins have remained highly conserved in the course of evolution. In this project, we have tried to study the evolutionary relationship of the Miltefosine transporter proteins of Leishmania major with the homologous proteins present in other organisms. The P4ATPase and CDC50 proteins are responsible for the uptake of Miltefosine, while P-gp is responsible for the extrusion of the drug from the parasite. On analyzing the completely annotated sequences of these proteins from four kingdoms of life, i.e., Protista, Fungi, Animalia and Plantae, we found that these proteins show a high degree of conservation across different classes of organisms. Most of the organisms belonging to a phylum or division were found to cluster together, with only exceptional cases of horizontal gene transfer. These analyses showed that the type IV P-type ATPase has been studied and characterized more extensively than the ABC transporter P-glycoprotein (P-gp), especially amongst fungal species. The identification of SDPs through Xdet analysis confirmed that the drug binding residues were more conserved in P4ATPase than in its accessory protein CDC50 or in P-gp. However, when a particular class of organisms was considered, such as protozoans, the three Miltefosine transporter proteins, i.e., P-gp, P4ATPase and CDC50, were found to exhibit more conservation. A similar observation was made through the phylogenetic analysis. In spite of both being kinetoplastids, Trypanosomes and Leishmanias were observed to have diverged from each other in sequence. They nevertheless appear to share the same substrate specificity, as various Trypanosome species such as T. cruzi, T. copemani, etc. were found to be susceptible to Miltefosine (Saraiva et al. 2002; Luna et al. 2009; Botero et al. 2017). In a nutshell, these phylogenetic analyses, together with the identification of SDPs, not only helped in understanding the evolutionary pattern of a particular protein in an organism but can also help in targeting a protein in an organism-specific manner to either increase or decrease its functional activity. Using this concept, the sequence and positional conservation of the P-gp, P4ATPase and CDC50 proteins between human (host) and Leishmania major (parasite) has helped in designing parasite-specific peptides that allosterically modulate the activity of the Miltefosine transporter proteins, reverting Miltefosine resistant parasites to the drug sensitive form (Kabra et al. 2020). A similar strategy can be applied to other proteins in the same or different organisms. Thus, the study of the evolutionary relationship of a probable target protein across diverse organisms may provide long-lasting solutions for curing a disease or fighting resistance.