Upstream Open Reading Frames (uORFs) in the Apicomplexan Parasites Plasmodium falciparum and Toxoplasma gondii: small yet powerful regulators of translation

During their complex life cycles, the Apicomplexan parasites Plasmodium falciparum and Toxoplasma gondii employ several genetic switches to regulate their gene expression. One such switch is mediated at the level of translation through upstream Open Reading Frames (uORFs). As uORFs are found in the upstream regions of a majority of genes in both parasites, it is essential that their roles in translational regulation be appreciated to a greater extent. This review provides a comprehensive summary of studies that show uORF-mediated gene regulation in these parasites and highlights examples of clinically and physiologically relevant proteins that exhibit uORF-mediated regulation. In addition to these examples, several studies that use bioinformatics, transcriptomics, proteomics, and ribosome profiling also indicate the possibility of widespread translational regulation by uORFs. Further analysis of genome-wide datasets will reveal novel genes involved in key biological pathways such as cell-cycle progression, stress response, and pathogenicity. The cumulative evidence from studies presented in this review suggests that uORFs will play crucial roles in regulating gene expression during clinical disease caused by these important human pathogens.


INTRODUCTION
Eukaryotic translation initiation is a tightly regulated, multi-step process that involves scanning of messenger RNA (mRNA) by the preinitiation complex (Kozak, 1980). This complex, comprising the small ribosomal subunit and numerous initiation factors, scans the mRNA for an authentic start codon (AUG) present in the coding sequence (CDS) (Kozak, 1991). The selection of the start codon is governed by the sequence surrounding the AUG codon (i.e. the Kozak sequence), the availability of initiation factors, molecules that provide energy, and methionyl-tRNAs (reviewed in Hinnebusch, 2011).
Other than these factors, the presence of start codon(s) that lie upstream of the start codon of the main CDS confers another layer of regulation. This follows from the scanning model of translation initiation, in which the ribosomes recognize the 5ʹ cap and move along the mRNA towards the 3ʹ end. During this process, the ribosomes encounter upstream start codons (uAUGs) before the main CDS and therefore, these uAUGs are capable of engaging the ribosome (Kozak, 2002). Similar to uAUGs, upstream open reading frames (uORFs), defined as an upstream start codon followed by an in-frame stop codon, also engage the scanning ribosome with varying capacities, which in turn alters the level of the protein encoded by the main CDS (reviewed in Morris and Geballe, 2000). The presence of these alternative initiation sites constitutes a "hurdle" for the ribosome and usually results in repression of translation of the main CDS. This repression can be relieved by the cellular translation machinery with a multitude of strategies, as and when required (Wang and Rothnagel, 2004; Iacono et al., 2005).
Hence, uAUGs and uORFs act as regulatory elements in the 5' leader sequences of eukaryotic mRNAs. Interestingly, because translational regulation allows the organism to respond more rapidly than transcriptional regulation, uORFs (rather than uAUGs) are used by cells to handle a wide range of environmental changes that affect the survival of the cell.
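The definition used above (an upstream start codon followed by an in-frame stop codon) lends itself to a simple scan of a 5' leader. The following is a minimal sketch and not a method taken from any of the cited studies: the function name `find_uorfs` and the example sequence are hypothetical, and the sketch assumes that only the 5' leader is supplied, so a uAUG whose reading frame runs past the end of the leader is not reported as a uORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive, for every
    ATG in the leader that is followed by an in-frame stop codon.

    Assumption: `leader` holds only the 5' leader; RNA input (U) is
    accepted and converted to the DNA alphabet.
    """
    seq = leader.upper().replace("U", "T")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Hypothetical leader: the first ATG has an in-frame TAG, the second
# ATG has no in-frame stop within the leader (a uAUG, not a uORF here).
example_leader = "CCATGAAATAGGGATGCC"
for start, end in find_uorfs(example_leader):
    print(start, end, example_leader[start:end])
```

Every ATG is treated as a potential start, so overlapping and nested uORFs are all reported; a real analysis would also need to handle uORFs that extend into or overlap the main CDS.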
The earliest known evidence for uORF involvement in translational control was shown for [...] Zebrucka et al., 2016; Wek, 2018; Costa-Mattioli and Walter, 2020) and leads to global inhibition of protein synthesis and up-regulation of genes involved in mediating the adaptive response. These studies indicate that the phosphorylation status of eIF2α is a global indicator for translational regulation of large numbers of genes, some of which could be controlled by uORFs.
A more definitive role for uORFs in translational regulation is provided by the presence of ribosomal footprints on the 5' leader of the transcripts undergoing post-transcriptional gene regulation (PTGR) (Schneider-Poetsch et al., 2010; Garreau de Loubresse et al., 2014). These footprints provide a snapshot of the dynamics of translation on each transcript by revealing the positions of the ribosomes engaged in elongating an ORF (Brar et al., 2012; Ingolia et al., 2014). Such studies in yeast and humans revealed that uORFs are the major contributors of ribosome occupancy in the 5' leaders of transcripts (Calvo et al., 2009; Brar et al., 2012; Ingolia et al., 2014; Johnstone et al., 2016), suggesting that the presence of ribosome footprints in the 5' leader of the transcript is a distinctive feature that indicates PTGR via uORFs. Ribosome footprints along the entire length of certain genes show that when the upstream regions are loaded with ribosomes, the CDS has lower ribosome occupancy (Ingolia et al., 2014). These data reinforce the notion that the presence of uORFs stalls the ribosome before it can reach the main CDS, resulting in repression of CDS translation. Conclusive proof that a uORF regulates the translation of a particular gene is obtained when mutation of the start codon of the uORF results in a loss of repression/regulation of the gene (Harigai et al., 1996; Reynolds et al., 1996; Ruan et al., 1996; Schlüter et al., 2000; Sarrazin et al., 2000; Diba et al., 2001; Kwon et al., 2001; Jousse et al., 2001; Warnakulasuriyarachchi et al., 2003; Zhang and Dietrich, 2005; Lee et al., 2007; Song et al., 2007; Calvo et al., 2009; Devlin et al., 2010; Spevak et al., 2010; Qiao et al., 2011; Armata et al., 2013; Tennen et al., 2013; Bancells and Deitsch, 2013; Wu et al., 2014; Capell et al., 2014; Kumar et al., 2015; Guerrero-González et al., 2016).
The direct and indirect [...] falciparum and T. gondii respectively (Jacobs, 1963; Sabin and Olitsky, 1937).
These parasites exhibit many developmental stages in different hosts and so must regulate expression of their genes in a highly coordinated fashion for survival and transmission in order to complete their life cycles. Gene expression is regulated at multiple levels, including transcription and translation (White et al., 2014; Vembar et al., 2014, 2015, 2016; Holmes et al., 2017; Bennink and Pradel, 2019; Hollin and Le Roch, 2020; Sharma et al., 2020).
There is evidence for uORFs playing substantive roles in translational control in apicomplexan parasites; this evidence includes high frequencies and a widespread distribution of uORFs among large numbers of genes (Bunnik et al., 2013; Caro et al., 2014; Kumar et al., 2015; Srinivas et al., 2016; Hassan et al., 2017; Holmes et al., 2019; Markus et al., 2021).
Additionally, ribosome profiling studies in P. falciparum and T. gondii parasites reveal footprints in the 5' leader sequences of transcripts (Lacsina et al., 2011; Bunnik et al., 2013; Caro et al., 2014; Hassan et al., 2017; Holmes et al., 2019). Recent discoveries of clinically important genes that are regulated translationally by uORFs, such as var2csa in P. falciparum (Chan et al., 2017) and BFD1 in T. gondii (Waldman et al., 2020), further reinforce the impact of these small, yet important features in translational regulation of gene expression. In the next sections, the major findings of these and other reports are summarized, and the need to further understand the phenomenon of uORF-mediated PTGR in apicomplexan parasites is highlighted in detail.
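The footprint-based reasoning in this section (a 5' leader loaded with ribosomes alongside a CDS with lower occupancy) can be expressed as a simple density ratio. The sketch below is illustrative only and does not reproduce the analysis of any cited study: the function names, the per-nucleotide count layout, and the toy numbers are all assumptions.

```python
def footprint_density(counts: list[int], start: int, end: int) -> float:
    """Mean number of footprints per nucleotide in counts[start:end]."""
    length = end - start
    if length <= 0:
        raise ValueError("region must contain at least one nucleotide")
    return sum(counts[start:end]) / length


def leader_to_cds_ratio(counts: list[int], cds_start: int, cds_end: int):
    """Ratio of footprint density in the 5' leader to that in the CDS.

    Assumptions: `counts` holds per-nucleotide footprint counts for one
    transcript, the leader spans positions 0..cds_start, and the CDS
    spans cds_start..cds_end (end-exclusive).
    Returns None when the CDS carries no footprints (ratio undefined).
    """
    leader = footprint_density(counts, 0, cds_start)
    cds = footprint_density(counts, cds_start, cds_end)
    if cds == 0:
        return None
    return leader / cds


# Toy transcript: 4-nt leader with heavy occupancy, 8-nt CDS with light occupancy.
toy_counts = [4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1]
print(leader_to_cds_ratio(toy_counts, cds_start=4, cds_end=12))
```

A high ratio is only suggestive of uORF-mediated repression; as noted above, conclusive proof requires mutating the uORF start codon and observing loss of regulation.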

UPSTREAM ORFS IN PLASMODIUM FALCIPARUM
A long uORF regulates translation of the var2csa gene
The first example of uORF-mediated translational regulation was shown for a gene implicated in pregnancy-associated malaria (PAM), also termed malaria in pregnancy (MiP): var2csa (Lavstsen et al., 2003; Amulic et al., 2009; Bancells and Deitsch, 2013). This gene is a variant of the var gene family in P. falciparum that consists of ~60 var genes encoding Erythrocyte Membrane Protein 1 (PfEMP1). These proteins help the parasite evade clearance by the spleen of the host by binding to the endothelial lining of blood vessels (Kraemer and Smith, 2006).
The var gene family has also been implicated in cerebral malaria, one of the major symptoms of severe malaria caused by P. falciparum, which results from sequestration of infected RBCs in capillaries of the brain (reviewed in van der Heyde et al., 2006). This sequestration is due to binding of PfEMP1 proteins to receptors such as CD36, thrombospondin, and intercellular adhesion molecule 1 (ICAM1) found on the surface of different cell types (Baruch et al., 1996; Smith et al., 2000, 2013; Rowe et al., 2009).
The transcription profile of members of this gene family is unusual, with only one of the var genes expressed at a given time (Scherf et al., 1998) [...] such as cross-reactivity to other proteins that cannot be ruled out, they also mention that deregulation of the uORF-mediated repression of the var2csa gene might play a role in these clinical findings.

High prevalence of uORFs in the P. falciparum genome leads to repression of translation
Reports establishing translational regulation of the var2csa gene led to an interest in understanding whether this phenomenon was observed in other genes as well. Interestingly, subsequent studies showed that regulation by uORFs could be more prevalent in P. falciparum [...]

Upstream ORFs in stress conditions
The role of uORFs in the stress response in yeast and mammals is well studied (Hinnebusch, 2005; Silva et al., 2019; Houston et al., 2020). However, this area of research requires more focus in P. falciparum, more so because of the widespread occurrence of uORFs. During its complicated life cycle, P. falciparum faces a variety of external conditions that are hostile to the parasite. Like other parasites, P. falciparum has evolved complex strategies to adapt to the changing environment (Camus et al., 1995). While the shift of host from mosquito to human is one of the major challenges faced by the parasite due to drastic [...] During the intra-erythrocytic developmental cycle (IDC), P. falciparum experiences a periodic rise in temperature every 48 hours due to the host inflammatory response (Brown, 1912). The temperature during these febrile episodes can rise to 40-41°C (Kwiatkowski, 1989). The adaptive response to the cyclical heat stress experienced by intra-erythrocytic parasites has been studied at the level of the transcriptome (Oakley et al., 2007; Rawat et al., 2019). However, as translational responses afford a rapid adaptation mechanism, it would be informative to study whether uORFs play a role in heat stress by checking the phosphorylation status of PfeIF2α and differential ribosome occupancy during this stress condition.
Another stress faced by P. falciparum during its intra-erythrocytic cycle is the lack of essential amino acids, especially isoleucine. This stress arises because, inside the red blood cell, the parasite salvages amino acids by degrading haemoglobin (Francis et al., 1997). However, of the twenty amino acids, isoleucine is completely absent from the α and β chains of haemoglobin (Sherman, 1977). Therefore, the parasite depends on an exogenous supply of isoleucine through the plasma of the host (Liu et al., 2006). Since isoleucine is an essential amino acid, the human host also depends on external sources of isoleucine to survive (Soeters [...]
Last updated: 26 April 2021
[...] translation of the genes required for the adaptive response to this nutritional stress faced by P. falciparum can be illustrated by identifying genes having differential ribosome occupancy in parasites that are deprived of isoleucine. Further, ribosome profiling of PfeIK1 knock-out parasites would also reveal classes of genes that are under regulation by uORFs.
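One common way to express differential ribosome occupancy between two conditions (for example, fed versus isoleucine-deprived parasites) is the change in translational efficiency, i.e. ribosome footprint abundance normalized to mRNA abundance. The sketch below is a hedged illustration under that assumption, not the procedure of any study cited here: the function names, the pseudocount, and the example values are hypothetical, and the inputs are assumed to be already normalized for library size and gene length.

```python
import math


def translational_efficiency(rpf: float, mrna: float, pseudocount: float = 1.0) -> float:
    """Footprint abundance divided by mRNA abundance.

    A pseudocount (an assumption of this sketch) avoids division by
    zero for transcripts with no mRNA reads.
    """
    return (rpf + pseudocount) / (mrna + pseudocount)


def log2_te_change(rpf_ctrl: float, mrna_ctrl: float,
                   rpf_stress: float, mrna_stress: float,
                   pseudocount: float = 1.0) -> float:
    """log2 ratio of translational efficiency, stress over control.

    Positive values mean the gene is translated more efficiently under
    stress even if its transcript level is unchanged.
    """
    te_ctrl = translational_efficiency(rpf_ctrl, mrna_ctrl, pseudocount)
    te_stress = translational_efficiency(rpf_stress, mrna_stress, pseudocount)
    return math.log2(te_stress / te_ctrl)


# Hypothetical gene: transcript level constant, footprints rise under stress.
print(log2_te_change(rpf_ctrl=99, mrna_ctrl=99, rpf_stress=399, mrna_stress=99))
```

Genes with a large change in this value but little change in transcript abundance would be candidates for translational control; whether uORFs are responsible would still need to be tested directly.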
There is preliminary evidence to support the notion that translational regulation mediated by uORFs occurs during isoleucine starvation stress. The Maf1 protein is a part of the Target [...]
In an attempt to study translational control of genes that provide an adaptive advantage to the stress posed by the extracellular environment, comparative ribosome profiling of extracellular and intracellular tachyzoites was performed. This study identified more than a thousand genes that vary at the level of ribosome occupancy in intracellular and extracellular parasites, implying there is a widespread usage of translational regulation to cope with the stress imposed by the extracellular environment on T. gondii. Additionally, this study [...] ScAAP stalls the ribosome and prevents it from reaching the downstream CDS in arginine-rich conditions. Conversely, in the conditions of arginine scarcity, ribosomes are able to reach and translate the downstream CDS (Wu et al., 2012; Wei et al., 2012). A similar switch is used by T. gondii for modulating the TgApiAT1-dependent uptake of arginine in varying arginine conditions (Rajendran et al., 2019). Given the extensive occurrence of uORFs in T. gondii, we believe that this might be among the first of many studies that unravel the existence of uORF-mediated translational regulation.

Upstream ORFs play a crucial role in development of latent cysts in T. gondii
Inside the intermediate mammalian host, T. gondii parasites divide asexually to form tachyzoites, which develop into tissue cyst bradyzoites under certain conditions (reviewed in Cerutti et al., 2020). Bradyzoites are the latent stages of T. gondii that persist and cause reinfection when the immune system of the host lapses (Dubey, 1998; Montoya and Liesenfeld, 2004). While the host immune response can lead to stress that initiates bradyzoite formation in vivo (Bohne et al., 1993; Lüder et al., 1999), conversion of tachyzoites to bradyzoites in vitro can be induced under various stress conditions, such as pH change, heat shock, nutritional stress, stress to the endoplasmic reticulum (ER), mitochondrial inhibition, presence of nitric oxide, signalling through secondary messengers such as cAMP, and other in vivo factors (Bohne et al., 1993; Soete et al., 1993; Weiss et al., 1995, 1998; Dubey, 1998; Kirkman et al., 2001; Fox et al., 2004; Narasimhan et al., 2008). Stage conversion that can be triggered by a multitude of external stressors is highly reminiscent of an integrated stress response (ISR) that is controlled by uORFs in other eukaryotes (reviewed in Young and Wek, 2016).
Another indicator of translational regulation, possibly through uORFs, is phosphorylation of eIF2α, which has also been reported for bradyzoite conversion. TgIF2α is phosphorylated during alkaline stress when the developmental shift from tachyzoite to bradyzoite occurs (Sullivan et al., 2004; Narasimhan et al., 2008). Disruption of this phosphorylation by either deleting TgIF2KB (Augusto et al., 2020) or inhibiting TgIF2KA (Augusto et al., 2018), both kinases responsible for phosphorylating TgIF2α, leads to significant loss of stage conversion.
The molecular factor responsible for the stage conversion was unidentified until the recent discovery of a master regulator, the Bradyzoite Formation Deficient 1 (BFD1) protein, a transcription factor that triggers the conversion of tachyzoites to the latent tissue cyst form (Waldman et al., 2020). Stress-dependent expression of BFD1 appears to be regulated at the translational level because, although the transcript is detected both in tachyzoites and in bradyzoites (a marginal 1.5- to 3.6-fold upregulation in bradyzoites), the protein is expressed only in bradyzoites (Waldman et al., 2020) (Figure 2).
As bradyzoites can be formed in culture by a variety of stressors and their stage conversion coincides with the phosphorylation of TgIF2α, it would not be far-fetched to infer that uORFs play a role in the process. Evidence for the involvement of uORFs in translational regulation was provided by the observation that parasites expressing BFD1 without its 5' leader can differentiate into bradyzoites even in the absence of any stress. This strongly suggests the presence of regulatory cis-acting elements in the 5' leader that act as a switch to turn on gene expression under stress conditions. The translational switch of the gene has been hypothesized to be under the control of four uORFs present in its 2.7 kb-long 5' leader sequence (Waldman et al., 2020). [...] the switch to bradyzoite formation.

CONCLUDING REMARKS
Given the sheer number of uORFs and wide prevalence of ribosomal footprints on the 5ʹ leader sequences in the Apicomplexan parasites, P. falciparum and T. gondii, their role in mediating translational regulation is certainly under-recognized. Efforts to understand translational regulation in these parasites are gradually gaining momentum (reviewed in Rao et al., 2017), and in this review we highlight selected examples of genes that are regulated by uORFs, giving rise to clinically relevant patho-physiology in the life cycles of these parasites. Due to the requirement of novel translation factors that promote non-canonical strategies of handling the "hurdles" created by uORFs, such as reinitiation and leaky scanning, further research in this area may lead to the identification of parasite-specific, essential proteins that might serve as drug targets for therapeutics. We conclude by predicting that, with transcriptome, proteome, ribosome profiling and bioinformatics analyses giving genome-wide pointers towards genes and pathways that might be subjected to uORF-mediated PTGR, the role of uORFs in regulating translation will surely be an area of intense research in the future.

FINANCIAL SUPPORT
This work was partially funded by intramural funds from IIT Bombay. CK is supported by a PhD Teaching-Assistant Fellowship from IIT Bombay.

CONFLICTS OF INTEREST
The authors declare there are no conflicts of interest.