The Meaning of the Microprocessor: Accounting for Evolution of Structural-Functional Novelty in the Canonical microRNA Biogenesis Pathway

As an adjunct to the detection and assignment of novel microRNAs in non-model taxa, it is standard to identify and compare genomic or transcript sequences of Drosha and Pasha. Detection of both (1) bona fide microRNAs and (2) Drosha/Pasha orthologs is often assumed to indicate a functional canonical eumetazoan microRNA biogenesis pathway. However, this is rarely confirmed experimentally in non-model taxa, so the assumption is not necessarily valid. Below I describe several lines of evidence for this assertion.

Placozoa phylums. Many such sequence comparisons are found in the literature (Moran et al. 2013, Kerner et al. 2011, Jin et al. 2009, Murphy et al. 2008, Robinson et al. 2013, Robinson 2015). I (and colleague Dr. Ben Busby) have also observed putative Drosha-like BLAST hits in deeply branching eukaryotic taxa (unpublished data) and even in prokaryotes; however, we did not resolve whether these represent contamination, horizontal gene transfer, artifacts of long-branch attraction, or an 'early origin' for Drosha RNAseIII.
Recently published work, based on sequencing and analysis of holozoan Ichthyosporea (syn. Mesomycetozoea), has shown that Drosha homologs and miRNAs are present within non-metazoan holozoans (mostly colonial, often parasitic protists diverging after Fungi but before Metazoa). That work also provides Drosha and Pasha sequence comparisons as criteria for a canonical metazoan (now holozoan) microRNA biogenesis pathway (Brate et al. 2018). Dicer, the widely conserved eukaryotic protein that cleaves dsRNA in the cytoplasm for the RNA interference pathway, also shows a C-terminal arrangement of two RNAseIII domains and one dsRBD that is strikingly similar to Drosha, although the remaining N-terminal portions of the two proteins differ. The homology of the Drosha and Dicer C-terminal domains is widely accepted, and due to this similarity Drosha is taken to be the result of a Dicer duplication.
Grimson et al. 2008 reported microRNAs in the demosponge Amphimedon queenslandica, a subset of which are conserved in a diversity of additional demosponge taxa (Wheeler et al. 2009). These conserved demosponge microRNAs have no orthologs among the many conserved eumetazoan microRNAs, nor in the other poriferan classes (the calcisponges and homoscleromorph sponges) (Robinson et al. 2013). Grimson et al. 2008, Robinson et al. 2013, and Robinson 2015 all report alignments of Drosha and Pasha sequences from the respective taxa. Reports of microRNAs in the additional demosponges Stylissa carteri and Xestospongia testudinaria likewise include orthologous sequences for Drosha and Pasha (Liew et al. 2016).
Cnidarian miRNAs and Drosha and Pasha orthologs were identified in sequence data for Nematostella (Putnam et al. 2007) and Hydra (Chapman et al. 2010). An analysis of Drosha/Pasha evolution in Metazoa, including cnidarian taxa, can be found in Moran et al. 2013. Of interest is that cnidarian orthologs show more overall sequence similarity with bilaterian Drosha and Pasha than with those of other basal metazoan phyla, particularly in conservation of the N-terminal functional domains (vs. the C-terminal RNAseIII and dsRBD structural domains). This suggests that the functional role of the N-terminal domains is specific to the Eumetazoa, as such N-terminal structure has not been shown in Porifera or in Ichthyosporea. microRNA candidates are absent from the placozoan Trichoplax adhaerens, which has a Drosha but no Pasha ortholog (Srivastava et al. 2008). Neither microRNA candidates nor Drosha or Pasha orthologs are found in the ctenophore Mnemiopsis leidyi (Maxwell et al. 2012). These observations support various scenarios of acquisition and loss, or of basal absence, depending on the relative phylogenetic position of Ctenophora and Placozoa (Schierwater et al. 2016), yet all are consistent with the assumption that Drosha and Pasha orthologs represent the presence of a canonical metazoan biogenesis pathway.
The presence of Drosha and Pasha orthologs together with putative microRNAs is therefore a reasonably standard criterion for labeling microRNAs as 'canonical' when they are identified in basal taxa outside the range of well-studied models. Despite this, some authors have maintained that microRNAs evolved convergently in phyla with Drosha and Pasha orthologs (Robinson et al. 2013). In Robinson et al. 2013, which reports miRNAs in calcisponges and homoscleromorph sponges (Porifera), microRNA is presented as having had multiple independent origins with respect to eumetazoan miRNAs. In the discussion of that paper, we argued that while individual microRNAs are not specifically homologous (because of sequence dissimilarity), conservation of microRNA biogenesis processes in general is unclear and should be contingent on experimental determination that Drosha and Pasha orthologs are functionally active in producing the observed small RNAs.
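The inference pattern described above can be made explicit in a few lines of Python. This is only an illustrative sketch: the class name, field names, and status strings are hypothetical, the presence/absence values are transcribed from the summary in the preceding paragraphs, and `processing_confirmed` is an assumed flag standing in for experimental evidence that the text says is usually missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonEvidence:
    """Sequence-level evidence for one taxon (hypothetical record layout)."""
    taxon: str
    mirnas_reported: bool
    drosha_ortholog: bool
    pasha_ortholog: bool
    # Assumed flag: True only if Drosha/Pasha processing of the observed
    # small RNAs has been shown experimentally in this taxon.
    processing_confirmed: bool = False


def pathway_status(e: TaxonEvidence) -> str:
    """Separate what sequence data shows from what is merely assumed."""
    sequence_criteria_met = (
        e.mirnas_reported and e.drosha_ortholog and e.pasha_ortholog
    )
    if not sequence_criteria_met:
        return "criteria not met"
    if e.processing_confirmed:
        return "confirmed"
    return "assumed, not confirmed"


# Presence/absence as summarized in the text above.
SURVEY = [
    TaxonEvidence("Amphimedon queenslandica", True, True, True),
    TaxonEvidence("Nematostella", True, True, True),
    TaxonEvidence("Hydra", True, True, True),
    TaxonEvidence("Trichoplax adhaerens", False, True, False),
    TaxonEvidence("Mnemiopsis leidyi", False, False, False),
]

if __name__ == "__main__":
    for entry in SURVEY:
        print(f"{entry.taxon}: {pathway_status(entry)}")
```

Keeping the experimental flag separate from the three sequence-based fields means a taxon can satisfy the standard criteria while still being reported as 'assumed' rather than 'confirmed'.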
Drosha, Pasha, and the Microprocessor complex perform many functions other than miRNA biogenesis.
Drosha and Pasha are both known to perform non-miRNA functions.
Canonical miRNAs were first discovered in the late 20th/early 21st century (Fire et al., 1998, Lau et al., 2001, Lee and Ambros, 2001); these miRNAs originate from endogenously transcribed hairpins and regulate gene expression post-transcriptionally by targeting partially complementary sequences. The proteins Drosha (vertebrate RNASEN) and Pasha (vertebrate DGCR8) were shown to be 'necessary and sufficient' for the recognition and cleavage of precursor stem-loops from primary transcripts (pri-miRNA) during microRNA biogenesis (Gregory et al., 2004, Denli et al., 2004, Han et al., 2004). Pasha biochemistry showed that two C-terminal double-stranded RNA binding domains (dsRBD) facilitate recognition and binding of pri-miRNA by Pasha, and that the N-terminal domain of Pasha contains canonical WW (Tryptophan-Tryptophan) sequences (Landthaler et al., 2004, Yeom et al., 2006). Drosha was originally known as a ribosome biogenesis factor. Its two tandem, C-terminal RNAse III domains perform cleavage of the precursor stem-loop structure. In eumetazoans, Drosha possesses an N-terminal proline-rich domain, an arginine-serine rich domain, and a central, conserved domain of unknown function (DUF) (Wu et al., 2000, Lee et al., 2003, Han et al., 2006).
The RNAse III domain, its structure, and its mechanism of cleavage are historically well studied (Court et al., 2013). The functional roles of the Drosha N-terminal P-rich and SR-rich domains are not yet fully known. In other protein families, proline-rich and arginine-serine rich domains function in protein-protein interaction and spliceosomal interactions, respectively (Shepard and Hertel, 2009, Long and Caceres, 2009, Kay et al., 2000). Pasha contains a heme-binding domain, which is required for miRNA processing and recognition of pri-miRNAs. Its associated WW domain facilitates dimerization of Pasha in the binding of heme (Faller et al., 2007, Senturia et al., 2010, Weitz et al., 2010, Quick-Cleveland et al., 2014). This structure-function relationship has even been shown experimentally to be conserved in deuterostome invertebrates (Senturia et al., 2012).
Although the structure and function of the central Drosha DUF are unknown, the N-terminal RS and proline-rich domains of vertebrate Drosha have been shown to associate with the promoter to regulate transcription independently of Drosha's miRNA cleavage function (Gromak et al., 2013). Furthermore, Drosha has miRNA-independent functions in mRNA cleavage and rRNA processing (Johanson et al., 2013), and may have roles in the regulation of splicing (Havens et al., 2014). Arginine-serine (RS) domains are found in a large class of spliceosomal regulators (Shepard and Hertel, 2009, Long and Caceres, 2009); therefore, this domain in Drosha may play a role in spliceosome-associated miRNA processing, or may even represent a miRNA-independent spliceosomal role for the microprocessor (Agranat-Tamir et al., 2014, Kataoka et al., 2009). That the N-terminal portion of the DUF overlaps with part of the vertebrate RS domain may indicate that the highly conserved part of this domain has some function in these processes. It is also possible, for example, that the vertebrate-specific proline domain facilitates an interaction with promoter elements not present in invertebrates. Whether the conserved central DUF in Drosha plays a role in miRNA or other regulatory functions must be determined experimentally; however, it does not appear to play a role in establishing conserved miRNA complements, as it is present in Sycon but not Leucosolenia, calcisponge species that have been shown to possess at least one conserved miRNA (Robinson, 2015). Functional transcriptomics studies have elucidated many novel interactions for Pasha not clearly related to its miRNA functions (Macias et al. 2012, Kadener et al. 2009).

A strict structural definition for "bona fide microRNA" begs the question of a structure-function relationship.
Uncertainty in parsing small ncRNAs in disparate non-eumetazoan taxa has led research groups to propose and implement nomenclature schemes to describe and define miRNAs and the diversity of non-canonical small noncoding regulatory RNAs found in organisms. Newer schemes have put effort into providing a naturalistic framework that is more reflective of underlying evolutionary processes than the original database, miRBase (Griffiths-Jones et al. 2006).
A rigorously exclusive definition for 'bona fide microRNA' has also been implemented (Fromm et al. 2015), which has reduced by over half the number of microRNAs described as such in miRBase. Criteria for defining a bona fide microRNA are provided and elaborated upon. These include origin from genomic loci forming a specific transcribed hairpin structure, location of the mature miRNA sequence in a specific orientation on the hairpin, and evidence of a paired duplex representing the complementary 'mate' (the 'star' sequence, i.e. miR-X, miR-X*) of the functionally mature microRNA. Additional meta-criteria are formalized in Fromm et al. 2015, which place value on conservation of sequence parameters for individual microRNA genes between species.
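The three structural criteria listed above can be written as a simple checklist. The following Python sketch is an illustration only, not the published annotation procedure: the class, field names, and criterion wording are hypothetical, each criterion is reduced to a yes/no flag supplied by the annotator, and the conservation meta-criteria are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MirnaCandidate:
    """Annotator-supplied structural observations (hypothetical layout)."""
    name: str
    locus_forms_hairpin: bool        # genomic locus folds into a transcribed hairpin
    mature_positioned_on_arm: bool   # mature sequence sits in the expected orientation
    star_sequence_observed: bool     # reads support the complementary 'star' strand


# Field name -> human-readable criterion, paraphrasing the paragraph above.
CRITERIA = {
    "locus_forms_hairpin": "origin from a locus forming a transcribed hairpin",
    "mature_positioned_on_arm": "mature sequence in a specific orientation on the hairpin",
    "star_sequence_observed": "evidence of the paired 'star' sequence",
}


def unmet_criteria(candidate: MirnaCandidate) -> List[str]:
    """Return descriptions of every structural criterion the candidate fails."""
    return [
        description
        for field_name, description in CRITERIA.items()
        if not getattr(candidate, field_name)
    ]


def is_bona_fide(candidate: MirnaCandidate) -> bool:
    """A candidate passes only if no structural criterion is unmet."""
    return not unmet_criteria(candidate)


if __name__ == "__main__":
    candidate = MirnaCandidate("miR-X", True, True, False)
    print(is_bona_fide(candidate), unmet_criteria(candidate))
```

Every check in this sketch concerns the structure of the RNA and its locus; nothing in it records which proteins processed the hairpin.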
A limitation of such definitions is that the structure-function relationship must be inferred: while the defined miRNA presumably represents a 'bona fide' end product of an evolutionarily conserved mechanism, the relationship between its structure and the molecular mechanism producing it remains poorly understood in its context. Bona fide microRNAs have been shown in eumetazoans, plants, and brown algae, for example; however, these are clearly paraphyletic and therefore do not represent a conserved processing mechanism despite their grouping as 'bona fide' (Tarver et al. 2015).
Desvignes et al. 2015 present a broad nomenclature describing an inclusive diversity of small noncoding RNAs, such as lncRNAs, mirtrons, and endogenous siRNAs. The authors include the spectrum of noncoding RNA species ('gene level'), but also provide a classification based on downstream processing of the transcribed RNA ('precursor level'). These provide a high-level hierarchy of molecular mechanism, a fair advancement when attempting to contextualize related, diverse ncRNA origins and processing pathways. An inconsistency introduced in this scheme, however, is that all end-product small ncRNAs are described as 'miRNAs' regardless of the origin of the precursor duplex. The broad sweep of small ncRNAs would more appropriately be described as 'short noncoding regulatory RNAs' rather than as microRNAs, which are historically defined according to the criteria above. Budak et al. 2016, for example, also describe the need for a nomenclature of post-transcriptional miRNA modifications and non-canonical miRNA species.

The cytoplasmic effector functions of bona fide microRNAs require the conserved eukaryotic RNAi machinery of Dicer and Argonaute.
Dicer and Argonaute proteins are the cytoplasmic effectors that facilitate RNAi from double-stranded RNA sources; they are present in almost every eukaryotic taxon and appear to represent an ancient defense mechanism against viral parasitism (Shabalina and Koonin, 2012). microRNA was discovered in the model organisms C. elegans (a nematode worm) and Arabidopsis (a green plant). In elucidating the molecular mechanism responsible for miRNA biogenesis, it was determined that green plants (i.e. Streptophyta) and Eumetazoa produce microRNAs via different molecular mechanisms. This is taken as an indication that miRNA biogenesis in plants and in animals is of independent evolutionary origin.
This is something of an oversimplification, however, because plant and animal microRNA pathways do share common components: both require the conserved eukaryotic RNA interference (RNAi) machinery for their effector function. This is principally the Dicer protein, an RNase III enzyme performing cytosolic cleavage of dsRNA, and the Argonaute proteins, which provide the enzymatic functions of targeting and transcript regulation for the microRNA itself. Both are found so universally that they are clearly a molecular synapomorphy of the eukaryotes, while in plants it is a Dicer duplication that facilitates nuclear pri- to pre-miRNA processing. Even many of the taxa that lack microRNA under the standard definition are able to effect RNAi functionality through experimental introduction of dsRNA, due to Dicer and Ago.
In the context of the 'homology' and 'convergence' labels, it is therefore incorrect to describe 'microRNA' (i.e. the total manifestation of the microRNA biogenesis pathway) as a biological entity wholly separate from RNAi, even though conserved microRNA loci and individual protein members can be described as homologs and orthologs. More accurately, 'canonical eumetazoan microRNA' can be described as a derived and specialized subset of RNA interference: novel in terms of the specific origin of microRNA transcripts and the Drosha/Pasha biogenesis machinery, yet also dependent upon and derived from pre-existing functional pathways.

Conclusion
My goal has been to highlight three philosophical issues on the topic of "evolution of miRNA biogenesis in the Eumetazoa".
First, while Drosha and Pasha homologs are found in various taxa, they may not add up to 'canonical miRNA biogenesis', because the N-terminal protein-protein interaction domains and other domains found in Eumetazoa may not be present in non-eumetazoan taxa. The Drosha N-terminal protein-protein interaction domains may have gained or lost sequence and function under adaptive evolutionary pressures, as appears to have occurred, for example, in Placozoa.
Second, categorization of miRNAs based on structural components of the RNA itself may result in paraphyletic grouping, as in the case of plant, animal, and brown algal miRNAs, which are all classed as 'bona fide' despite having different or convergently evolved biogenesis pathways.
Finally, the ancient, conserved cytoplasmic RNAi machinery of the eukaryotes is still used, with modification, to effect gene regulation via the canonical eumetazoan miRNA pathway. Canonical eumetazoan miRNA therefore cannot be totally separated from Dicer/Ago functionality. Nuclear processing of pri- to pre-miRNA in eumetazoans is portrayed as residing in the novelty of the 'microprocessor'. Yet the 'microprocessor' is really more of a cobbled-together association of deeply conserved RNAseIII and dsRBD domain proteins, with an ad hoc protein-protein interaction network that evolved stepwise to gain multiple specific functions along its evolutionary trajectory, rather than the product of a single de novo origin of miRNA biogenesis due to duplication or appearance of a novel RNAseIII representative (i.e. Drosha).
In conclusion, continuation of such research provides an important foundation for understanding the likelihood, frequency, and molecular trends involved in evolving a complex and sophisticated molecular mechanism such as miRNA biogenesis.