Two-Step Contractions of Inverted Repeat Region and psaI Gene Duplication from the Plastome of Croton tiglium (Euphorbiaceae)

Croton L. (Euphorbiaceae) is a very species-rich genus of about 1,250 species, mainly distributed in the New World. The first complete plastome sequence from the genus, that of Croton tiglium, is reported in this study (NCBI acc. no. MH394334). The plastome is 150,021 bp in length. The large single copy (LSC) and small single copy (SSC) regions are 111,654 bp and 18,167 bp long, respectively. However, the inverted repeat (IR) region is only 10,100 bp long and includes only four rrn and four trn genes, and a small part of the ycf1 gene. We propose two-step IR contractions to explain this unique IR region of the C. tiglium plastome. First, the IR contracted from rps19-rpl2 to ycf2-trnL-CAA on the LSC/IRb boundary. Second, the IR contracted from ycf2-trnL-CAA to rrn16-trnV-GAC on the LSC/IRa boundary. In addition, duplicated copies of the psaI gene were discovered in the C. tiglium plastome. Both copies are located side by side between the accD and ycf4 genes, but one copy was pseudogenized by a five-basepair (TAGCT) insertion in the middle of the gene and the resulting frameshift mutation. The plastome contains 112 genes, of which 78 are protein-coding genes, 30 are tRNA genes, and four are rRNA genes. Sixteen genes contain one intron and two genes have two introns. The infA gene is lost. Twelve large repeats were detected in the plastome, all of them located in the LSC region. Also, 272 simple sequence repeats (SSRs) were identified. The penta-SSRs accounted for 45% of total SSRs, followed by mono- (32%), di- (12%), tetra- (6%) and tri-SSRs (5%). Most of the SSRs were distributed in the LSC region (85%). In addition, 76% of the SSRs were located in the intergenic spacer (IGS). Phylogenetic analysis suggested that C. tiglium is sister to Jatropha curcas with 100% bootstrap support. Seven Euphorbiaceae species formed one clade with 100% bootstrap support.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 November 2018. © 2018 by the author(s). Distributed under a Creative Commons CC BY license. doi:10.20944/preprints201807.0458.v2


Introduction
The Euphorbiaceae belongs to Malpighiales and is the seventh-largest family of flowering plants, containing 6,252 species in 209 genera (APG IV 2016; Christenhusz & Byng 2016). The Euphorbiaceae was at first classified into five subfamilies (Tokuoka 2007; Webster 1994a; Webster 1994b; Wurdack et al. 2005), but was later treated as comprising four subfamilies (Acalyphoideae, Cheilosoideae, Crotonoideae, and Euphorbioideae), with the subfamily Peroideae now treated as an independent family, Peraceae (Wurdack & Davis 2009; Xi et al. 2012).
The genus Croton is the second-largest genus in the Euphorbiaceae after Euphorbia, comprising more than 1,250 species (Frodin 2004; Govaerts et al. 2000). This genus belongs to the subfamily Crotonoideae (Tokuoka 2007; Webster 1994a; Webster 1994b; Wurdack et al. 2005). Webster (1993) classified Croton into 40 sections. Van Ee et al. (2011) divided the genus into four subgenera: Adenophylli, Croton, Geiseleria, and Quadrilobi. Croton is distributed worldwide, from America to Africa, Asia, and Oceania, but its species are particularly concentrated in the New World (van Ee & Berry 2016; van Ee et al. 2011). Various species of Croton are used for wound healing and for the treatment of rheumatism and cancer, or are widely cultivated as ornamental plants (Loureiro 2008; Salatino et al. 2007).
Croton tiglium L., known as purging croton, originated in tropical Asia and China (Loureiro 2008). It belongs to Croton subgenus Croton section Tiglium (van Ee et al. 2011; Webster 1993). As with other Croton species, C. tiglium is used to treat diseases such as rheumatism and cancer, and is also used as a laxative in China (Loureiro 2008; Salatino et al. 2007).
With the development of genome sequencing technology, studies on plant evolution using complete plastomes are actively in progress. Currently, over 2,200 complete plastome sequences can be downloaded from the NCBI database. Among them, for Malpighiales, to which Euphorbiaceae belongs, the plastomes of a total of 105 species in nine families have been reported (retrieved July 10, 2018). However, Chrysobalanaceae with 50 species and Salicaceae with 41 species account for most of these (Bardon et al. 2016; Huang et al. 2014; Huang et al. 2017). Also, the plastomes of 14 species of Passifloraceae (Cauz-Santos et al. 2017; Rabah et al. 2018), three species of Malpighiaceae (Menezes et al. 2018), one species of Clusiaceae (Jo et al. 2017), one species of Erythroxylaceae, one species of Linaceae (de Santana Lopes et al. 2018), and one species of Violaceae (Cheon et al. 2017) have been reported. In Euphorbiaceae, the plastomes of six species have been reported. The six species belong to three subfamilies: Acalyphoideae (Ricinus communis L.) (Rivarola et al. 2011), Crotonoideae (Hevea brasiliensis (Willd. ex A. Juss.) Müll. Arg., Jatropha curcas L., Manihot esculenta Crantz and Vernicia fordii (Hemsl.) Airy Shaw) (Asif et al. 2010; Daniell et al. 2008; Li et al. 2017; Tangphatsornruang et al. 2011) and Euphorbioideae (Euphorbia esula L.) (Horvath et al. 2018). There are no published plastome sequences for Croton. The plastome structure of the Euphorbiaceae species published to date is almost identical to that of general angiosperm plastomes. However, a 30 kb inversion has been reported for Hevea brasiliensis (Tangphatsornruang et al. 2011). Given that complete plastomes have been reported for only six species, around 0.1% of the 6,252 species of Euphorbiaceae, many more variations may be discovered as more species are studied.
Therefore, this study reports the first complete plastome for a Croton species. The results indicate that the IR region in the plastome of C. tiglium was shortened by two-step contractions. In addition, we found a psaI gene duplication followed by pseudogenization in the C. tiglium plastome. The amounts and distributions of the large repeats and the simple sequence repeats (SSRs) were compared among seven Euphorbiaceae plastomes. These results are expected to be useful for developing molecular identification markers for Croton species and for studies of Euphorbiaceae plastome evolution.

DNA extraction, Sequencing and Annotation
Leaves of C. tiglium used in this study were collected from the Korea University greenhouse, where we grew the plants from seeds that were originally collected in Indonesia. The plants flowered and fruited in the greenhouse. A voucher specimen was deposited in the Korea University Herbarium (KUS acc. no. 2014-0242). Fresh leaves were ground into powder in liquid nitrogen and total DNA was extracted using the CTAB method (Doyle 1987). The DNA was further purified by ultracentrifugation and dialysis (Palmer 1986). The genomic DNA was deposited in the Plant DNA Bank in Korea (PDBK) under the accession number PDBK2014-0242.
Approximately 200 ng of DNA was used for library construction and raw sequence reads were generated using an Illumina HiSeq 2000 system (Illumina, Inc., San Diego, CA). A total of 26,602,848 raw reads were generated. Among them, 1,396,325 plastome reads were filtered and collected using Geneious v.6.1.8 (Biomatters, Inc., Auckland, New Zealand) (Kearse et al. 2012). The plastome sequence of Manihot esculenta (NC_010433) was used as the filtering reference. The collected plastome reads were assembled into a single contig covering the whole plastome.
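As a quick check on the read filtering described above, the share of raw reads assigned to the plastome can be computed directly from the two reported read counts. This is only an illustrative calculation; the variable names are ours and no read length or coverage value is assumed.

```python
# Read counts as reported in the text above.
total_raw_reads = 26_602_848
plastome_reads = 1_396_325

# Fraction of raw reads retained by the reference-based filtering step.
plastome_fraction = plastome_reads / total_raw_reads

print(f"Plastome reads: {plastome_fraction:.2%} of all raw reads")
```

Roughly one read in twenty was retained as plastome-derived under this filtering.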
The IR contractions and their boundaries were confirmed by PCR. We synthesized the following four primers: ycf2F-ACGCAAGAAAAGCACTTCGA, trnLR-GAATGGGGAGTCCGCTTTGA, trnHF-AAGATGAAGAAGCACCCGCA and rrn16R-ATTCGGAAAATGGGGCTGGT. PCR amplifications were carried out using all possible combinations of primers. The PCR conditions were 3 min initial denaturation at 95 °C; followed by 30 cycles of 1 min annealing at 60 °C, 1 min extension at 72 °C and 1 min denaturation at 95 °C; and 7 min final extension at 72 °C. The PCR products were sequenced by the Sanger method. The IR boundaries were confirmed by sequence comparison.
Gene annotations were performed using the National Center for Biotechnology Information (NCBI) BLAST and tRNAscan-SE programs (Lowe & Eddy 1997). The circular map was drawn using the OGDRAW program (Lohse et al. 2013). The raw read sequence data and the completely annotated plastome sequence data were deposited in the NCBI GenBank database under acc. no. SAMN09982495 and acc. no. MH394334, respectively.
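The primer design above can be summarized with a short sketch that enumerates the primer pairs and tallies basic primer properties. It assumes that "all possible combinations" means every unordered pair of the four primers; the helper gc_percent and the variable names are ours, and the run time ignores ramping between temperatures.

```python
from itertools import combinations

# Primer sequences exactly as listed in the text (5'->3').
primers = {
    "ycf2F": "ACGCAAGAAAAGCACTTCGA",
    "trnLR": "GAATGGGGAGTCCGCTTTGA",
    "trnHF": "AAGATGAAGAAGCACCCGCA",
    "rrn16R": "ATTCGGAAAATGGGGCTGGT",
}


def gc_percent(seq):
    """Return the GC content of a sequence as a percentage."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Assumption: "all possible combinations" = every unordered primer pair.
primer_pairs = list(combinations(primers, 2))

# Cycling program from the text, in minutes:
# initial denaturation + 30 cycles of three 1-min steps + final extension.
cycles = 30
total_minutes = 3 + cycles * (1 + 1 + 1) + 7

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%")
for first, second in primer_pairs:
    print(f"pair: {first} + {second}")
print(f"{len(primer_pairs)} pairs; program time {total_minutes} min excluding ramping")
```

Four primers give six unordered pairs; which pairs yield a product depends on the actual arrangement of the flanking regions.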

Comparison of structural differences in the Euphorbiaceae plastomes
The Mauve alignment (Darling et al. 2004) implemented in Geneious v.6.1.8 was used to compare C. tiglium with six Euphorbiaceae plastomes. In addition, MUSCLE v.3.8.425 alignments (Edgar 2004) with the psaI genes of the six Euphorbiaceae species were carried out to identify the psaI and ΨpsaI genes located between accD and ycf4 in C. tiglium.

Repeat Analysis in the Croton tiglium Plastomes
Large repeats in C. tiglium were analyzed using REPuter (bibiserv2.cebitec.uni-bielefeld.de/reputer) (Kurtz et al. 2001). Forward, reverse and palindromic repeats were identified. In addition, we identified the large repeats in the six other Euphorbiaceae plastomes. For all repeat types, the minimal size and similarity were 26 bp and 100%, respectively.
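The repeat criteria above (exact matches of at least 26 bp) can be illustrated with a minimal k-mer index. This is not the REPuter algorithm, which is based on suffix trees; it is a simplified sketch that reports only seed positions of exact forward and palindromic (reverse-complement) matches and omits reverse repeats. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_repeat_seeds(seq, k=26):
    """Return (forward, palindromic) lists of start-position pairs.

    forward: the same k-mer starts at positions i < j.
    palindromic: the k-mer at i equals the reverse complement of the k-mer at j (i < j).
    """
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    forward, palindromic = [], []
    for kmer, positions in index.items():
        # Every pair of occurrences of the same k-mer is a forward seed.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                forward.append((positions[a], positions[b]))
        # Occurrences of the reverse complement give palindromic seeds.
        for i in positions:
            for j in index.get(reverse_complement(kmer), []):
                if i < j:
                    palindromic.append((i, j))
    return forward, palindromic


# Toy sequences (hypothetical, k shortened to 8 for readability).
fwd, pal = exact_repeat_seeds("GATTACAGTTTTGATTACAG", k=8)
print("forward seeds:", fwd, "palindromic seeds:", pal)
fwd2, pal2 = exact_repeat_seeds("GATTACAGCCCTGTAATC", k=8)
print("forward seeds:", fwd2, "palindromic seeds:", pal2)
```

On a real plastome, overlapping seeds would still need to be merged into maximal repeats, which dedicated tools handle directly.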

Phylogenetic Analysis
To identify the phylogenetic position of C. tiglium among Euphorbiaceae taxa, we reconstructed a phylogenetic tree. The whole-plastome sequences of 27 species belonging to the fabids were downloaded from NCBI (Table S2). Seventy-six protein-coding and four rRNA gene sequences from the plastomes were used for alignments. Three plastome genes (infA, rpl32, and rps16) were excluded from the analysis because they are absent from several species. Each gene was aligned separately using MUSCLE v.3.8.425 (Edgar 2004) and all gene alignments were concatenated. Maximum likelihood (ML) analysis of the concatenated sequences (79,798 bp) was conducted using RAxML v.7.7.1 (Stamatakis et al. 2008).
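The concatenation step described above can be sketched as follows. The function concatenate_alignments and the toy alignments are hypothetical. Padding a missing gene with gaps is an assumption made only to keep the sketch general; the study instead excluded genes that are absent from several species.

```python
def concatenate_alignments(gene_alignments, gap="-"):
    """Join per-gene alignments into a single supermatrix.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    A taxon missing from a gene alignment is padded with gap characters.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments, not data from this study.
gene_a = {"taxon1": "ATG-CA", "taxon2": "ATGGCA"}
gene_b = {"taxon1": "TTA", "taxon2": "TTG", "taxon3": "TTA"}

matrix = concatenate_alignments([gene_a, gene_b])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is the property a concatenated ML analysis requires of its input.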

Structure and Gene Contents of Croton tiglium Complete Plastome
The plastome of C. tiglium was sequenced on the Illumina HiSeq 2000 platform. The data are summarized in Table 1. The plastome of C. tiglium shows a quadripartite structure similar to that of common flowering plants (Fig. 1). The plastome is 150,021 bp in length. In previous studies, the lengths of Euphorbiaceae plastomes ranged from 160,512 bp (Euphorbia esula) to 163,856 bp (Jatropha curcas) (Fig. 2). The C. tiglium plastome is thus shorter than these by around 10-13 kb. This is because the IR region was shortened while the LSC region was lengthened (Fig. 3). Although the C. tiglium plastome is shorter than those of other species, its SSC is around 1 kb longer than those of Euphorbia esula and Jatropha curcas.
The plastome comprises 112 unique genes (78 protein-coding genes, 30 tRNA genes and four rRNA genes). Four tRNA and four rRNA genes are duplicated in the IR regions (Fig. 1, Table 2). Protein-coding genes occur only in the LSC and SSC regions. Sixteen genes have one or two introns (Table 2). Two genes (clpP and ycf3) have two introns. The others have one intron. Both psaI and ΨpsaI are present between accD and ycf4. ΨpsaI was pseudogenized by the insertion of TAGCT (Fig. 4). In the six previously reported Euphorbiaceae plastomes, only psaI exists between accD and ycf4 (Asif et al. 2010; Daniell et al. 2008; Horvath et al. 2018; Li et al. 2017; Rivarola et al. 2011; Tangphatsornruang et al. 2011). The psaI gene of Jatropha curcas is 9 bp longer than that of the other species. The reason is that, whereas the poly-A tract of the other species is 7 bp long, that of Jatropha curcas is 6 bp long, which leads to an increase in the length of psaI (Fig. 2). The infA gene is lost in C. tiglium, which agrees with the results of previous Euphorbiaceae plastome studies.
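The reported region lengths can be checked against the total plastome length, because a quadripartite plastome consists of one LSC, one SSC and two IR copies. The sketch below uses only lengths given in the text; the variable names are ours.

```python
# Region lengths of the C. tiglium plastome as reported in the text (bp).
lsc, ssc, ir = 111_654, 18_167, 10_100

# A quadripartite plastome is LSC + SSC + two copies of the IR.
total = lsc + ssc + 2 * ir
print(f"LSC + SSC + 2 x IR = {total:,} bp")

# Previously reported Euphorbiaceae plastome lengths cited in the text (bp).
reported = {"Euphorbia esula": 160_512, "Jatropha curcas": 163_856}
differences = {species: length - total for species, length in reported.items()}
for species, diff in differences.items():
    print(f"C. tiglium is {diff:,} bp shorter than {species}")
```

The sum reproduces the reported plastome length of 150,021 bp.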
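Why a 5-bp insertion disrupts a coding sequence can be shown with a toy example: an insertion whose length is not a multiple of three shifts every downstream codon. The sequence toy_orf and the insertion position below are hypothetical and are not the psaI sequence; only the inserted motif TAGCT is taken from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete in-frame triplets from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_stop_index(seq):
    """Return the index of the first in-frame stop codon, or None."""
    for idx, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return idx
    return None


# Hypothetical 9-codon reading frame (NOT the real psaI sequence).
toy_orf = "ATGGCTGCAGGTCTTCCAGAAATTTAA"
insert_at = 13  # hypothetical 0-based insertion position
mutant = toy_orf[:insert_at] + "TAGCT" + toy_orf[insert_at:]

print("original:", codons(toy_orf), "stop at codon", first_stop_index(toy_orf))
print("mutant:  ", codons(mutant), "stop at codon", first_stop_index(mutant))
```

In this toy case the codons upstream of the insertion are unchanged, all downstream codons differ, and the original stop codon is no longer in frame.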

Comparison of Plastome Structures in Euphorbiaceae
We compared the plastome structures of seven plastomes in Euphorbiaceae. Unlike in the six other complete plastomes, the IR region of C. tiglium is only 10,100 bp long and includes only four rrn and four trn genes, and a small part of the ycf1 gene. The IR regions in other Euphorbiaceae are usually more than 26,000 bp in length (Fig. 2). To explain the short IR in the C. tiglium plastome, we propose a two-step contraction mechanism based on comparative analyses with other plastomes (Fig. 3). The rps19-rrn16 region is generally located in the IR in angiosperms (Kim & Lee 2004; Shinozaki et al. 1986; Yi & Kim 2012), and also in other Euphorbiaceae plastomes (Table 3). However, it is located in the LSC region in C. tiglium. This is because the first IR contraction occurred in the region from rps19-rpl2 to ycf2-trnL-CAA and then the second IR contraction occurred in the region from ycf2-trnL-CAA to rrn16-trnV-GAC (Figs. 1 and 3). As a result, the rps19-trnL-CAA region is located at the 3'-LSC and the ycf2-rrn16 region is located at the 5'-LSC. These forms of IR contraction are reported here for the first time. More studies are necessary to determine whether these IR contractions occur only in certain taxa of the genus Croton or are observed in other genera as well.
The large inversion of 30 kb previously reported in the LSC region (trnS-GCU-trnT-GGU) of Hevea brasiliensis (Tangphatsornruang et al. 2011) was not detected in C. tiglium.

Distribution of Large and Simple Sequence Repeats
Dispersed repeat sequences in plant plastomes have been analyzed in order to understand their evolutionary roles (Gupta & Varshney 2000; Muriira et al. 2018; Powell et al. 1995; Provan et al. 2001; Timme et al. 2007). A total of 12 large repeats were found in the C. tiglium plastome (Table 4). Among them, forward repeats were the majority with nine, and the numbers of reverse and palindromic repeats were one and two, respectively. The large repeats of seven Euphorbiaceae plastomes, including C. tiglium, were compared in Fig. 5. Jatropha curcas had the largest number of large repeats (37) and Euphorbia esula the smallest (eight) (Fig. 5A). In all seven species, most of the large repeats were located in the LSC; in C. tiglium, unlike in the other species, all large repeats were in the LSC (Fig. 5B). Euphorbia esula and Hevea brasiliensis had no large repeats in the SSC. In Ricinus communis, no large repeat was present in the IR.
With regard to simple sequence repeats (SSRs), a total of 272 SSRs were found in the C. tiglium plastome. To compare SSR numbers, the SSRs of the six other published Euphorbiaceae taxa were also counted. The SSR numbers are 214 in Euphorbia esula, 245 in Hevea brasiliensis, 250 in Jatropha curcas, 201 in Manihot esculenta, 283 in Ricinus communis and 259 in Vernicia fordii. The SSR numbers are similar to each other among the Euphorbiaceae plastomes, but they are 2-3 times higher than those of plastomes from other families (Kim & Lee 2004; Shinozaki et al. 1986; Yi & Kim 2012). Among the C. tiglium SSRs, the numbers of penta-SSRs and mono-SSRs were 123 and 87, respectively, and together accounted for 77% of all SSRs (Fig. 6A). These results are in sharp contrast to previous results for other taxa, in which mono-SSRs were the most abundant and penta-SSRs were rarely present (Cauz-Santos et al. 2017; Gu et al. 2018; Saina et al. 2018). The largest number of SSRs was located in the LSC (232), followed by the SSC (37) and the IR (3) (Fig. 6B). This is because the LSC is as long as 111,654 bp and its rate of variation is high. Whereas a general angiosperm LSC is around 85 kb long, C. tiglium has a larger LSC due to the two-step IR contractions, so that the proportion of SSRs increased in the LSC and decreased in the IR. Most SSRs were distributed in the IGS, and 34 and 32 SSRs were distributed in the introns and the CDS, respectively (Fig. 6C). This is one of the reasons why the IGS has higher sequence divergence than the CDS. These SSR loci may be useful in interspecific or intergroup comparative studies.
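The SSR proportions quoted above follow directly from the reported counts. The sketch below recomputes them; the IGS count is not stated explicitly in the text and is derived here as the total minus the intron and CDS counts, which is our assumption.

```python
total_ssrs = 272

# Counts by plastome region, as reported in the text.
by_region = {"LSC": 232, "SSC": 37, "IR": 3}

# Counts by location; the IGS value is derived, not stated in the text.
intron_ssrs, cds_ssrs = 34, 32
igs_ssrs = total_ssrs - intron_ssrs - cds_ssrs

# Counts of the two most frequent motif classes, as reported in the text.
penta_ssrs, mono_ssrs = 123, 87


def percent(count, total=total_ssrs):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for region, count in by_region.items():
    print(f"{region}: {count} SSRs ({percent(count):.1f}%)")
print(f"IGS: {igs_ssrs} SSRs ({percent(igs_ssrs):.1f}%)")
print(f"penta + mono: {percent(penta_ssrs + mono_ssrs):.1f}% of all SSRs")
```

The region counts sum to the reported total of 272, and the rounded shares agree with the percentages given in the abstract (85% in the LSC, 76% in the IGS).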

Phylogenetic Analysis
The phylogenetic position of C. tiglium was identified using 28 complete plastome sequences belonging to the fabids (Fig. 7). Croton tiglium formed a clade with Jatropha curcas (BS = 100%), and Vernicia fordii was sister to them (BS = 100%). They share inaperturate seed traits and belong to the subfamily Crotonoideae (Tokuoka 2007; Wurdack et al. 2005). Manihot esculenta and Hevea brasiliensis form a clade (BS = 100%) in our tree. They formed a sister clade of the Croton-Jatropha-Vernicia clade. They also belong to the subfamily Crotonoideae but share articulated seed traits (Tokuoka 2007; Wurdack et al. 2005). Euphorbia esula, which belongs to the subfamily Euphorbioideae, was located as a sister group of the Manihot-Hevea clade (BS = 85%). Our results indicate the subfamily Crotonoideae is [...]. The results of this study indicate that Euphorbiaceae is the sister group of the Salicaceae-Passifloraceae-Violaceae-Linaceae clade. However, these relationships are uncertain because there is no plastome information for Rafflesiaceae and Peraceae (Davis et al. 2007; Wurdack & Davis 2009; Xi et al. 2012), which are thought to be closely related to Euphorbiaceae. Clearer relationships may be established once whole plastome sequences of Rafflesiaceae and Peraceae are added.
Phylogenetic studies of Croton have been conducted several times using partial sequences such as matK, rbcL, trnL-F, and ITS (Berry et al. 2005; Haber et al. 2017; van Ee & Berry 2010; van Ee et al. 2011; Wurdack et al. 2005). Phylogenetic trees based on rbcL resolved Croton poorly. However, Croton was monophyletic in the tree based on trnL-F (BS = 100%). It was also monophyletic with 100% support in the tree based on combined rbcL and trnL-F data. In a biogeographic study of Croton, C. tiglium formed a clade with C. acutifolius, C. argyratus, C. cascarilloides, and C. megalobotrys (Haber et al. 2017).
The first complete plastome of Croton obtained in this study will be a useful reference sequence for further phylogenetic and biogeographic studies.

Conclusions
In this study, we reported the complete plastome of Croton tiglium (Euphorbiaceae). The C. tiglium plastome is shorter by around 10-13 kb than the plastomes of other Euphorbiaceae members. This is primarily due to the short inverted repeat (IR). The IR region is 10,100 bp long and includes only four rrn and four trn genes, and a small part of the ycf1 gene. To explain this unique IR structure, we propose a two-step IR contraction model. First, the IR contracted from rps19-rpl2 to ycf2-trnL-CAA on the LSC/IRb boundary. Second, the IR contracted from ycf2-trnL-CAA to rrn16-trnV-GAC on the LSC/IRa boundary. The first contraction must have occurred before the second. In addition, duplicated copies of the psaI gene were discovered in the C. tiglium plastome. Both copies are located side by side between the accD and ycf4 genes, but one copy was pseudogenized by a five-basepair (TAGCT) insertion in the middle of the gene and the resulting frameshift mutation.
This work was supported by the National Research Foundation of Korea (NRF). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: National Research Foundation of Korea (NRF): NRF-2015M3A9B8030588