Complete genomic sequence of Noni mosaic virus (NoMV) associated with a mosaic disease in Morinda citrifolia L.

An outbreak of a virus-like disease has caused severe damage to noni plants (Morinda citrifolia L.) in the Xishuangbanna area of Yunnan Province, southwest China, since 2015. Diseased plants displayed typical mosaic symptoms, with light and dark green patches on the leaves. Flexuous filamentous virus particles about 800 nm in length were observed in leaf sap by transmission electron microscopy. Illumina transcriptome sequencing further revealed the presence of a potyvirus, and a near-complete genome was obtained by de novo assembly. The complete genome of 9659 nt was then determined by Sanger sequencing of eight amplicons generated by RT-PCR and by 5′ and 3′ RACE. BLASTp analysis of the polyprotein sequence showed that the virus was most closely related to Tobacco vein banding mosaic virus (TVBMV), but the two viruses shared only 50.7% amino acid sequence similarity. Phylogenetic analyses of both the polyprotein and the CP amino acid sequences indicated that this virus is a member of the genus Potyvirus. However, its low sequence homology with all known potyviruses indicates that it is a new species in the genus, tentatively named Noni mosaic virus (NoMV). Our field surveys showed that 100% of symptomatic samples and 28.57% of asymptomatic samples were infected with this novel potyvirus. NoMV was also detected in aphids collected from diseased leaves, and an aphid transmission test confirmed that aphids can transmit NoMV. In summary, our data indicate that a novel potyvirus species, NoMV, is prevalent in Yunnan, China, and is associated with an emerging mosaic disease of M. citrifolia.


Introduction
Plant viruses pose a serious threat to crop production and food security because of their prevalence and outbreaks. According to incomplete statistics, plant virus infections reduce global crop production by about 10% annually (Qian et al. 2014). Next Generation Sequencing (NGS) has provided a powerful tool for plant pathogen diagnosis, especially for the identification of novel viruses. Many plant viruses have been discovered by NGS, such as Grapevine Muscat rose virus (GMRV) (Diaz-Lara et al. 2019), Apple rubbery wood virus (ARWV) (Rott et al. 2018), Tea plant necrotic ring blotch virus (TPNRBV) (Hao et al. 2018) and Areca palm necrotic ringspot virus (APNRV) (Yang et al. 2019).
Noni (Morinda citrifolia L.) is a fruit-bearing tree cultivated across tropical and subtropical regions of Southeast Asia, Australia and the Pacific Islands. Its fruits are traditionally used as a medicinal herb in many countries (Ahmad et al. 2016; Torres et al. 2017). Agricultural cultivation of noni in China began as early as 2000 in Hainan and Yunnan Provinces, and noni is now grown in most provinces of South China. With increasing acreage and continuous cropping, diseases such as blight and fruit rot caused by Phytophthora botryosa (Gan and He 2004) and anthracnose caused by Colletotrichum spp. (Wang et al. 2015) have been reported and may threaten the development of the noni industry. However, no viral disease of noni has been reported to date.
In 2015, a virus-like disease was found in noni plants in Xishuangbanna, Yunnan, China. The leaves of diseased plants showed striking mosaic symptoms with light and dark green patches. This disease now seriously affects the cultivation and production of noni in Xishuangbanna, so identifying the viral pathogen present in diseased plants is of both scientific and agronomic interest. In this study, NGS and Sanger sequencing were used to show that the putative causal agent of this noni disease is a novel potyvirus with distinctive molecular characteristics. This is the first report worldwide of this novel potyvirus, tentatively named Noni mosaic virus (NoMV).

Plant materials and electron microscopy observation
In 2015, leaf samples from thirty-one diseased noni plants (Morinda citrifolia L.) showing typical mosaic symptoms with light and dark green patches (Fig. 1a) were collected in Xishuangbanna, Yunnan, China. To determine whether the diseased plants were infected by a virus, leaf sap was prepared for transmission electron microscopy (TEM). Briefly, fresh leaves from five healthy and five diseased samples were ground in 1 × PBS at a final concentration of 0.1 g/ml. The ground samples were centrifuged at 5000 rpm for 2 min, and the supernatants were loaded individually onto 200-mesh copper grids. The grids were negatively stained with 1% phosphotungstic acid for 2 min, dried under a tungsten lamp for about 10 min, and observed by TEM (HT7700, Hitachi). The width and length of viral particles were measured using Adobe Photoshop CS3.
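Particle dimensions measured on a micrograph are in pixels and must be converted to nanometres with the image's scale bar before they can be summarized as mean ± SD. The sketch below shows that arithmetic only; the scale-bar calibration and the pixel lengths are hypothetical toy values, not measurements from this study, whose measurements were made in Photoshop.

```python
from statistics import mean, stdev


def px_to_nm(length_px: float, scale_bar_px: float, scale_bar_nm: float) -> float:
    """Convert an on-screen length in pixels to nanometres via the scale bar."""
    return length_px * scale_bar_nm / scale_bar_px


def summarize(lengths_nm: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of particle measurements."""
    return mean(lengths_nm), stdev(lengths_nm)


# Hypothetical toy calibration: a 100 nm scale bar spanning 50 px (2 nm per pixel).
measured_px = [100.0, 110.0, 90.0]  # hypothetical particle lengths in pixels
lengths_nm = [px_to_nm(p, scale_bar_px=50.0, scale_bar_nm=100.0) for p in measured_px]
m, sd = summarize(lengths_nm)
print(f"{m:.0f} +/- {sd:.0f} nm")  # prints "200 +/- 20 nm"
```

In practice each particle would be measured along its curved axis, which is why flexuous particles are usually traced rather than measured end to end.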
Library preparation for transcriptome sequencing
Total RNA was extracted from five diseased noni plants using a Quick RNA Isolation Kit (Bioteke, Beijing, China) according to the manufacturer's instructions. RNA purity, concentration and integrity were checked with a NanoDrop spectrophotometer, a Qubit 2.0 fluorometer and an Agilent 2100 Bioanalyzer, respectively, before cDNA library preparation.
The cDNA library was prepared as follows. First, mRNA was enriched from total RNA with Oligo(dT) magnetic beads and randomly fragmented with fragmentation buffer. The fragmented mRNA was primed with random hexamers and reverse-transcribed into first-strand cDNA, which was treated with RNase H to remove RNA and used as the template for second-strand synthesis with DNA polymerase I. The double-stranded cDNA (ds cDNA) was purified with AMPure XP beads and subjected to end repair and dA-tailing. Adaptors were then ligated to the ds cDNA, and the library was enriched by PCR amplification. Before high-throughput sequencing, the concentration and insert size of the library were analyzed with a Qubit 2.0 and an Agilent 2100, respectively. Finally, the library was sent to Biomarker Biotechnology Corporation (Beijing, China) for deep sequencing on a HiSeq 4000 platform with 150-bp paired-end reads.

Viral genome assembly
Raw data were cleaned by filtering out low-quality reads and trimming adaptors. High-quality clean reads were mapped to viral sequences downloaded from the NCBI website (https://www.ncbi.nlm.nih.gov/) using TopHat (Trapnell et al. 2009), and HTSeq v0.5.4 p3 was used to count the reads mapped to viral sequences (Anders et al. 2015). Contigs were assembled de novo from the clean reads using Trinity 2.1.1 (Haas et al. 2013). The assembled contigs or unigenes were then used as references in a second round of mapping of the viral reads to obtain more accurate results. The viral contigs were further assembled into a full-length potyvirus-like sequence in CodonCode Aligner 6.0.2 (CodonCode, Centerville, MA), and this sequence was subjected to a BLASTx search against the GenBank non-redundant protein database.
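The read-cleaning step is described only in outline, and neither the tool nor its thresholds are named. As a minimal sketch, assuming Phred+33 quality encoding, a mean-quality cut-off of Q20, a 50-nt minimum length and the common Illumina TruSeq adapter prefix AGATCGGAAGAGC (all assumptions, not the authors' settings), adaptor trimming and quality filtering of a single read could look like this:

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: common Illumina TruSeq adapter prefix


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER, min_match: int = 5):
    """Cut the read at a full adapter match, or at a partial adapter prefix
    (at least min_match nt) running off the 3' end of the read."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    for k in range(min(len(adapter) - 1, len(seq)), min_match - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def clean_read(seq: str, qual: str, min_q: float = 20.0, min_len: int = 50):
    """Return the trimmed (seq, qual) pair, or None if the read fails a filter.
    Thresholds are hypothetical defaults, not the study's parameters."""
    seq, qual = trim_adapter(seq, qual)
    if len(seq) < min_len or mean_phred(qual) < min_q:
        return None
    return seq, qual
```

Dedicated trimmers also handle mismatches within the adapter and sliding-window quality trimming; exact matching keeps the sketch short.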
Sanger genome sequencing of a novel potyvirus
Sanger sequencing was used to confirm the genome sequence of the novel virus, designated Noni mosaic virus (NoMV), that was derived from deep sequencing and de novo assembly. Briefly, seven primer pairs (Supplementary Table S1) covering the near-complete genome of NoMV were designed based on the assembled putative genome, together with two nested primer pairs targeting the 5′ end. RNA was extracted as described above. Random hexamers and Oligo(dT) were used for reverse transcription, and PCR was carried out using Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific). The 5′ end fragment was amplified using a 5′ Rapid Amplification of cDNA Ends (RACE) kit (Invitrogen) according to the manufacturer's instructions. Each PCR fragment was cloned into the pMD18-T vector (Takara, Beijing, China), and three independent clones of each fragment were sequenced bidirectionally by Sanger sequencing (Sangon Biotech, Wuhan, China). High-quality sequences were assembled from their overlaps using BioEdit (version 7.0.9.0) to obtain the complete genome of NoMV (Hall 2013).
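Two computational steps are implicit here: reconciling the three clones of each fragment, and joining overlapping fragments into one genome. The sketch below illustrates both under simplifying assumptions: the clone reads are already aligned to equal length, a column-wise majority vote is used as the consensus rule, and fragments are joined on exact overlaps with a hypothetical minimum of 20 nt. BioEdit was the tool actually used; this is not its algorithm.

```python
from collections import Counter


def majority_consensus(clones: list[str]) -> str:
    """Column-wise majority vote over clone reads already aligned to equal length."""
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clone reads must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*clones))


def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two fragments on their longest exact suffix/prefix overlap.
    min_overlap is a hypothetical threshold."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} nt")
```

With three clones, a single clone-specific error (for example a PCR or cloning artefact) is outvoted by the other two, which is one reason to sequence several independent clones per fragment.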

NoMV genome analysis
The complete NoMV genome obtained in the previous step was subjected to sequence analyses. First, putative ORFs of NoMV were identified using ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/) and bioinformatics analysis. The identified ORFs and their deduced amino acid sequences were then used as BLAST queries. Sequences with high similarity to NoMV were downloaded and subjected to alignment and pairwise comparison using BioEdit (version 7.0.9.0).
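ORF finder scans all reading frames for start-to-stop stretches. A minimal sketch of the same idea is shown below, simplified to the forward strand, ATG starts only and the standard genetic code (reasonable for a positive-sense potyvirus genome, but assumptions nonetheless); it reports the longest ORF with 1-based coordinates.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    nt = nt.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def longest_orf(genome: str) -> tuple[int, int, str]:
    """Longest ATG-initiated ORF with an in-frame stop, forward strand only.

    Returns (start, end, protein): 1-based inclusive nt coordinates, where end
    is the last base of the stop codon; the protein excludes the stop.
    """
    seq = genome.upper().replace("U", "T")
    best = (0, 0, "")
    for frame in range(3):
        aa = translate(seq[frame:])
        pos = 0
        while (m := aa.find("M", pos)) != -1:
            stop = aa.find("*", m)
            if stop == -1:
                break  # ORF runs off the end without a stop codon: ignored here
            if stop - m > len(best[2]):
                best = (frame + 3 * m + 1, frame + 3 * stop + 3, aa[m:stop])
            pos = stop + 1
    return best
```

Applied to a full genome, the returned coordinates bound the polyprotein ORF, and the sequence outside them gives the 5′ and 3′ UTRs.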

Phylogenetic analysis
To determine the phylogenetic relationship of NoMV with other members of the genus Potyvirus, phylogenetic trees were constructed based on the amino acid sequences of the polyprotein and the coat protein (CP). Twenty potyviruses were chosen for alignment with NoMV because BLAST searches showed them to be closely related to NoMV. These potyviruses are Narcissus yellow stripe virus (NYSV, …). All amino acid sequences were aligned using ClustalX and then passed to MEGA 6.0 for tree building with the Neighbor-Joining method and 1000 bootstrap replicates (Tamura et al. 2013).
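MEGA performed the tree building here; the sketch below only illustrates what the Neighbor-Joining method does. It assumes uncorrected p-distances between aligned sequences (MEGA offers other substitution models) and omits the 1000 bootstrap replicates, which would repeat the same procedure on alignments resampled column by column.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Pairwise p-distance matrix for a dict of aligned sequences."""
    names = list(seqs)
    return names, [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build a Newick tree with the neighbor-joining algorithm (no bootstrap)."""
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for a, b in combinations(range(n), 2))
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch to node i
        lj = d[i][j] - li                                    # branch to node j
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # The last edge is placed entirely on the first node; the rooting is arbitrary.
    return f"({nodes[0]}:{d[0][1]:.3f},{nodes[1]}:0.000);"
```

For a real analysis, names and matrix would come from `distance_matrix` applied to the ClustalX alignment; the test below instead uses a small textbook five-taxon matrix.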

Field survey
To investigate the prevalence of the viral disease in Xishuangbanna City, Yunnan Province, three field surveys were conducted at different sites from March to May 2016. Leaf samples from 21 asymptomatic and 67 symptomatic noni plants were randomly collected, and seven asymptomatic and 13 symptomatic samples were selected from these for RT-PCR detection using a primer pair specifically targeting the NoMV CP gene (NMV-F and NMV-R; Supplementary Table S1).
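The abstract reports infection rates of 100% for symptomatic and 28.57% for asymptomatic samples; with 13 and 7 samples tested, these correspond to 13/13 and 2/7 positives (counts back-calculated here from the percentages). Because the samples are small, the sketch below also computes a Wilson score interval, a standard binomial confidence interval added here for illustration and not part of the original analysis.

```python
from math import sqrt


def prevalence(positive: int, tested: int) -> float:
    """Percentage of tested samples that were RT-PCR positive, to two decimals."""
    return round(100 * positive / tested, 2)


def wilson_interval(positive: int, tested: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = positive / tested
    denom = 1 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half = z * sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return centre - half, centre + half


# Counts back-calculated from the reported percentages (13/13 and 2/7).
print(prevalence(13, 13), prevalence(2, 7))  # 100.0 28.57
print(wilson_interval(2, 7))
```

The interval for 2/7 is wide, which is a reminder that the asymptomatic infection rate rests on only seven samples.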

Aphid transmissibility
During the field survey, aphids and whiteflies found on the undersides of noni leaves were also collected and tested for NoMV to determine whether they were carriers of the virus. Total RNA was extracted with TRIzol reagent from pools of approximately 50 aphids or 100 whiteflies, and RT-PCR detection of NoMV was performed as described above. Transmissibility of NoMV by aphids was tested with viruliferous apterous adults of Aphis atrata Zhang. Five aphids were transferred to each healthy noni plant for a 30-min inoculation feeding, and 10 noni plants were inoculated in total. Leaf symptoms were observed daily, and RT-PCR detection of NoMV was performed as described above.

Results
Potyvirus-like particles were found in diseased noni leaf sap
In 2015, leaves of thirty-one noni plants displaying typical mosaic symptoms with light and dark green patches (Fig. 1a) were collected in Xishuangbanna, Yunnan, China. To determine whether the diseased plants were infected with a virus, leaf sap from diseased and healthy leaves was prepared and observed by transmission electron microscopy (TEM). Viral particles typical of potyviruses were observed in sap from the diseased leaves, but no virus particles were found in the healthy samples. These flexuous filamentous particles were 800 ± 20 nm in length and 20 ± 1 nm in width (Fig. 1b). Some of the virus particles formed large aggregates (Fig. 1c).
A nearly full-length potyvirus-like sequence was assembled from transcriptome sequencing
Raw data were filtered to obtain high-quality clean data, which were then mapped to the virus database in NCBI GenBank. A total of 644,392 viral reads were identified, and these mapped viral reads were assembled into 40,911 contigs and 318 unigenes (Table 1). Of the contigs, 99.67% were 200-300 bp and 0.33% were 300-2000 bp, and only two contigs were longer than 2000 bp. Of the unigenes, 60.69%, 23.90% and 12.26% were 200-300 bp, 300-500 bp and 500-1000 bp, respectively, and 3.15% were longer than 1000 bp, of which only two exceeded 2000 bp (Table 1). These assembled contigs or unigenes were used to remap the viral reads in a second round to obtain more accurate results.
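Length distributions like those in Table 1 are obtained by binning sequence lengths and expressing each bin as a percentage of the total. A minimal sketch follows; it assumes half-open bins (a 300-bp sequence counts in 300-500) and a 200-bp minimum length, because the exact bin boundaries used for Table 1 are not stated, and the input lengths are hypothetical.

```python
from bisect import bisect_right

# Assumed half-open bins [lower, next lower); Table 1 boundaries are not specified.
BIN_LOWER = [200, 300, 500, 1000, 2000]
BIN_LABELS = ["200-300", "300-500", "500-1000", "1000-2000", ">=2000"]


def length_distribution(lengths: list[int]) -> dict[str, float]:
    """Percentage of sequences (>= 200 bp) falling into each length bin."""
    counts = dict.fromkeys(BIN_LABELS, 0)
    for length in lengths:
        if length < BIN_LOWER[0]:
            continue  # shorter than the assumed 200-bp cut-off: not counted
        counts[BIN_LABELS[bisect_right(BIN_LOWER, length) - 1]] += 1
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Hypothetical contig lengths in bp.
print(length_distribution([250, 260, 350, 800, 1500, 2500, 150]))
```

Because each bin is rounded independently, the percentages can sum to slightly more or less than 100%.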
The assembled viral contigs or unigenes were further extended as far as possible into a full-length potyvirus-like sequence in CodonCode Aligner 6.0.2 (CodonCode, Centerville, MA), finally yielding an 8832-bp sequence. A BLASTn search showed that this 8832-bp sequence had significant similarity to potyviruses. Given the typical potyvirus genome size (~9.7 kb), this was a near full-length sequence, with approximately 850 nt at the 5′ terminus remaining to be sequenced.

Determination of complete genomic sequence of NoMV
To verify the sequence assembled from the transcriptomic data and to obtain the missing 5′ end sequence, an overlapping amplicon cloning strategy was used, comprising a series of eight sequential RT-PCR cloning runs. The two ends of the viral genome were obtained by 5′ and 3′ RACE. These fragments were Sanger sequenced and assembled. The full-length sequence, designated the NoMV Yunnan isolate (NoMV-YN), comprises 9659 nt and has been deposited in GenBank under accession number MN158696. NoMV-YN has the typical genomic organization and structural characteristics of potyviruses, with a 291-nt 5′ untranslated region (UTR) and a 256-nt 3′ UTR. It contains a large open reading frame (ORF) encoding a polyprotein of 3026 amino acid (aa) residues with a calculated molecular mass of 343 kDa (Fig. 2).
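A calculated molecular mass such as 343 kDa for the polyprotein is usually obtained by summing average amino acid residue masses and adding one water molecule for the free termini. The sketch below does this with standard average residue masses; the text does not say which tool the authors used, so this is an illustration of the calculation, not their method.

```python
# Standard average residue masses in Da (amino acids minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N- and C-terminal groups


def protein_mass_kda(protein: str) -> float:
    """Average molecular mass of an unmodified polypeptide, in kDa."""
    return (sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER) / 1000
```

As a plausibility check, 343 kDa over 3026 residues corresponds to an average of roughly 113 Da per residue, which is typical for proteins.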

NoMV polyprotein analysis
Pairwise comparison of NoMV with 20 closely related members of the genus Potyvirus showed that NoMV shared only 47-50.7% and 53-67% aa sequence identity with the polyproteins and CPs of these potyviruses, respectively (Table S2). BLASTp analysis of the polyprotein showed that NoMV is most closely related to Tobacco vein banding mosaic virus (TVBMV; AEB66864.1), with which it shares 50.7% amino acid sequence similarity over the polyprotein. The NoMV polyprotein has the typical structure and domains of a potyvirus. Nine putative proteolytic cleavage sites matching conserved potyvirus protease cleavage sites were found in the polyprotein. The ten putative mature proteins released by cleavage are P1 protein (P1, 258 aa), helper-component proteinase (HC-Pro, 458 aa), P3 protein (P3, 344 aa), first 6 kDa peptide (6K1, 52 aa), cylindrical inclusion protein (CI, 634 aa), second 6 kDa peptide (6K2, 53 aa), viral protein genome-linked (VPg, 190 aa), nuclear inclusion "a" protease (NIa-Pro, 242 aa), nuclear inclusion "b" protein (NIb, the RNA-dependent RNA polymerase, 521 aa) and coat protein (CP, 274 aa) (Fig. 2). The putative PIPO ORF (99 aa) was also identified, based on the conserved motif GAAAAAA found at nt 2896-2902 (Gong et al. 2011). The NoMV CP has 274 aa residues with a calculated molecular mass of 30.69 kDa. The DAG motif associated with aphid transmission was found near the N terminus of the CP (CP aa 7-9; polyprotein aa 2759-2761) (López-Moya et al. 1999).
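The mature-protein lengths listed above sum to 3026 aa, matching the polyprotein length. The sketch below derives each protein's polyprotein coordinates from those lengths, assuming the proteins are contiguous and in the listed N- to C-terminal order; under that assumption, CP residues 7-9 fall at polyprotein positions 2759-2761.

```python
from itertools import accumulate

# Mature-protein lengths (aa) as reported for NoMV, in N- to C-terminal order.
MATURE = [("P1", 258), ("HC-Pro", 458), ("P3", 344), ("6K1", 52), ("CI", 634),
          ("6K2", 53), ("VPg", 190), ("NIa-Pro", 242), ("NIb", 521), ("CP", 274)]


def polyprotein_map(proteins: list[tuple[str, int]]) -> dict[str, tuple[int, int]]:
    """1-based inclusive polyprotein coordinates, assuming contiguous proteins."""
    ends = accumulate(length for _, length in proteins)
    return {name: (end - length + 1, end) for (name, length), end in zip(proteins, ends)}


pmap = polyprotein_map(MATURE)
print(sum(length for _, length in MATURE))  # 3026
print(pmap["CP"])                           # (2753, 3026)
cp_start = pmap["CP"][0]
print(cp_start + 7 - 1, cp_start + 9 - 1)   # CP aa 7-9 -> 2759 2761
```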
The aa sequence of each mature protein of NoMV was also compared with those of the other 20 potyviruses. HC-Pro, CI, VPg, NIa-Pro and NIb of NoMV shared relatively high aa sequence identities with those of the other 20 members, ranging from 46% to 66%. In contrast, the identities of P1 and P3 were as low as 9-14% and 23-36%, respectively (Supplementary Table S3).
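Ranges such as 9-14% for P1 summarize one pairwise identity per reference virus. A minimal sketch of that summary is shown below; it assumes the mature-protein sequences are already aligned and defines identity over columns where neither sequence has a gap, which is only one of several conventions (BioEdit and BLAST can count gaps differently). The sequences in the example are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100 * sum(x == y for x, y in compared) / len(compared)


def identity_range(query: str, references: dict[str, str]) -> tuple[float, float]:
    """(min, max) identity of one aligned protein against several aligned references."""
    values = [percent_identity(query, ref) for ref in references.values()]
    return min(values), max(values)


# Hypothetical aligned fragments, not real potyvirus sequences.
print(identity_range("MKVL", {"virus_1": "MKIL", "virus_2": "MRIL"}))  # (50.0, 75.0)
```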

Phylogenetic analysis of NoMV with other potyviruses
To determine whether NoMV is a novel potyvirus and how it relates to other potyviruses, a phylogenetic analysis of NoMV and the 20 potyviruses was performed based on the polyprotein amino acid sequences. NoMV grouped with the other potyviruses in a clade representing the genus Potyvirus, and within this clade it was most closely related to TVBMV in a subgroup (Fig. 3a). The close relationship between NoMV and TVBMV was confirmed by phylogenetic analysis based on CP amino acid sequences (Fig. 3b). Despite this close relationship, the low sequence similarity between the two viruses indicates that NoMV is a distinct species in the genus Potyvirus. These results indicated that this virus isolate represents a novel … (Table 2). These results showed that the widespread noni mosaic disease (NoMD) is tightly associated with NoMV and is most likely caused by this virus.

Aphids carry and transmit NoMV
Field surveys also showed large numbers of aphids (Aphis atrata Zhang) and whiteflies on the undersides of symptomatic leaves (Fig. 4a and b). RT-PCR was performed to determine whether aphids or whiteflies carry the virus: a DNA band of the expected size was amplified from aphid RNA preparations but not from the whitefly RNA preparation (Fig. 4c), and Sanger sequencing confirmed that the amplified band was NoMV CP. These results indicated that aphids are carriers and potential transmission vectors of NoMV. In the transmission experiment, the virus was transmitted to all 10 inoculated plants (100%). NoMV was detectable from day 4 after inoculation, and mild symptoms of light and dark green patches appeared on leaves from day 9. These results confirmed that aphids are carriers and vectors of NoMV.

Discussion
In this study, we analyzed transcriptome high-throughput sequencing data from diseased noni leaves, which led to the identification of a novel potyvirus in the diseased plants and the assembly of the complete genome of this potyvirus, NoMV. However, de novo assembly of the Illumina reads failed to recover ~850 bp at the very 5′ end of NoMV. This is most likely because the 5′ end of NoMV has poor sequence homology with any existing viral sequence; consequently, reads containing the 5′ end sequence of NoMV were not mapped to any viral genome and were not used for de novo assembly in the subsequent analysis. The draft genome sequence assembled from the NGS data allowed the design of primers for PCR cloning and 5′ and 3′ RACE, and the complete NoMV genome was then determined by Sanger sequencing and assembly. When the complete NoMV genome was used as a reference to reanalyze the transcriptomic sequencing data, we found that very few viral reads mapped to its 5′ end, so the transcriptome data did not fully cover this region. Similar results have been observed not only for potyviruses (Sheveleva et al. 2013) but also for babuviruses (Yu et al. 2019), enamoviruses and nucleorhabdoviruses (Cao et al. 2019). These results also suggest that the complete NoMV genome could likely be obtained by increasing the transcriptomic sequencing coverage. Considering the cost involved, we chose to fill the remaining 5′ region by 5′ RACE.
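The argument about the 5′ end rests on read depth along the final genome. A minimal sketch of how per-base depth and low-coverage runs can be computed from read alignment intervals is shown below; the (start, end) intervals, the toy genome length and the depth cut-off of 5 are hypothetical, and real intervals would come from a read mapper's output.

```python
def coverage_depth(genome_len: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base read depth from 1-based inclusive (start, end) alignment intervals,
    computed with a difference array."""
    diff = [0] * (genome_len + 2)
    for start, end in alignments:
        diff[start] += 1
        diff[end + 1] -= 1
    depth, running = [], 0
    for pos in range(1, genome_len + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def low_coverage_regions(depth: list[int], min_depth: int = 5) -> list[tuple[int, int]]:
    """Maximal 1-based inclusive runs where depth < min_depth (hypothetical cut-off)."""
    regions, start = [], None
    for pos, d in enumerate(depth, start=1):
        if d < min_depth and start is None:
            start = pos
        elif d >= min_depth and start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Toy example: five reads cover positions 8-20, one read covers 1-3.
depth = coverage_depth(20, [(8, 20)] * 5 + [(1, 3)])
print(low_coverage_regions(depth))  # [(1, 7)]
```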
Potyviruses are one of the most important groups of plant-infecting viruses (Wylie et al. 2017). With about 170 species, Potyvirus is currently the largest genus in the family Potyviridae according to the International Committee on Taxonomy of Viruses (ICTV) (Revers and García 2015; Huang et al. 2019). The species demarcation criteria for potyviruses suggested by ICTV are based on the complete nucleotide sequence and the amino acid sequence of the large ORF, with thresholds of <76% nucleotide sequence identity or <82% amino acid sequence identity (Wylie et al. 2017; Adams et al. 2005).
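The demarcation rule can be written as a simple check against the two thresholds quoted above. The sketch below treats falling below either threshold as sufficient, following the "or" in the text; applied to the polyprotein identities reported for NoMV (47-50.7%), it classifies NoMV as a distinct species. This is an illustration of the stated criteria, not an ICTV tool.

```python
from typing import Optional

NT_THRESHOLD = 76.0  # % nt identity over the complete genome
AA_THRESHOLD = 82.0  # % aa identity over the large ORF (polyprotein)


def is_distinct_species(nt_identity: Optional[float] = None,
                        aa_identity: Optional[float] = None) -> bool:
    """True if identity to the closest known species falls below either threshold."""
    checks = []
    if nt_identity is not None:
        checks.append(nt_identity < NT_THRESHOLD)
    if aa_identity is not None:
        checks.append(aa_identity < AA_THRESHOLD)
    if not checks:
        raise ValueError("provide at least one identity value")
    return any(checks)


print(is_distinct_species(aa_identity=50.7))  # True
```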
In this study, sequence analysis revealed that the polyprotein and CP aa sequences of NoMV shared at most 50.7% and 67% identity, respectively, with those of other potyviruses, far below the species demarcation thresholds. These results indicate that NoMV is a novel potyvirus. Although NoMV was most similar to TVBMV and grouped with it in the same subclade in phylogenetic analysis, it is genetically distinct from TVBMV and other potyviruses. Among the ten proteins produced by cleavage of the potyviral polyprotein, P1 is the most divergent in amino acid sequence: NoMV P1 shared no more than 14% aa identity with those of other potyviruses. P1 has been shown to be required for host adaptation in potyviruses (Shan et al. 2018; Vozárová et al. 2017). Sequence comparison also revealed that NoMV P3 shared only 23-36% aa identity with those of other potyviruses. P3 has been shown to play a decisive role in the intercellular movement of potyviruses (Cui et al. 2017). These findings suggest that these two genetically divergent proteins, along with other proteins, may allow NoMV to adapt to noni as a host.
Our study also suggests that aphids are carriers and transmission vectors of NoMV. Several conserved motifs in potyviral proteins have been shown to be responsible for aphid transmission of potyviruses (Huang et al. 2019). For example, the "KITC" and "PTK" motifs in HC-Pro and the "DAG" motif in CP have been identified as key factors in aphid transmission (López-Moya et al. 1999; Plisson et al. 2003; Stenger et al. 2005; Seo et al. 2010). This finding raises the concern that, without appropriate aphid management, NoMV could spread more rapidly. The disease caused by NoMV has already spread to other growing areas in Yunnan Province and even to some neighboring areas of Thailand, and it is likely to spread further without proper management and intervention. Further studies are needed to clarify the genetic diversity, biological characteristics and epidemiology of NoMV in different geographic regions.
The data presented in this study provide strong evidence that the novel NoMV is closely associated with, and is the likely cause of, the noni mosaic disease prevalent in Yunnan, China. Additional experiments are required to conclusively establish the etiology of noni mosaic disease and the role of NoMV in it.

Compliance with ethical standards
Conflict of interest The authors declare no conflict of interest.
Ethical approval This manuscript did not involve any human participants and/or animals.