Evaluation of reference genes for qRT-PCR normalization in Angelica decursiva under various experimental conditions

: Angelica decursiva is one of the lending traditional Chinese medicinal plants producing coumarins. Notably, several studies have focused on the biosynthesis and not the qRT-PCR (quantitative real-time reverse transcription polymerase chain reaction) study of coumarins. This qRT-PCR technique has been extensively used to investigate gene expression levels in plants and the selection of reference genes which plays a crucial role in standardizing the data form the qRT-PCR analysis. In our study, 11 candidate reference genes were selected from the existing transcriptome data of Angelica decursiva . Here, four different types of statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) were used to calculate and evaluate the stability of gene expression under different external treatments. Subsequently, RefFinder analysis was used to determine the geometric average of each candidate gene ranking, and to perform comprehensive index ranking. The obtained results showed that among all the 11 candidate reference genes, SAND family protein ( SAND ), protein phosphatase 2A gene ( PP2A ), and polypyrimidine tract-binding protein ( PTBP ) were the most stable reference genes, where Nuclear cap binding protein 2 ( NCBP2 ), TIP41-like protein ( TIP41 ), and Beta-6-tubulin ( TUBA ) were the least stable genes. To the best of our knowledge, this work is the first to evaluate the stability of reference genes in the Angelica decursiva which has provided an important foundation on the use of qRT-PCR for an accurate and far-reaching gene expression analysis in this medicinal plant.


Introduction
Angelica decursiva (Miq.) Franch. et Sav. is an important traditional medicinal plant form the genus Angelica, family Umbelliferae. The major bioactive components of this herbal drug are coumarins and this drug has been listed as one of the main special coumarin sources by the Chinese Pharmacopoeia. Coumarin has a core structure (2H-1-benzopyran-2-one core), that widely exists in higher plant species like the Rutaceae, Leguminosae, and Umbelliferae. Besides, they are slightly distributed in both animals and microorganisms [1]. Recent pharmacological studies have proved that the 80% ethanol extract of A. decursiva has an anti-inflammatory activity [2,3] antioxidant [4], antifungal [5], inhibitory effect on HepG2 tumor cells [6] and inhibits the proliferation of melanoma cells [7]. Hence, certifying the supply of coumarin compounds from the drug has been a long-term practice [8]. Conversely, there is a huge shortage of coumarin raw materials, which has been attributed to low abundance, season, or region-dependent sourcing. Also, because of its complicated structure and multiple chiral centers [9], there lacks a feasible method that has the capacity of mass chemical synthesis. Therefore, genetic and metabolic engineering strategies used in the production 2.1. Selection of candidate reference genes, evaluation of amplification specificity and PCR efficiency Here, 11 candidate reference genes were selected based on the transcriptome data of A. decursiva (unpublished). Table 1 lists all the candidate reference gene names and abbreviations, homologous genes from Arabidopsis, primer sequences, amplification length, annealing temperature (℃), PCR efficiency (E), and correlation coefficient (R 2 ). Next, conventional PCR and qRT-PCR were used to verify the primer-specific amplification of all candidate reference genes. As illustrated in Fig. S1, based on agarose gel electrophoresis and melting curve analysis which showed a peak, the 11 primer pairs were amplified by a single specific amplicon. Here, the amplification efficiency range of CYP2 and PP2A was 1.711 and 1.880 respectively whereas the correlation coefficient range of UBQ10 was 0.888, and that of PP2A, PTBP, TIP41, and ACT was 1.000. Here, the cycle threshold values (Ct) showed the number of cycles when the generated fluorescent signal reached a level that could be detected. Therefore, in this study, the gene expression level and stability of candidate reference genes were directly determined using the obtained Ct values. As shown in Figure 1, the average Ct values of these 11 reference genes are distributed between 10 and 30, and a majority of them are distributed between 23 and 27. Among all the candidate reference genes, NCBP2 had the lowest average Ct values, whereas CYP2, PTBP, and TUBA had the highest average Ct values. Moreover, each reference gene had a different coefficient of variation under different conditions. Also, it was observed that SAND and PTBP had the lowest variability, whereas TUBA, which had the highest Ct value, had the highest variability. This value ranged between 25 and 32. Therefore, it may not be qualified as a reference gene. As shown in Table  S2, the Ct values related to the box plots not only determine the expression profile of the reference genes but also reveals their stability. reveal their stability (Table S2).

Expression stability of candidate reference genes
Based on the relative expression levels of the 11 candidate reference genes, four algorithms such as BestKeeper, geNorm, NormFinder, and Delta Ct were used to examine their expression stability. Subsequently, the RefFinder tool was employed to sequence the expression stability of all these candidate reference genes and select the most suitable ones.

geNorm analysis
The geNorm analysis evaluated the stability of all the 11 candidate reference genes using the M value (reference expression stability measure). These M values were calculated from the average variation of the gene relative against other candidate reference genes, and the lower M values indicated a higher gene expression stability [21]. As illustrated in Fig. 2, all the candidate reference genes had different levels of stability under different treatments. Here, the M value of PP2A and SAND was the lowest in most treatments and was deliberated as the most stable reference genes. Besides, ACT showed good stability under H2O2, MeJA, NaCl, and UV treatments, whereas CYP2 was one of the most stable reference genes in the CuSO4 and UV treatment groups. On the other hand, the stability of NCBP2 seemed to be less satisfactory than other candidate reference genes. Subsequently, the geNorm algorithm can also determine the optimal number of normalized reference genes [21] by calculating pairwise mutations (V n/n + 1). Normally, the ideal paired variation (V) score is less than 0.15, which means that the addition of any other gene will not have a substantial impact on standardization. In our study, in the NaCl, Cold, UV, H2O2, CuSO4, and WT subsets, their pairwise variation values (V 2/3) were all less than 0.15. This showed that the addition of a third reference gene lacked significant effects on the normalization of the target gene, thus, the number of the optimal reference genes that were determined was two. In contrast, as shown in Fig. 3 and Table S3, the pairwise variation values (V 3/4) of the PEG and MeJA subsets, was also less than 0.15. This indicated that the number of reference genes that were most suitable for these two treatments was three. Figure 2. Average expression stability values (M) of the 11 candidate reference genes using geNorm software. Expression stability was evaluated in samples from A. decursiva was submitted to cold stress, drought stress (20% PEG), Methyl jasmonate (MeJA) stress, salt stress (0.5M NaCl), oxidative stress (H2O2), ultraviolet (UV) induction, Metal stress (CuSO4) , untreated (WT) and 'total' (all treated samples). The least stable genes are on the left with higher M-value and the most stable genes on the right with lower M-value. Figure 3. Determination of the optimal numbers of reference genes for normalization by pairwise variation (V n/n+1). Different treatments are marked as square frame with different colors. The cut-off value is 0.15 and used to determine the optimal number of candidate reference genes for qRT-PCR normalization.

NormFinder analysis
To normalize raw data and measure the stability of candidate reference genes through intra-and inter-group variation, the NormFinder analysis uses the 2 -ΔCt method. Like the geNorm analysis, its lower values show higher stability. Table 2 shows the ranking of all candidate genes as calculated using the NormFinder algorithm. Among the PEG, Cold, and 'total' treatment subsets, it was observed that SAND was the most stable candidate gene, and also ranked higher in other subsets. After MeJA treatment (0.078) on all sample subsets, TUBA was deliberated as the most stable candidate gene. Besides, among the other five groups (NaCl, UV, H2O2, CuSO4, and WT), PTBP (0.670), TIP41 (0.307), GAPDH (0.495), CYP2 (0.483), and UBQ10 (0.451) were observed to be the most stable genes, respectively. Similar to the geNorm analysis, our study found that the NCBP2 was the most unstable candidate gene.

BestKeeper analysis
Here, the raw data of the Ct value was directly calculated and the stability of candidate reference genes evaluated by calculating the standard deviation (SD) and the Ct value [22]. Notably, lower SD and Ct values indicated higher gene expression stability, particularly when the SD was greater than 1 which indicated that the reference gene was unstable [23] and could not be utilized for normalization. Table 3 shows that in the NaCl, MeJA, UV, WT, and 'total' treatment subsets, the PTBP gene was considered the most stable. However, the MeJA and 'total' treatment subsets were postulated to be expression-insensitive since their SD values were exceed 1 and could not be used for normalization. For this reason, such values should be excluded. Besides, under H2O2, CuSO4, Cold, and PEG treatments, the most stable genes were observed to be CYP2, SAND, TUBA, and PP2A. Generally, as illustrated in Table 3, it was observed that the NCBP2 gene was still the most unstable under all treatments.

Delta Ct analysis
This method evaluated gene expression stability by calculating the mean standard deviation (SD) of each gene. Here, the smaller the value, the higher the stability [24]. As shown in Table S4, the results of this analysis are consistent with the geNorm analysis. The only difference is that in the Cold and H2O2 subsets, TUBA and PP2A candidate genes are the most stable respectively. According to the geNorm analysis, the two most stable candidate genes are SAND and ACT. Hence, SAND and PP2A are the most qualified reference genes.

RefFinder analysis
As shown in Fig. 4, we further calculated the geometric mean of the ranking of each candidate gene using the RefFinder algorithm (http://150.216.56.64/referencegene.php#). This was based on the results obtained from the three statistical algorithms such as geNorm, NormFinder, and BestKeeper. Table S5 and S6, show the comprehensive index ranking, whereby, the smaller the index, the more stable the gene expression [19]. This study showed that SAND and PP2A ranked the highest in most subsets, whereas NCBP2 and TUBA ranked the lowest, making them the most unstable reference genes. In contrast, in the MeJA and Cold subsets, TUBA seemed to be a relatively stable reference gene. Despite the different assessment methods, this resulting difference is reasonable and acceptable. In summary, the stability of these 11 candidate reference genes from the highest to the lowest is: SAND, PP2A, PTBP, ACT, CYP2, EXP-1, GAPDH, UBQ10, TIP41, TUBA, and NCBP2. These results were similar to those obtained from the geNorm and NormFinder analysis, but slightly different from those of the BestKeeper analysis. NormFinder, and geNorm algorithms, the 11 candidate reference genes were ranked comprehensively, and the geometric mean of each reference gene was calculated.

Discussion
Currently, a majority of medicinal plants enhance the bioactivity of their medicinal components by inducing and regulating functional genes present in their biosynthetic pathway [25]. Several reports have shown that in plants, there is a significant correlation between the synthesis and accumulation of secondary metabolites and the expression levels found in the functional pathways of their biosynthetic pathways [26,27]. Coumarins are one of the main active components extracted from A. decursiva, which is one of the main sources of coumarin found in China and is listed as a special coumarin resource by the Chinese Pharmacopoeia. However, there exist no previous reports on qRT-PCR studies, which confines the research on the biosynthetic pathway of coumarin compounds from A. decursiva.
Here, we screened a total of 11 potential reference genes from the transcriptome data of A. decursiva (unpublished). First, as shown in Table 1 and S1, their primer specificity and PCR experimental conditions were confirmed, whereby the single peak of the dissolution curve exposed the specificity of these 11 pairs of primers (Fig. S1). As illustrated in Fig. 1 and Table S2, the average Ct-values for the 11 candidate reference genes is between 8.04 and 32.65, and the majority of these values were found to be between 23 and 27, which means that most of the 11 candidate reference genes were likely to afford accurate normalization [20]. The SAND and PTBP genes have the lowest variability, which suggests that they could have stable expression levels under different treatments. Conversely, considering the complexity of the ambient environment, the stability of these reference genes under different conditions must be investigated further. Therefore, there is a need to use more statistical tools and conduct further analyses.
To ensure the accuracy and reliability of experimental data, four commonly used statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) were combined for reference gene selection. Subsequently, RefFinder analysis was used to calculate the geometric ranking mean of each candidate reference gene, and then a comprehensive index ranking was done. Here, the results from this analysis showed that the ranking results acquired using different statistical algorithms were not similar, and the candidate reference genes also showed different levels of stability under different treatments. For example, Table 2 shows that TUBA is the best reference gene in the MeJA subset, and this is consistent with the results obtained when Atropa belladonna [28] was exposed to different hormone treatments which showed that it had the most stable expression. On the other hand, when this gene was exposed to other treatments, its stability was unsatisfactory. This is most likely because some genes belonging to the tubulin family and that express isomers in specific developmental stages or tissues are regulated by different developmental processes [29]. Similar to the results from tomato [30] and Arabidopsis thaliana [31], UBQ10 is the most stable gene in the WT group (Table 2). Also, the PTBP gene performed well in the UV subset, unlike in the Cold, PEG, and CuSO4 which is illustrated on Table 2 and S5). This result contrasted with the most stable performance of the PTBP gene in Cold, PEG, and CuSO4 subsets in Peucedanum praeruptorum [20]. Table 3 shows that under cold stress (4 ℃) and the 'total' group, the TUBA and PTBP genes were the most stable reference genes, respectively, but this was different from the results obtained from geNorm and NormFinder analyses which showed that SAND was the gene with the best stability. Since various calculation principles are inconsistent, even under the same conditions, the ranking of these genes could be different. Therefore, the single software analysis has certain disadvantages when evaluating the stability of the reference gene. Our study recommends that a comprehensive evaluation and analysis need to be used to ensure the reliability and accuracy of the results. Besides, it is necessary to flexibly select the most appropriate reference gene for different experimental treatments.
Also, bearing in mind that a single reference gene could cause errors in the expression level of the target gene [32,33], the geNorm statistical algorithm calculates the pairwise variation (V n/n+1) to attain the optimal number of reference genes [34]. Notably, the ideal pairwise variation value must be lower than the critical value of 0.15 [32,35]. For instance, as illustrated in Fig. 3 and Table S3, the pairwise variation V 2/3 in the Cold, NaCl, CuSO4, H2O2, UV, WT, and the 'total' treatment subsets, is less than 0.15. This demonstrated that only two genes are required to complete the gene expression normalization under these conditions. Consequently, it was observed that the PP2A and ACT are relatively stable reference genes under the treatment of MeJA, their pairing value V 3/4 was < 0.15, and their optimal number of reference genes was three.
Taken together, the analysis results from the three algorithms (geNorm, NormFinder, and Delta Ct) seemed to be consistent, whereas those of BestKeeper are different. Fig. 4 and Table S5, confirm that under most external conditions, the stability of SAND and PP2A genes are significantly better than that of other reference genes, whereas NCBP2 and TUBA are nearly always defined as the most unstable reference genes. Of note, the SAND and PP2A are new reference genes that were identified through screening the data from the Arabidopsis gene chip. To date, many studies choose the SAND and PP2A genes for their reference gene standardization studies. For instance, studies that used Arabidopsis thaliana [31], tomato [36], Petunia hyrbrida [37], Peucedanum praeruptorum [20], etc. Conventionally, reference genes, such as TUBA, GAPDH, UBQ10, and ACT play a housekeeping role in the maintenance of cell structure or primary metabolic activities. Nevertheless, increasing research indicates that the expression levels in most of these traditional reference genes can somehow vary greatly and in many species, they are unsuitable for gene normalization under specific conditions [38,39]. Therefore, given a subset of experimental conditions, the housekeeping genes used as reference genes in each different species should be handled carefully [40,41].

Samples preparation and treatments
Here, one-year-old plants of A. decursiva were collected from Ningguo City, Anhui Province, China (longitude: 118.95E, latitude: 30.62N). Then, these accessions were transplanted into plastic pots that contained a mixture of vermiculite, perlite, and peat moss (1:1:1 v/v). Next, the plants were grown in a greenhouse at a temperature of 25 ℃, a long photoperiod of 16 hours light and 8 hours darkness, 40-65% relative humidity, and 3000 lux light intensity until treated. For drought treatments, a 200 mL solution of 25% PEG 6000 (w / v, polyethylene glycol, Sangon, China) was used to treat the plants for one week. For salt stress treatment, an approximate amount of 200 mL (600 mM) of NaCl was used to water the plants for 7 days. For cold stimulation treatments, the plants were incubated at 4 ℃ for 48 hours. To study hormone therapy, 25 mM MeJA was applied for 6 hours. To assess heavy metal stress, 500 mM of Copper sulfate (CuSO4) was applied to the plants for 24 hours. In the case of oxidative stress, 50 mM H2O2 was used for 24 h. To induce ultraviolet (UV) light, these plants were irrigated using distilled water (100 mL) and then exposed to ultraviolet light for 24 h. Notably, all the above treatments had three biological replicates. The remaining plants that received no treatment served as controls. Lastly, all samples from each treatment were washed with MiniQ filtered water, quickly frozen in liquid nitrogen, and stored at -80 ℃.

Total RNA isolation and cDNA synthesis
About 100mg of the different frozen tissue samples were used to extract total RNA using the Spectrum Plant Total RNA kit (Sigma, USA). Next, the quality and purity of the extracted total RNA was determined using the NanoDrop spectrophotometer 2000 (Thermo Scientific, USA), and its integrity confirmed on a 1.5% agarose gel. Here, only RNA with A260/280 ratios between 1.8 and 2.2 and an A260 / 230 above 2.0 were used for cDNA synthesis. Subsequently, RNA samples were pretreated with RNase-free DNase I (Takara Biotechnology, Dalian, China) to remove contaminating traces of genomic DNA and then used for reverse transcription. Lastly, following the instructions from the HiScript Q RT SuperMix for qPCR (Vazyme, China), an amount of 1 μg of total RNA was used for cDNA synthesis (20 μL).

Primer design and qRT-PCR conditions
Here, a total of 11 candidate reference genes were selected from the transcriptome data of A. decursiva (unpublished). Next, the built-in program in the BioEdit Sequence Alignment Editor was used to screen and select potential single genes using local blast (TBLASTN). Subsequently, the corresponding homologs of these reference genes were selected from the database of The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org), and only single genes with lower Evalues and higher scores were selected for subsequent analysis. Table S1, shows the reference gene IDs, homologous loci, gene sequences, and the different expression levels (FPKM, Fragments Per Kilobase per Million) of all the 11 candidate reference genes.
To avoid amplification of any contaminating genomic DNA, primers were designed to cross at least one intron/exon border that contained both donor and acceptor sites, and then exon analysis was performed using the AlignX program in the vector NTI advance 11.5 package. Subsequently, the Primer 5 software was used to design primers with the following characteristics: amplicon length was 100 to 150 bp, GC content was between 40-60%, primer length was 18 to 24 bp, temperature difference of each primer pair was less than 1 ℃, and the melting temperature (Tm) was between 55-60 ℃. Consequently, all the primer pairs were tested using conventional PCR to determine the combination of forward and reverse primers that performed optimally and the resulting products were examined using 1.0% agarose gel electrophoresis. Besides, from a series of standard curves of five different cDNA dilutions, the amplification efficiency (E) and correlation coefficient (R 2 ) were calculated. Table 1 lists all the gene-specific primer pairs that were designed and used in the qRT-PCR analysis.

Stability evaluation of candidate reference genes
To obtain the RT-qPCR data, three biological and technical replicates were done for each sample, and all data presented as mean ± standard error of the mean (SEM). Consequently, statistical analyses were performed using the Student's t-test. Next, representative graphs were generated using OriginPro 9.1 (OriginLab Corporation, Northampton, MA, USA). Subsequently, data analysis was performed using GeNorm (ver.3.5), NormFinder (ver.0.953), BestKeeper (ver.1.0), and the Delta Ct method following the instructions from the manufacturer. Lastly, a comprehensive stability ranking analysis was performed using RefFinder (http://150.216.56.64/ referencegene.php).

Conclusions
Selecting appropriate reference genes is a significant prerequisite for quantifying gene expression using qRT-PCR. This is a conducive research on the key enzyme genes that are involved in the biosynthesis of coumarins and other interesting secondary metabolites found in A. decursiva. So far, this is the first systematic screening experiment of the most suitable reference genes for A. decursiva under various external treatments, like cold stress (4 ℃), drought stress (20% PEG), Methyl jasmonate (MeJA), salt stress (0.5 M NaCl), oxidative stress (H2O2), ultraviolet (UV) induction, metal stress (CuSO4), untreated (WT), and 'total' (all treated samples). Our results have exposed that the 11 candidate genes have different stability in A. decursiva when exposed to different experimental treatments. From the overall stability ranking, SAND is the most stable candidate reference gene, followed by PP2A and then PTBP. On the other hand, the NCBP2 gene has the lowest stability making it unsuitable for further research. In summary, the reference genes evaluated in this study can be helpful for accurate normalization of the qRT-PCR data and any other future work on the gene expression of coumarin synthesis present in A. decursiva.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1. Melting curves of 11 candidate reference genes tested in this study. Table S1. Informations on 11 candidate genes of A. decursiva. Table S2. Ct values of 11 candidate genes of A. decursiva. Table S3. Pairwise variation (V n/n+1) analysis of nine candidate reference genes calculated using geNorm.