Amplicon Sequence Variant-Based Oral Microbiome Analysis Using QIIME 2

The bacterial composition of oral samples has traditionally been determined by PCR amplicon sequencing of 16S rRNA genes. Recent amplicon sequence variant (ASV)based analyses of 16S rRNA genes differ from that based on operational taxonomic unit (OTU) clustering in the way it deals with sequences having potential errors. However, little information is available on its application in oral microbiome studies. Here, we conducted ASV-based analysis of oral microbiome samples using QIIME 2. We investigated the optimal parameters for sequence denoising, using DADA2, and found the trimming of the first 20 nucleotides from 5′-end of both paired reads avoided excessive sequence loss during chimera removal. Truncating reads at positions 240–245 allowed the removal of low-quality sequences while maintaining sufficient length to merge matching paired ends. Taxonomic assignment, using the naïve Bayes classifier trained with the V3-V4 region of reference 16S rRNA sequences in the extended human oral microbiome database (eHOMD), resulted in bacterial compositions similar to those of OTU-based analyses. Contrary to OTU-based clustering, ASV-based analysis showed taxonomic abundance at the genus or species level to not differ significantly in tongue microbiomes, regardless of brushing. QIIME 2 can, therefore, be a standard pipeline for ASV-based analysis of oral microbiomes.


INTRODUCTION
Advances in DNA sequencing technology have enabled investigation of the human microbiome [1]. Hundreds of microbial species reside in the human oral cavity, and composition of oral microorganisms differs across individuals [2]. Disrupted oral microbial balance has been linked to oral diseases; therefore, a balanced microbial composition is important for oral health [3][4][5]. Thus, rapid and cost-effective identification of individual oral microbiomes could facilitate the prediction, prevention, and treatment of oral diseases, and the maintenance of oral health [6].
An important step in the amplicon sequencing analysis involves the clustering of sequences into groups, based on a specific identity threshold, for example, 97% into operational taxonomic units (OTUs). After clustering, a representative nucleotide sequence is determined for each OTU. Thereafter, a bacterial taxonomy can be assigned to each OTU by comparing with a reference database of choice that contains 16S rRNA gene sequences. Although OTU-clustering is a standard procedure, it has the possibility of placing similar sequences that, in fact, are derived from different species, together into the same unit [7]. This could lead to misunderstanding of the actual bacterial composition.
Furthermore, since clustering into OTUs includes only minimal filtering of input sequences based on their quality and length, low-quality sequences that were generated during the high-throughput sequencing could often get incorporated into the read counts of each cluster, resulting in incorrect composition of the taxonomic units [7].
A new method based on amplicon sequence variants (ASVs), also called exact sequence variants (ESVs) or zero-radius OTUs (ZOTUs) [8], has been developed, and is rapidly becoming a standard in microbiome analysis [9]. In ASV-based analysis, after the initial quality control that removes low-quality sequences, each sequencing read is "denoised," which involves computational correction of the error in nucleotide sequence of the amplicon based on the estimated error model. Sequence errors are so controlled, such that ASVs can be resolved exactly, down to the level of single-nucleotide differences, over the sequenced gene region [10]. Importantly, these processes are run without constructing OTUs. Therefore, the number of false sequences incorporated in the final outcome may be lower [11]. Using ASVs (i.e. exact sequences) also gives an advantage that the results are comparable across different studies [11].
Although ASV-based analysis is gradually being considered a standard in microbiome analysis, very little information is obtained from ASV-based analysis of oral microbiome samples, and the effect of selecting either OTU-or ASV-based analyses of oral microbiomes on the results remains unknown. Therefore, we applied ASV-based analysis using QIIME 2 [12] to an oral microbiome that had been previously analyzed based on clustered OTUs [13]. The QIIME 2 platform was developed to simplify the process for non-specialists in computational analysis [14]. We aimed to define optimal parameters for oral microbiome studies, especially for denoising, using the DADA2 [11]. Taxonomic assignment was performed, and differential abundance and alpha and beta diversity, were analyzed.

Sequence data
We applied ASV-based analysis to the 16S rRNA amplicon sequence data used in our previous study [13]. The raw nucleotide sequences are available on the DDBJ/EMBL/NCBI databases under the accession number DRA010660. Briefly, the dataset comprised of 16S rRNA amplicon sequences derived from tongue or salivary microbiomes, without (D1) or with (D2) tongue brushing. We used the following primers to amplify the V3-V4 hypervariable regions of bacterial 16S rRNA gene: 341F (5 ′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG -3′) and 806R (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCT AAT -3′). The underlined nucleotides served as primer sequence parameters to extract the V3-V4 region for feature classifier training (see below). Paired ends were sequenced using a MiSeq (Illumina, San Diego, CA, USA) next generation sequencer at the Oral Microbiome Center (Takamatsu, Japan), as described [13,15]. For OTU-based analysis, CLC Genomics Workbench (Version 10.0.1; Filgen, Nagoya, Japan), equipped with the Microbial Genomics Module plug-in (Version 2.0; Filgen), was used in our previous study [13].
Taxonomy was assigned to each ASV using the 'classify-sklearn' command in q2-featureclassifier. Feature tables that represent the ASV counts for each sample were made in the HDF5-based BIOM format version 2.1 [20] inside QIIME 2. Taxonomy barplots were created using the 'barplot' command in q2-taxa. Alpha diversity metrics (observed features and Shannon index), and beta diversity metric (weighted UniFrac) [21,22] were calculated using q2-diversity, after samples were subsampled without replacement (rarefied) to 69,000 sequences per sample.

RESULTS
Optimizing parameters for sequence denoising using DADA2 Sequence denoising is a critical step that affects the overall outcome of ASV-based analysis, and QIIME 2 offers DADA2 or Deblur denoising methods. The DADA2 pipeline detects and corrects Illumina amplicon sequences by exactly inferring (correcting) sample sequences, and resolving differences of as little as one nucleotide [11]. Deblur uses sequence error profiles to associate erroneous sequence reads with the true biological sequence from which they are derived [24]. We selected DADA2 for the present study.
To optimize the parameters for sequence denoising using DADA2, we used 16S rRNA amplicon sequences that had been previously derived from tongue microbiome [13].
We amplified and sequenced the region spanning V3 and V4 of the 16S rRNA gene (Fig.   1A). Sequences obtained during a single run (#4) were applied to avoid the effects of different sequence runs. The data set in run #4 contained 51 salivary and tongue samples obtained from 10 different participants at different times. The total number of reads from these sequences was 113,234, each sample having a mean and standard deviation of 113234 ± 24011 sequences.
DNA sequence data were generated from the MiSeq sequencer as FASTQ files, containing nucleotide sequences, with sequencing quality scores on the corresponding nucleotide. Summary of the imported sequence quality can be visualized as an interactive quality plot using QIIME 2 View (Fig. 1B).
Denoising paired-end sequences using DADA2 consists of quality-based filtering, sequence inferring and merging, and chimera removal [25]. Quality-based filtering consists of simply checking the quality of sequences with the information in FASTQ files and filtering out those that do not meet a certain threshold set. Sequence inferring is the core step in DADA2, in which sequences are corrected to the original sequences, computed by the DADA2 algorithm, using the error rates estimated based on the input sequences [11]. Thereafter, paired sequences are merged. Since sequence inferring does not remove chimeric sequences that can arise during the PCR amplification, these are removed at the end of the process (Fig. 2A).
Based on the interactive quality plot of the sequences imported into QIIME 2 and visualized using QIIME 2 View (Fig. 1B), the parameters for the denoising step need to be determined. These parameters include "--p-trim-left-f" and "--p-trim-left-r," which trim the first specified number of bases of each sequence from the 5′ end, and "--ptrunc-len-f" and "--p-trunc-len-r," which truncate each sequence at a specified position at the 3′ end. Although removal of regions with low-quality sequences, during trimming and truncating, is generally advised, threshold criteria are not absolute. Thus, we compared the outcomes of different parameters to determine the optimal conditions for further analysis. Figure 2B shows an example of a command run in the QIIME 2 command line interface.
We initially tested how trimming of the reads affect the ratios (%) of sequences retained after the denoising process using DADA2 (Fig. 3A). While testing trimming lengths, the truncation parameters were fixed to 247 nt for the forward reads and 248 nt for reverse reads (trunc:247_248), ensuring maximum length without dropping any sequence in this particular example, judged by the interactive quality plot. Since the sequencing quality of the first 20-30 nt tended to be relatively low (Fig. 1B), we tested combinations of the trimmed length ("--p-trim-left-f" and "--p-trim-left-r") of 0, 10, 20 and 30 nt for both forward and reverse primers. With no trimming (trim:0_0), which is the default setting, approximately 10% of the input sequences were retained after the entire denoising process (Fig. 3A). The ratio (%) of retained sequence dropped mostly during chimera removal. This is probably because the DADA2 algorithm judged the common degenerate primers forming chimeras with the actual 16S rRNA sequences. The tendencies were the same when 10 nt were trimmed from both forward and reverse reads (trim:10_10), or when the forward primer was not trimmed while 20 nt were trimmed from the reverse primer (trim:0_20) (Fig. 3A). Trimming 20 or 30 nt off the forward and reverse reads improved the retention ratio after chimera removal (Fig. 3A, trim:20_20 and trim:30_30). Based on these results, trimming 20 nt seemed like an optimal parameter when this primer set is used for 16S rRNA gene amplification.
Next, we optimized the truncation parameters (Fig. 3B). While testing various truncating positions, a trimming parameter was fixed at 20 nucleotides (nt) for both forward and reverse reads (trim:20_20 in Fig. 3A). When the sequences were truncated from the 3′ end, up to position 230 or 235 nt on both forward and reverse paired sequences (trunc:230_230 or trunc:235_235), the ratio (%) of reads retained after the merging step dropped to approximately 10% of the input (Fig. 3B). Considering the amplified V3-V4 region is approximately 460-nt, this is probably because the overlapping region was not long enough to merge paired sequences that were truncated to this length.
Truncating the forward and reverse sequences to position 240 (trunc:240_240), or forward sequences to 247 and reverse sequence to 248 (trunc:247_248; minimum sequence lengths identified during random sampling of 10,000 out of 5,774,947 imported sequences as shown in the interactive quality plot (Fig. 1B), in this particular example) retained 75%-80% of sequences after merging and chimera removal. Minimal truncation (trunc:251_251; forward and reverse sequences truncated to position 251) resulted in a greater drop in ratio at the first quality-filtering step compared to the trunc:240_240 or trunc:247_248 situations. Thus, truncating to position 240 nt seemed to be a safe setting for sequences of different qualities.

Taxonomic classification
Each ASV resulting from denoising was next classified into a microbial taxonomy for further analysis. This contrasts with the OTU-based method, in which sequences are first clustered into an OTU, which is then classified into a specific taxonomy based on a single representative sequence. A naïve Bayesian taxonomy classifier can be prepared in QIIME 2 using the q2-feature-classifier plugin [19]. We assessed whether the results generated using this classifier were similar to those generated using the OTU-based method. The sample used in this test was microbiota from the tongue, without (D1) or with (D2) tongue brushing [13]. Sequence reads generated from different Miseq runs were merged and samples corresponding to D1 and D2 were extracted. Both D1 and D2 contained data from the same 10 participants. The total number of reads from samples in D1 was 999,646, with each sample having a mean and standard deviation of 99,965 ± 36,200 reads. The total number of reads from samples in D2 was 979,807, each sample having 97,981 ± 29,985 reads. The taxonomy classifier was trained using the V3-V4 region of 16S rRNA genes from the eHOMD database [18]. Figure 4 shows a taxonomy bar plot at the phylum level. Relative abundance of each phylum was similar to that found in the previous OTUbased analysis using the CLC Genomics Workbench [13].
Alpha and beta diversity analysis and differential abundance The within-sample (alpha) diversity of tongue microbiomes, without (D1) or with (D2) tongue brushing, was determined using the q2-diversity plugin in QIIME 2. The between-sample (beta) diversity was determined using weighted UniFrac distances. Weighted UniFrac distance was chosen because, compared to unweighted UniFrac, it tends not be affected by the denoising method used or by low-abundance ASVs present in the data [26]. Permutational multivariate analysis of variance (PERMANOVA) [27] indicate that the bacterial compositions of D1 and D2 samples did not differ significantly (pseudo-F = 0.22, p = 0.92; Figs. 5C and D). These results, showing no significant difference in alpha and beta diversity, were consistent with the previous OTUbased analysis using the same data [13].
Finally, we tested the differential abundance of microbial taxonomy by ANCOM using the q2-composition plugin implemented in the QIIME 2 pipeline. No taxonomy significantly differed at the genus or species level in the tongue microbiome, with or without tongue brushing. This contrasted with our previous findings, in which OTUbased clustering and paired Wilcoxon signed-rank tests of all OTUs showed the abundance of five OTUs to be significantly altered [13]. This difference could be explained by the more stringent ANCOM algorithm, which was developed to overcome the problem of high false discovery rates (FDR) associated with previous methods [23].

DISCUSSION
ASVs have been proposed to replace OTUs as the standard unit of marker-gene analysis [10], and such clustering-free approaches are also being considered a standard in studying microbiomes in humans, including their oral samples [28]. We compared ASV-and OTUbased analyses of microbiomes from oral samples, with or without tongue brushing. We determined the optimal parameter settings for sequence denoising using DADA2, and obtained lower noise derived from clustered sequences with potential errors in OTUs. The obtained optimal trimming parameters of 20-30 nt were similar to those obtained by another group, in which trimming of 18 nt was recommended [29]. In the present study, we further optimized the truncating parameters for oral microbiome analysis, in order to use them in future studies.
Since the taxonomic classification accuracy of 16S rRNA gene sequences was expected to improve when a naïve Bayes classifier is trained only on a specific region of a sequence [30], we extracted the V3-V4 region of the reference 16S rRNA gene sequence from eHOMD [18] for classifier training. This resulted in a taxonomic assignment that was comparable to that of the previous OTU-based clustering. Although a detailed comparison down to species level might be necessary, our current results indicated that ASVs can replace the OTU-based method in taxonomic assignment.
The number of observed features (ASVs) were used as an alpha-diversity measure in this study. Although we used only DADA2 for denoising, there are several other denoising pipelines available such as Deblur and UNOISE3 [8,24]. It has been shown that the count of ASVs/OTUs differs between different pipelines [26,31,32]. Thus, the absolute number of observed features should be interpreted carefully.
The differential abundance across samples disagreed between the analyses by ASVbased QIIME 2 pipeline and OTU-based clustering. This could be attributed either to sequences with errors incorporated during OTU-based clustering, as described above, or to the methods used to analyze the differential abundance of each taxonomy. Relative abundance of each taxonomic unit was compared at the species level, in the OTU-based method, using paired Wilcoxon signed-rank tests implemented in the Rhea pipeline [33], whereas that in the ASV-based method was compared using ANCOM [23], in the present study. How and where the difference arouses still awaits clarification, since it influences the interpretations of the findings. Our study has a limitation, since the data used was derived only from healthy participants. Whether the ASV-based analysis is suitable for a   indicates "--p-trunc-left-f 247" and "--p-trunc-left-r 248" in 'dada2 denoise-paired' command). Each data point represents one sample.