Viral Dark Matter in the Gut Virome of Elderly Humans

The human virome is an area of increasing interest in relation to human health and disease. It has been shown to change in concert with the bacterial microbiome in early life and to differ in patients with certain diseases such as inflammatory bowel disease. However, all virome analyses are hampered by a lack of annotated representative database sequences, often referred to as the ‘viral dark matter’. Here we provide the first description of the gut DNA virome in elderly individuals (>65 years old) as well as the description of novel bacteriophages not present in current reference databases. Diversity analysis comparing elderly persons from different residence locations (community living vs long term care facilities) did not reveal any difference in their virome diversity profiles despite the reported differences at the bacteriome level. Microviridae of the subfamily Gokushovirinae were abundant in the faeces of elderly individuals. Several novel members of the order Caudovirales were also characterized and annotated. Assignment of host bacteria to detected viral genomes was attempted using a combination of CRISPR spacers, tRNA genes and a probabilistic approach. Further characterization of the viral dark matter is necessary for developing tools and expanding databases to study the human virome. This study focused on the virome of an aging human cohort with the goal of illuminating part of the viral dark matter.


Introduction
The human microbiome has been an area of growing research in the past decade and many diseases have associated microbial alterations which researchers hope will lead to diagnostics, disease sub-typing or even the identification of the pathology for certain conditions. Faecal Microbiota Transplantation (FMT) has been a highly effective treatment for recurrent Clostridium difficile infections and to date is the best example of a microbiome-based therapy [1,2]. However, microbiome research is often hampered by our lack of knowledge about many of the resident microbes. The Human Microbiome Project (HMP) has tackled this problem, firstly by providing a reference set of genomes of bacteria inhabiting the body and secondly by providing in-depth shotgun metagenomic data [3,4]. Despite this, a recent analysis from the expanded HMP found that many abundant members of the gut bacteriome diverged considerably from the closest available reference and certain clades entirely lacked reference genomes [5]. This lack of database representatives is amplified for the human virome, given the comparatively small number of studies and lack of dedicated projects such as the HMP. Thus, studies to date have found human viromes to be composed predominantly of novel viral sequences, referred to as "viral dark matter" [6,7].
Despite this, research on the human virome has proceeded by utilizing a combination of database-dependent and de novo assembly based approaches [8]. Early work examined uncultured marine and human faecal viral communities [9,10]; however, these studies were conducted at a far lower sequencing depth than that currently available with modern technologies. More recently, Norman et al. found that an expansion of bacteriophage Caudovirales richness is associated with IBD [11] and Lim et al. found that the first 2 years of life are associated with a contraction of bacteriophage diversity matched by an expansion of bacteria [12]. Manrique et al. identified a set of core bacteriophages present in the gut microbiome of healthy individuals [13]. McCann and colleagues (2018) found differences in the virome diversity of 1 year old infants born by spontaneous vaginal delivery or caesarean section [14]. The lack of database representation for members of the gut virome is epitomized by crAssphage, whose sequence was discovered by Dutilh et al. (2014), who also showed that it is the most abundant virus in the human gut [15]. This work was expanded on by Yutin et al. (2017), who showed that crAssphage is just one member of an expansive bacteriophage family [16].
It is becoming increasingly recognized that viral taxonomy will need to incorporate viruses which are not yet cultured and for which there is only sequence data available, such as crAssphage. However, the lack of a universal marker gene in viruses significantly complicates sequence based taxonomy. The International Committee on Taxonomy of Viruses (ICTV) in 2017 released a consensus statement describing a proposed classification pipeline for incorporating viruses for which there is only sequence data available into available taxonomy [17]. However, this framework has yet to be fully implemented and detailed methodologies are currently not available.
The ELDERMET project has examined the microbial composition of elderly citizens (>65 years old) in Ireland. This has focused on an examination of the bacterial component and has found that the microbiome of the elderly is capable of predicting residence location (community vs long term care facilities) and identified that this trend was heavily related to diet [18]. To date, however, no study has examined the virome of elderly subjects, with several studies focusing on infants [12,14,19], disease control cases [11] or healthy adults [13]. Thus, we sought to examine the diversity and composition of the DNA virome in an elderly cohort, which included individuals living both in the community and in long term care, in an effort to characterize viral dark matter.

Selection of faecal samples
Faecal samples for all elderly individuals were collected with informed consent as part of the ELDERMET study [18]. From the elderly faecal samples available, samples were chosen in order to have: (i) 10 community and 10 long term care representatives, (ii) sufficient faecal material for DNA virome extractions and (iii) a mixture of high and low bacterial 16S Shannon diversity samples within and between the cohorts stratified by residence so as to represent the mixture of individuals captured by the ELDERMET study. For details related to samples chosen in this study, see Supplementary Table 1.

Preparation and sequencing of DNA faecal viromes
The DNA faecal viromes from elderly subjects were prepared using the protocol described by McCann et al. (2018). Briefly, faecal material was suspended in 1:20 (w/v) of SM buffer, with large faecal particulates and bacterial cells removed by centrifugation and 0.45 µm pore diameter filtration. Unprotected DNA and RNA were removed by DNase and RNase treatment, respectively, before heat inactivating these enzymes. Subsequently, lysis and release of virion-protected DNA was performed using guanidine thiocyanate. Viral DNA was randomly amplified by multiple displacement amplification using the Illustra GenomiPhi V2 kit (GE Healthcare). Purified amplified viral DNA samples were prepared for 300 bp paired-end read metagenomic sequencing on an Illumina MiSeq platform (Teagasc Moorepark, Cork) using a Nextera-XT library preparation kit (Illumina) as described by the manufacturer.

Bioinformatics analysis
The quality of the raw reads was visualized with FastQC v0.11.3. Nextera adapters were removed with Cutadapt v1.9.1 [20] followed by read trimming and filtering with Trimmomatic v0.36 [21] to ensure a minimum length of 60 bp, a maximum length of 150 bp, and a sliding window that cuts a read once the average quality in a window size of 4 falls below a Phred score of 30. Sequencing reads aligning to the human genome release GRCh38.p7 were removed using Kraken v0.10.5 [22]. A summary of the number and quality of paired-end reads following sequencing is available in Supplementary Table 2. Levels of bacterial contamination were estimated by classifying reads with SortMeRNA v2.0 [23] against the SILVA database and by aligning reads against the cpn60db [24] with bowtie2 in end-to-end alignment mode [25]. Reads were then assembled with the metaSPAdes assembler [26], as per the findings of Roux et al. 2017 [27].
Virome sequence reads were classified into known viral orders and families using the Kaiju metagenomic classifier [28] and the NCBI non-redundant protein database (March 2nd 2018; [29]). This classifier was chosen as it utilizes a 6 frame translation approach to classify sequences on the basis of amino acid homology, and thus is more sensitive to more distant relatedness. Raw reads were deposited in the NCBI under BioProject PRJNA385126. The accession for each individual sample is listed in Supplementary Table 1.
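As an illustration of the read filtering described above, the following Python sketch applies a 4-base sliding window with a mean Phred threshold of 30, a maximum length of 150 bp and a minimum length of 60 bp to a single read. It is a simplified approximation and not Trimmomatic's implementation; the function names, and the choice to crop to the maximum length before window trimming, are assumptions made for the example.

```python
def sliding_window_cut(quals, window=4, min_avg=30):
    """Return how many leading bases to keep.

    The read is cut at the start of the first window whose mean
    Phred score falls below min_avg (simplified behaviour).
    """
    n = len(quals)
    if n < window:
        # Too short for a full window: judge the read as a whole.
        return n if n and sum(quals) / n >= min_avg else 0
    for start in range(n - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return n


def filter_read(seq, quals, window=4, min_avg=30, min_len=60, max_len=150):
    """Crop, quality-trim and length-filter one read.

    Returns (sequence, qualities) or None if the read is discarded.
    Cropping before trimming is an assumption of this sketch.
    """
    seq, quals = seq[:max_len], quals[:max_len]
    keep = sliding_window_cut(quals, window, min_avg)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]
```

A read with 70 high-quality bases followed by a low-quality tail would be cut near the start of the tail and retained, whereas a read whose quality drops before position 60 would be discarded.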

Detection of viral contigs
To further guard against bacterial contamination following the assembly of contigs, viral contigs were detected using VirSorter [30]. Predicted viral contigs were annotated using VIGA (pre-print [31]). The VirSorter-positive viruses and their properties are summarized in Supplementary Table 3. The similarity of elderly viruses against publicly deposited sequences of the NCBI nr database is available in Supplementary Table 4. Taxonomic classification of viral contigs to 'Order' and 'Family' levels was performed using DemoVir (pre-print [TBA]), with results summarized in Supplementary Table 5.

Statistical analyses
All statistical analyses were performed in R v3.3.0 [32]. Alpha diversity metrics including Chao1 richness and Shannon index were computed with phyloseq v1.16.2 [33] and plotted with ggplot2 v2.2.1 [34]. Between-group differences in alpha diversity were tested with a Mann-Whitney test (also known as a two sample Wilcoxon test). Unweighted Bray-Curtis distance was used as input for a Principal Coordinate Analysis (PCoA) as performed by the pcoa function in the ape package v4.1. Adonis tests were performed using the vegan package v2.4.3 [35] in R to test community level differences.
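For readers unfamiliar with the metrics named above, the following sketch computes the Shannon index, a bias-corrected form of the Chao1 estimate and the Bray-Curtis dissimilarity from raw count vectors in plain Python. It is not the phyloseq, ape or vegan code used in the study; the function names are hypothetical and the formulas are the standard textbook definitions.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singletons and doubletons."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples described over
    the same ordered set of contigs."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator
```

Each function takes per-contig read counts for one sample; a pairwise matrix of `bray_curtis` values is the kind of input a PCoA expects.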

Viral host prediction
Several approaches were undertaken to try and identify host bacteria for viral sequences. Firstly, all viral genomes were queried against a database of CRISPR spacers [36]. Secondly, viral encoded tRNA sequences were detected using ARAGORN v1.2.36 [37] and the closest bacterial homologue was identified through a BLAST query against the NCBI nt database. Finally, the most likely host bacterium of elderly viral contigs was calculated using WIsH ('Who IS the Host'), which employs a probabilistic approach [38]. An in-house custom database of complete phage genome sequences was built to test the accuracy of WIsH by combining the European Nucleotide Archive (ENA) phage genomes database (2,010 viral sequences; May 2015) with phage sequences obtained from NCBI by the lab of Andrew Millard (8,761 viral sequences; March 2018; http://millardlab.org). A custom database of bacterial sequences was compiled by combining sequences from the ENA bacterial genomes database (3,316 bacterial sequences; May 2015) with the NCBI RefSeq database (2,477 bacterial sequences; October 2017). Redundancy was removed from these databases by removing the shorter sequence if it aligned within the larger sequence with >95% identity across 90% of its length. In addition, all sequences which contained ambiguous 'N' nucleotides, and any bacterial 'genome' sequence ≤500 kb, were removed. The finalized lists and accession details of the phage and bacterial custom databases used during this work are available in Supplementary Tables 6 and 7, respectively. The accuracy of the WIsH phage-host prediction program, using the custom built phage database against the custom bacterial database, was calculated by comparing the bacterial genus textual descriptions. The predicted host matched the phage's known host in at least 35% of cases, with this accuracy expected to increase with a detailed manual curation of the downloaded bacterial and viral databases. The complete results for the WIsH accuracy estimation are reported in Supplementary Table 8. Subsequently, WIsH was applied to the DNA viruses detected in faeces of elderly individuals using the custom built database of bacteria as potential hosts. The most likely predicted host bacterium for viruses detected in the faeces of elderly individuals is available in Supplementary Table 9.
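The redundancy rule and the genus-level accuracy check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used in the study: the function names are hypothetical, alignment statistics are assumed to have been computed elsewhere, and the genus is assumed to be the first word of each sequence description (for example 'Escherichia phage T4' and 'Escherichia coli K-12').

```python
def is_redundant(percent_identity, aligned_length, shorter_length):
    """True if the shorter sequence aligns within the longer one with
    >95% identity across at least 90% of the shorter sequence."""
    return percent_identity > 95.0 and aligned_length >= 0.9 * shorter_length


def genus_of(description):
    """Assumption: the genus is the first word of the description."""
    return description.split()[0].lower()


def genus_level_accuracy(pairs):
    """Fraction of (phage description, predicted host description)
    pairs whose genus labels agree."""
    if not pairs:
        return 0.0
    matches = sum(
        1 for phage_desc, host_desc in pairs
        if genus_of(phage_desc) == genus_of(host_desc)
    )
    return matches / len(pairs)
```

Matching on free-text descriptions is crude, since inconsistently named records count as mismatches, which is consistent with the expectation that manual curation of the databases would raise the measured accuracy.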
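The phylogeny of putative elderly-associated Microviridae viral contigs was conducted on their predicted capsid protein sequence as follows. A text search using the term 'Microviridae' was performed against the NCBI Genome and Nucleotide web-resource (March 2018) resulting in 1,289 sequences, the accessions of which were downloaded using NCBI Batch Entrez. All sequences smaller than 3 kb were removed, resulting in 771 Microviridae sequences. These sequences formed an NCBI Microviridae database for subsequent analyses. The protein encoding sequences of 47 elderly-associated Microviridae and the NCBI Microviridae database sequences were predicted using Prodigal v2.6.3 with the 'meta' option enabled for small contig sequences. Subsequently, a protein BLAST (E-value 1E-05) of the Microviridae capsid protein F (Pfam seed sequences of PF02305) was performed to identify capsid protein sequences. Resultant BLAST hits were sorted by bitscore and only the top hit per genome was retained. Next, the predicted capsid proteins of elderly-associated Microviridae were queried by BLAST against the custom NCBI Microviridae database, with only the top resulting BLAST hit for each of the queried elderly-associated Microviridae retained. All 95 of the predicted Microviridae capsid sequences were aligned using Muscle v3.8.31 [39], with phylogeny determined using PhyML v20131022 with 1000x bootstraps using a JTT substitution model [40]. Phylogenetic relationships were visualized using FigTree v1.4.3. The Microviridae subfamily taxonomic classifications for the NCBI Microviridae sequences were extracted from their GenBank files.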
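The capsid selection step of the Microviridae phylogeny (BLAST hits against the capsid protein F seed sequences, sorted by bitscore, with only the top hit per genome retained) could be implemented along the following lines. The sketch assumes standard 12-column tabular BLAST output, that the Pfam seeds were the queries and the predicted proteins the subjects, and that protein identifiers follow Prodigal's `<contig>_<n>` naming; the function name is hypothetical.

```python
def top_capsid_hit_per_genome(blast_rows, max_evalue=1e-5):
    """Keep, for each genome, the predicted protein with the highest
    bitscore among hits passing the E-value cut-off.

    blast_rows: iterable of tab-separated lines with the columns
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore.
    """
    best = {}  # genome id -> (protein id, bitscore)
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        protein_id = fields[1]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue:
            continue
        # Prodigal protein ids are the contig id plus "_<gene number>".
        genome_id = protein_id.rsplit("_", 1)[0]
        if genome_id not in best or bitscore > best[genome_id][1]:
            best[genome_id] = (protein_id, bitscore)
    return {genome: protein for genome, (protein, _) in best.items()}
```

Retaining a single protein per genome gives each genome exactly one leaf in the resulting tree.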

Phylogeny of phage proteins
A phylogenetic tree of the predicted Caudovirales phages was constructed in a similar manner to the described Microviridae phylogeny, except the terminase protein was employed as a genetic marker (Pfam seed sequences of PF04466) with related terminase sequences from the created custom phage database included (Supplementary Table 6). Phage taxonomy was inferred from their DemoVir predicted family classifications and, additionally, groups with similar hosts as predicted by WIsH are highlighted.
The average nucleotide identity between phage genomes and taxonomic groups was calculated using pyani [41]. Input sequences were aligned using MUMmer (ANIm method) with the 'maxmatch' option enabled.

Viral-encoded antibiotic resistance
In order to assess if the viruses present in the faeces of the studied elderly individuals encode antibiotic resistance genes, encoded proteins were predicted using Prodigal v2.6.3 with the 'meta' option enabled and subsequently screened against the Comprehensive Antibiotic Resistance Database (CARD) [42] using an E-value threshold of 1E-05 (Supplementary Table 10).
Subsequently, the viral DfrE-related sequence yielding the top BLAST hit against the CARD database was compared with other thymidylate synthases downloaded from Pfam (seed sequences PF00303). Briefly, all thymidylate synthase sequences were aligned using Muscle v3.8.31, and their phylogeny was determined using PhyML v20131022 using a JTT substitution model with the default 20 bootstraps. The elderly viral encoded DfrE-related sequence and its closest Pfam thymidylate synthase sequence, TYSY_SYMTH, were aligned using Muscle and visualized using JalView [43]. The coordinate of the highlighted conserved cysteine residue of thymidylate synthases was provided by Pfam.

Results
Viral-like-particles purified from faecal samples were sequenced on an Illumina MiSeq to generate a median of 1.3 million read pairs per sample. Following quality control, the remaining paired end reads were classified into known viral groups using the Kaiju classifier [28] against the nr database at NCBI (Figure 1). However, even with this approach, a median of 72% of reads per sample remained unclassified. Those reads which were assigned to a known viral group fell primarily into the viral order Caudovirales and family Microviridae. The latter in particular were found to be the most abundant identifiable viral group in many of the samples sequenced here, although this may be due to the use of Multiple Displacement Amplification (MDA), which has been shown to distort the abundance of ssDNA viruses [44].
Due to the large number of unassigned reads, reads were assembled using metaSPAdes [26]. In order to avoid contamination with bacterial sequences, only those contigs predicted as viral by VirSorter [30] were considered for further analysis. This resulted in a total of 205 contigs, ranging in size from 1,353 bases to 118,143 bases (Supplementary Table 3). The distribution of read coverage and size of VirSorter-detected viruses is shown in Figure 2. Again, the impact of MDA is evident in the extremely high coverage of smaller viruses, presumably ssDNA phage.
Paired end reads were aligned back to this contig set using bowtie2 in end-to-end mode, which recruited a median of 70.42% of reads per sample. Only 100 contigs showed significant nucleotide homology to any sequence in the NCBI nt database (BLASTn cut-offs: maximum E-value 1E-10 and alignment length of 500 bases) and in many cases these alignments covered only a small fraction of the assembled sequence (Supplementary Table 4). Over half of these sequences were identified as circular by VirSorter, thus making it likely that the contig set described above is primarily composed of complete viral genomes which are not currently present in reference databases.
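To make the BLASTn criteria above concrete, the sketch below flags contigs with at least one hit meeting both cut-offs (E-value no greater than 1E-10 and alignment length of at least 500 bases) and reports the largest fraction of each contig covered by a single qualifying alignment. The function name and the input layout are assumptions for illustration; the study's own tabulation is in Supplementary Table 4.

```python
def contigs_with_significant_hits(hits, contig_lengths,
                                  max_evalue=1e-10, min_aln_len=500):
    """Map each contig with a qualifying hit to the largest fraction
    of its length covered by one qualifying alignment.

    hits: iterable of (contig id, alignment length, E-value) tuples.
    contig_lengths: dict of contig id -> contig length in bases.
    """
    best_fraction = {}
    for contig, aln_len, evalue in hits:
        if evalue <= max_evalue and aln_len >= min_aln_len:
            fraction = aln_len / contig_lengths[contig]
            previous = best_fraction.get(contig, 0.0)
            best_fraction[contig] = max(previous, fraction)
    return best_fraction
```

A contig can therefore count as having database homology while most of its sequence remains unmatched, which is the situation described for many of the assembled viruses.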
Analysis of bacterial composition in the ELDERMET cohort previously found differences in both alpha and beta diversity between those elderly living in long term care and in the community [18]. However, when we repeated this analysis using the above contig set as a reference we detected no difference in alpha or beta diversity in the tested metrics (Figure 1A and 1B). It is worth noting that we only investigated the DNA portion of the virome and the amplification of DNA before sequencing is expected to skew diversity estimates. In addition, samples were chosen with prior knowledge of bacterial 16S diversity metrics in order to capture a range of sample types within the ELDERMET cohort.
A search of the putative elderly-associated viruses against a CRISPR spacer database resulted in 4 matches, supporting the characterization of these contigs as mobile genetic elements (Supplementary Table 3). There were also tRNA genes identified in a further 9 viruses, providing an indication of host range for these contigs. The top results for the phage-host prediction program WIsH linked the elderly-associated viruses to Paenibacillus, Clostridium, Bacteroides, Lachnoclostridium, Bacillus and Sphingobacterium (Supplementary Figure 1).
A total of 47 predicted Microviridae viral sequences were detected in the faeces of the elderly individuals. In order to determine the relatedness of these viruses, phylogenetic analysis of their capsid protein sequences was performed (Figure 3). By interspersing characterized Microviridae sequences within the phylogenetic tree, it is possible to observe that the majority of elderly-associated Microviridae are members of the Gokushovirinae subfamily, with only a single sequence clustering with members of the Microviridae Bullavirinae subfamily. The average nucleotide identity of all Microviridae sequences was calculated and showed that the genomes of Bullavirinae subfamily members (both elderly and NCBI sequences) have >70% average nucleotide identity (Supplementary Figure 2). However, the majority of Gokushovirinae subfamily genome sequences share <70% average nucleotide identity, demonstrating that there is extensive uncharacterized diversity of Microviridae viruses associated with human faeces and that there is also a need to revise the taxonomy of Gokushovirinae using a sequence-based approach.
The family level taxonomic prediction of Caudovirales phages was performed using DemoVir (pre-print [TBA]; Supplementary Table 5). Subsequently, a phylogenetic comparison of the terminase protein of Caudovirales phages detected in the faeces of elderly individuals was performed (Figure 4). In the phylogeny of terminase sequences, phages of the families Myoviridae, Podoviridae and Siphoviridae are intermingled throughout the phylogenetic tree. Only one group of Myoviridae phages, predicted to infect members of the order Clostridiales (Clostridium and Lachnoclostridium), is widespread amongst elderly individuals. Average nucleotide identity and genome comparisons of the putative Clostridiales Myoviridae phages were performed, highlighting the genomic variations within this cluster (Supplementary Figures 3 & 4).
The terminase phylogenetic analysis also identified a smaller cluster of putative Sphingobacterium-infecting Myoviridae phages amongst three individuals. Visual genome comparisons show these phages are highly related, but not clonal (Supplementary Figure 5).
Recently, there has been significant interest in characterizing the mobilization of antibiotic resistance genes, with mixed reports about phage involvement [45]. Therefore, we investigated whether antibiotic resistance genes were associated with the viruses detected in the faeces of elderly individuals. A total of 73 BLAST hits below the chosen cut-off were obtained against the 205 elderly viral contigs (E-value <1E-05; Supplementary Table 10). However, a manual examination of these BLAST hits highlighted that the majority had small alignment lengths, low percentage identities and high numbers of mismatches. Therefore, these results were considered insignificant and not pursued.
Of interest were the BLAST hits of phage protein encoding sequences against dfrE of Enterococcus faecalis within the CARD database (two E-values < 1E-34) (Figure 5; Supplementary Table 10). A literature search for DfrE of E. faecalis identified it encodes a [...] (Supplementary Figure 6A). A comparison of DfrE of EM298_T0.NODE_5 with the closest related sequence, TYSY_SYMTH, showed that both sequences are conserved at the predicted catalytic cysteine ([47]; Supplementary Figure 6B).

Discussion
There remains a vast amount of uncharacterized sequence diversity associated with the viral fraction of the human microbiota, referred to as 'viral dark matter'. While there are an increasing number of metagenomic studies characterizing viruses and phages associated with humans, most microbiota research is still focused on correlating presence/absence with health/disease and the next big challenge is moving towards determining causation [48]. Altered phage populations have been observed in various human diseased states [13]; in addition, restoration of viral and phage populations through filtered, bacteria-free faecal transplants has had initial success in treating recurrent Clostridium difficile infections [49].
As each additional phage metagenomic study identifies and characterizes more of the unknown viruses and phages present in the human microbiota, the viral dark matter, future studies will become more informative. Current metagenomic studies of phages are hampered by the vast amounts of sequence diversity and lack of database representatives. This is epitomized by crAssphage, the most abundant virus of the human microbiota, which was only identified in 2014 by Dutilh and colleagues [15]. CrAssphage, which was lacking a database homologue when it was first discovered, was identified through de novo assembly approaches. Subsequently, researchers have used crAssphage to identify related phage sequences and propose a taxonomic structure for these abundant viruses ([16]; pre-print [50]). Therefore, in the absence of an all-encompassing curated viral database, researchers must characterize viral populations through both database dependent methods and also through database independent approaches to identify novel sequences.
With an aging human population, understanding the various facets of the human microbiota through multi-omic approaches will be important in designing strategies to prolong health. A previous examination of elderly gut microbiotas in Ireland demonstrated that the bacterial 16S rRNA composition differentiated individuals based on residence location [18]. This study initiated an examination of the human faecal virome associated with elderly individuals (>65 years of age).
Following metagenomic sequencing of the elderly viromes, we observed no differences in the viral diversity between residential cohorts when queried against known viral sequences. In addition, no diversity variations were observed between elderly viromes and those of younger, average-aged healthy controls from the study of Norman and colleagues ([11]; data not shown). However, this is not to say that differences do not exist. It must be noted that samples were chosen for this study with prior 16S rRNA compositional knowledge and were not randomized. In addition, amplification of DNA prior to sequencing may have masked some of the diversity differences between the elderly virome cohorts. Such biases should be removed from subsequent efforts to characterize the virome of aging individuals.
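One of the challenges of viral metagenomic studies is identifying the host organism. Several methods have been applied in this study to try and infer potential host information for novel viral sequences: (i) CRISPR spacer sequences, (ii) tRNA sequences and (iii) a probabilistic approach.
Finding matches between specific bacterial CRISPR spacer sequences and viral contigs is database dependent and CRISPR spacer sequences are highly strain dependent. However, a match between a CRISPR spacer sequence and phage is a strong indication of a true biological interaction. Within this study, the bacteria Bacteroides vulgatus and Odoribacter splanchnicus encoded CRISPR spacer sequences which closely matched elderly-associated viruses. Several phages encode their own tRNA sequences to optimize their host range and replication [51]. While tRNA sequences are often associated with mobile genetic elements, they are functionally conserved and their relatedness to bacterial homologues can be used to infer a potential host relationship. The tRNA sequences associated with elderly viruses had close homologues in human gut bacteria such as E. coli and B. vulgatus. However, one of the closest bacterial homologues to a phage-encoded tRNA was from Borrelia garinii, a causative agent of Lyme disease. While there is no data available to suggest this elderly individual had Lyme disease, this prediction is most likely an artifact of selecting the top BLAST hit for a divergent phage-encoded tRNA sequence from uncharacterized phages of the human viral dark matter. Finally, to predict potential hosts, the probability of a phage-host pair was calculated using the program WIsH, which is suggested to be at its most accurate when applied to small phage genomes without tRNA sequences, as these phages would be more dependent on their host's replication machinery [38]. While WIsH is able to infer the most probable host for all of the 205 elderly-associated viruses from the bacteria supplied, in only 1 of the 9 phages which encode a tRNA did the WIsH host prediction match the tRNA-based host prediction. In addition, there were occasions where WIsH predicted a host that was not detectable in the 16S rRNA sequencing results of the same sample. Thus, without laboratory isolation of a phage-host pair, all in silico host predictions should be treated extremely cautiously.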
A phylogenetic analysis of the Microviridae phages present in the faeces of elderly individuals was performed. Taxonomically, the majority of elderly faecal Microviridae phages are members of the Gokushovirinae subfamily, with only a single sequence clustering with Bullavirinae subfamily members. Gokushovirinae have previously been detected in the faeces of humans and wild chimpanzees [52,53], with Gokushovirinae phages predicted to infect obligate parasitic bacteria, such as Chlamydiae and Bdellovibrio, and Spiroplasma [54]. While the Gokushovirinae subfamily contains three genera [55], the average nucleotide identities of the Gokushovirinae detected in elderly faeces do not result in three distinct clusters, whereas all Bullavirinae sequences formed a single cluster by average nucleotide identity. Therefore, there is a significant amount of uncharacterized Microviridae diversity associated with human faeces.
A comparison of all the detected elderly-associated phage terminase sequences was performed to assess the diversity of Caudovirales phages. Interestingly, phages of the families Myoviridae, Podoviridae and Siphoviridae (of the order Caudovirales) did not cluster together, supporting the need for a sequence based taxonomic scheme rather than categorizing phages by morphology. As supporting evidence, two distinct clusters of phage terminase sequences were present in our phylogenetic analysis, each composed of three or more sequences, and the respective phages putatively infect similar host bacteria (Clostridiales and Sphingobacterium). Thus, despite conserved morphologies, divergence of Caudovirales phage sequences appears to be driven by the targeted host organism. Additionally, during the terminase phylogenetic analysis, all of the most closely related terminase sequences identified from a custom database of 3,134 phages were included in our dendrogram. However, only 12 related terminase sequences were recruited for the resultant terminase tree. This result exemplifies the vast amount of viral dark matter within the human gut virome still not characterized and deposited in a public database.
Antibiotic resistance is a significant concern in long term health care facilities. A search for potential antibiotic resistance genes associated with elderly gut viral contigs resulted in numerous hits which were not considered significant, but there were several strong hits against dfrE of Enterococcus faecalis. An examination of the literature for DfrE thymidylate synthase demonstrated that it conferred resistance to trimethoprim in E. coli when cloned into a high copy plasmid [46]. However, the majority of the literature surrounding phage encoded thymidylate synthases characterizes this enzyme as important in DNA synthesis [56], with recent literature showing these enzymes are present in phages which synthesize modified DNA bases during replication [57]. Therefore, it is not clear if the elderly-associated faecal viruses examined in this study confer trimethoprim resistance to their host bacteria; however, any potential trimethoprim resistance conferred through these phages is likely a secondary consequence of the phage's natural replication strategy.

Conclusion
Understanding and manipulating the human microbiome is important in treating various conditions and providing an overall improvement in quality of life. However, the first step towards this ultimate goal is a better understanding of the constituent members of the human microbiota. In this study, we focused on the viral populations of elderly individuals. To our knowledge there have been no studies characterizing phage communities associated with elderly (>65 years old) individuals, a demographic which is increasingly important as improvements in healthcare result in longer lives. While no differences were observed in the diversity of viral populations between elderly individuals by residential location in this study, or observed against younger healthy adults, there is proposed to be an overall gradual decline in the microbiota of aging individuals. However, drawing conclusions about differences in viral populations and the presence/absence of specific viruses in health/disease is strongly dependent on available databases, yet significant amounts of unknown sequences are detected in the viral fraction of the human microbiome, including that of elderly individuals. Therefore, further characterization of the 'viral dark matter' will facilitate a better understanding of the complete human microbiota. This will hopefully result in a better understanding of host-microbiota interactions, leading to future therapeutic interventions and diagnostics that ultimately improve human health.

(ENA) phage genomes database (2,010 viral sequences; May 2015) with phage sequences obtained from NCBI by the lab of Andrew Millard (8,761 viral sequences; March 2018; http://millardlab.org). A custom database of bacterial sequences was compiled by combining sequences from the ENA bacterial genomes database (3,316 bacterial sequences; May 2015) with

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 5 July 2018 | doi:10.20944/preprints201807.0101.v1

The phylogeny of putative elderly-associated Microviridae viral contigs was conducted on their predicted capsid protein sequence as follows. A text search using the term 'Microviridae' was performed against the NCBI Genome and Nucleotide web-resource (March 2018) resulting in 1,289 sequences, the accession of which were downloaded using NCBI Batch Entrez. All sequences smaller than 3 kb were removed, resulting in 771 Microviridae sequences. These sequences formed an NCBI Microviridae database for subsequent analyses. The protein encoded sequences of 47 elderly-associated Microviridae and the NCBI Microviridae database sequences were predicted using Prodigal v2.6.3 with the 'meta' option enabled for small contig sequences. Subsequently, a protein BLAST (E-value 1E-05) of the Microviridae capsid protein F (Pfam seed sequences of PF02305) was performed to identify capsid protein sequences. Resultant BLAST hits were sorted by bitscore and only the top hit per genome was retained. Next, the predicted capsid proteins of elderly-associated Microviridae were BLAST against the custom NCBI Microviridae database, with only the top resulting BLAST hit for each of the queried elderly-associated Microviridae retained. All 95 of the predicted Microviridae capsid sequences were aligned using Muscle v3.8.31

this study, the bacteria Bacteroides vulgatus and Odoribacter splanchnicus encoded CRISPR spacer sequences which closely matched elderly associated viruses. Several phages encode their own tRNA sequences to optimize their host range and replication [51]. While tRNA sequences are often associated with mobile genetic elements, they are functionally conserved and their relatedness to bacterial homologues can be used to infer a potential host relationship. The tRNA sequences associated with elderly viruses had close homologues to human gut bacteria such as E. coli and B. vulgatus. However, one of the closest bacterial homologues to a phage-encoded tRNA was Borrelia garinii, a causative agent of Lyme disease. While there is no data available to suggest this elderly individual had Lyme disease, this prediction is most likely an artifact of selecting the top BLAST hit for a divergent phage tRNA encoded sequence from uncharacterized phages of the human viral dark matter.

Figure 1. Virome diversity of community versus long-term residential care elderly individuals

Figure 2. VirSorter detected viruses present in the faeces of elderly individuals, highlighting viral

Figure 4. Phylogenetic comparison of the encoded terminase protein-encoded sequences present

Figure 5. Assessment of antibiotic resistance genes associated with viruses detected in the faeces

Table 10). A literature search for DfrE of E. faecalis identified it encodes a