Mycobacterium chimaera from cardio surgery Heating- Cooling Units and from clinical samples in Israel are genetically unrelated

Non-tuberculous mycobacteria (NTM) are opportunistic pathogens that cause illness primarily in the elderly, in the immunocompromised or in patients with underlying lung disease. Mycobacterium chimaera is a NTM species belonging to the M. avium Complex (MAC) group of species. Since 2013, a global outbreak of M. chimaera infection related to heater-cooler units (HCU) used in cardio-thoracic surgery has been identified. This outbreak was caused by a single strain of M. chimaera. In order to estimate the prevalence of this outbreak strain in Israel, we sampled M. chimaera from several HCU machines in Israel, as well as from patients, sequenced their genomes and compared them to the outbreak strain. The presence of mixed mycobacteria species in the samples complicated the analysis of obtained sequences. By applying a metagenomic binning strategy, we were able to obtain genomes of single strains from the mixed samples, and characterized them. M. chimaera strains were compared to each other and to previously reported genomes from other countries. The strain causing the outbreak related to the HCU machines was identified in several such machines in Israel but not in any of the clinical samples.


Introduction
Non-tuberculous mycobacteria (NTM) are ubiquitous environmental bacteria found primarily in soil and water. They are considered opportunistic pathogens, but the occurrences of disease and even death caused by these bacteria in recent years are increasing [1][2][3][4][5]. These bacteria are resistant to many drugs and disinfectants [3,6,7]. Mycobacterium chimaera is a slow-growing species of NTM belonging to the Mycobacterium avium complex (MAC), with high similarity to the species Mycobacterium intracellulare [8].
In the second decade of the 21 century, a global outbreak of M. chimaera disseminated infection and endocarditis occurred among patients who had undergone cardiopulmonary bypass surgeries [9][10][11][12]. The source of infection was identified as contaminated water in heater-cooler units (HCU), which regulate the temperature of blood and cardioplegia solution during open-heart surgery. Aerosols from the HCU, containing the pathogenic bacteria, spread in the operating room, infecting the patients' open chest or grafts [13]. M. chimaera detected at the manufacturing site of the LivaNova (formerly Sorin) HCU suggested this was the source of the contamination [14].
In order to identify the source of the outbreak, the European Union launched an epidemiological investigation, where the genomes of 250 isolates, from patients, HCUs and the water supply in five different countries were sequenced, and the results were published in 2017 by Van Ingen et al [15]. A phylogenetic analysis of the genomes divided most of the strains into two main groups. Almost all the strains obtained from patients who had undergone cardiopulmonary bypass surgery, belonged to a closely related subgroup which the authors designated subgroup 1.1. Van Ingen et al also identified specific SNP signatures for some of the phylogenetic groups, including the outbreak subgroup 1.1 [15]. Other studies supported the finding that a single strain of M. chimaera was involved in this outbreak [16][17][18].
In addition to the clinical strains from patients related to the outbreak, the outbreak subgroup 1.1 included most of the strains from HCUs manufactured by LivaNova as well as samples taken from the LivaNova production site. Interestingly, other strains of M. chimaera were detected in water taken from HCUs manufactured by another company, Maquet, but these strains were not genetically related to the strain causing the outbreak [15]. The mechanical characteristics of the LivaNova/Sorin 3T model of HCUs favored the spread of bacteria containing aerosols [13]. Following these findings, LivaNova modified this HCU device to improve its safety.
The Israel Ministry of Health (MOH) was notified in November 2016 by LivaNova of the NTM infections associated with use of its Stöckert 3T heater-cooler devices. At the time, 25 such devices were distributed among 11 hospitals throughout Israel performing open-heart surgery. A number of steps were taken: 1. General hospital CEOs were notified, and asked to inform relevant staff members of the outbreak and nature of invasive NTM infections associated with use of these machines; 2. The directors of the clinical microbiology laboratories in general hospitals were asked to retroactively relay information on any pertinent cultures from patients who had undergone cardiac surgery with use of a heater-cooler device; 3. The public was notified via press release and a media interview of the outbreak and those having undergone open-heart surgery since 2011 were asked to report any suspicious symptoms or signs to their medical providers for evaluation; 4. CEOs of hospitals with the heater-cooler devices in use were asked to ensure that updated safety instructions issued by the manufacturer, and additional measures requested by the MOH, were strictly followed; 5. Hospitals using the machines were required to submit to sampling of the water for NTM, document device maintenance, and report bimonthly bacterial colony counts from water in the devices.
6. The manufacturer, in conjunction with the Ministry of Health, embarked on a process of staggered recall of the devices to their plant in Europe, where they underwent cleaning, as well as installation of safety features that would render patients no longer vulnerable to infection due to contaminated exhaust from the water. This process was completed by mid-2018.
In order to estimate the infection risk, water samples from HCUs at 10 medical centers were sent to the Israeli public health laboratories for total bacterial count and specific identification of Mycobacterium contamination.
Fortunately, no clinical infections with M. chimaera related to use of the devices were reported. Nevertheless, to characterize the epidemiology of M. chimaera in Israel and to evaluate the possibility of HCU-related M. chimaera infections, we sequenced the genomes of M. chimaera from the HCUs, as well as all the clinical Mycobacterium isolates in the Israeli mycobacterial reference lab that were identified as M. chimaera.
We found that some of the HCU cultures contained mixed mycobacterial species, which complicated the identification of M. chimaera strains. Environmental samples are often comprised of mixed Mycobacteria strains. Isolating each NTM type is difficult, which makes deep sequencing analysis challenging. Several approaches have been applied to the task of deciphering bacterial strains from whole genome sequencing of a mixed culture. Eyre et al [19] used a maximum likelihood based model to identify two different strains of Clostridium difficile in short read WGS from mixed infection samples. However, their method relies on a previously constructed panel of known haplotypes that ideally include the strains. The QuantTB method [20] that detects mixed infection tuberculosis also uses a reference dataset of MTB genomes. Yang et al [21] introduced a tool for identifying strains of Salmonella enterica in samples from mixed infections, which also uses a database of known strains. DESMAN [22] uses metagenomic binning and core genes to identify strains in mixed genome samples without a strain database. Metagenomic binning is a process that clusters environmental shotgun reads or their assembled contigs back into the taxa composing the sample. To address the challenge of mixed samples we formulated an in-house method, which involved de novo assembly and metagenomic binning followed by SNP identification and genotyping, by comparing our draft genomes to both a reference genome and a SNP-containing reference genome. Here we report the result of this analysis.

Whole genome sequencing and species identification
We included in our study all eight M. chimaera isolates obtained from clinical samples during 2017 in Israel. These isolates were all obtained from sputum. In addition, we included one previous M. chimaera isolate obtained from chest biopsy in 2014 that was initially designated as M. intracellulare, and one M. intracellulare sample isolated from a patient's pleural fluid in 2013. Besides the clinical samples, we also included thirteen samples of HCU water from ten medical centers that were cultured in the Israeli Mycobacterial Reference Lab, and identified as Mycobacterium chimaera. All the sampled HCU devices were of the 3T model manufactured by LivaNova/Sorin, the same model that was linked to the global outbreak of M. chimaera following open-heart surgery ( Table 1).
For initial identification, we used a Hain Lifescience molecular kit that is based on PCR and proprietary probes (GenoType NTM-DR assay). Two of the HCU samples were identified as mixed samples containing M. chimaera. An additional clinical sample identified as M. intracellulare was included in this study (Table 1). Short reads whole genome sequencing (WGS) of these samples were obtained. We further identified the bacterial species in each sample by assigning taxa to the sequence reads. The most abundant annotation in all samples was the genus Mycobacterium (53% -83% of the reads, data not shown). On the species level, in clinical sample M10, originally identified as M. intracellulare by Hain Lifescience GenoType assay, the most abundant species-level annotation was indeed M. intracellulare. In two other clinical samples, M8 and M9, The most abundant species-level annotations were Shewanella decolorationis and Bacillus azotoformans, respectively, and M. chimaera was only the second most abundant species. However, these samples had 60% and 58% of their reads annotated as Mycobacterium, respectively. Therefore, it is safe to say that M. chimaera was the most abundant species in these samples as well. In eight of the samples from HCU devices, the most abundant species-level annotation was Mycobacterium gordonae, a fast growing NTM species (Table 2).
Since some of our samples were not pure M. chimaera, it was impossible to use the short read sequence for identifying the outbreak M. chimaera strain in a straightforward manner. We therefore applied the following steps: 1. Short reads in each sample were assembled de novo into contigs.

2.
Contigs were binned into bacterial species, using metagenomics. Each sample resulted in one or two bins ( Table 2, Table S). 3. The species of each bin was identified by finding the genome most similar to it, from the collection of all publicly available genomes ( Table 2). A single bin was retrieved in each of our clinical samples, corresponding to one species. Most of these bins were more similar to M. chimaera genomes than any other genome, except for two cases. The bin produced from sample M10, which was, as expected, most similar to a M. intracellulare genome, and sample M2, which was surprisingly most similar to a Mycobacterium sp. TKK-01-0059 genome, a poorly characterized species of the Mycobacterium tuberculosis complex (Table 2). However, this bin also showed high similarity to a genome of M. yongonense, a sub species of M. intracellulare (839/1000 k-mers in MASH analysis), and lower similarity to a genome of M. chimaera (588/1000 k-mers). It cannot be ruled out that sample M2 is a strain of M. chimaera that is distant from the strains with publicly available genomes. The situation was different among samples taken from HCU devices. Samples M16-M19 and M21 had two metagenomic bins in each of them, one corresponding to M. chimaera and one to M. gordonae, a rapidly growing species. The metagenomic binning process resulted in a single assembly bin, identified as M. gordonae by comparison to publicly available genomes, in samples M22 and M23. All these samples also had a relatively high proportion of reads annotated as M. gordonae ( Table 2

column 5). Samples M22 and M23
were also the only HCU samples not identified as pure M. chimaera by the PCR based Hain Lifescience assay. Sample M20, also taken from a HCU, was divided into one bin of M. chimaera and one bin most similar to the genome of a Mycobacterium species isolated from a drinking water system in Illinois, USA [23]. HCU samples M11-M15 all had a single bin corresponding to M. chimaera. For the rest of our study, we used only the assembly bins identified as MAC, including the assembly bin from sample M2. The total length of these assemblies ranged between 5.3 Mbp and 6.9 Mbp, and the GC content between 67.3% and 68.1% (Table S). In comparison, the published complete genome length of the M. chimaera reference strain DSM-44623 and strains Zuerich-1 and Zuerich-2 are 6.1 Mbp, 6.4 Mbp and 6.5 Mbp respectively, and their GC content are 67.7%, 67.5% and 67.4% respectively. The genomic DNA was extracted using two methods ('RFLP' and 'Magnapure', see methods). 3 Sequencing was done using Illumina short reads technology in either a HiSeq or a MiSeq instrument (see methods). 4 PCR based identification -species identification by Hain Lifescience GenoType NTM-DR assay. 5 The production date of the HCU device, when available. 6 The hospital or the clinic in which the HCU was used, or the clinical sample was obtained, when available. The most abundant annotation of short reads on the species level (proportion of reads annotated to this species). 4 proportion of reads annotated as M. chimaera. 5 Number of assembly bins retrieved from metagenomic binning. 6 The most similar genome to each bin, as identified by running MASH against a database of all publicly available genomes (number of k-mers out of maximum 1000 k-mers). * M. = Mycobacterium.

Identification of known phylogenetic groups by specific SNP signatures
We searched our assembled genomes for the SNP signatures defined by van Ingen et al [15]. Twelve of the samples had a SNP signature assigning them to phylogenetic group 1, the largest phylogenetic group in that study, which included 200 isolates (Figure 1a). Within these 12 samples, two samples were further assigned to branch 2, one to subgroup 1.8, and six to subgroup 1.1, a tightly related phylogenetic subgroup related to the outbreak, by this subgroup's SNP signature (Figure 1b). Three samples were assigned to group 1 but not to any of its subgroups or branches. In addition, one of our isolates was assigned to van Ingen et al's group 2 and within it to subgroup 2.1, by their specific SNP signatures (Figure 1a). All isolates in our study that were assigned to subgroup 1.1, the outbreak subgroup, originated from HCU devices, which is in accordance with the fact that no clinical sample was taken from patients who had undergone cardio-thoracic surgery in the past.

Phylogeny
We identified SNP loci, relative to the M. chimaera reference strain DSM-44623, in the assembled bin of our isolates, as well as in some isolates analyzed by van Ingen et al [15] (see next section), and M. intracellulare strain MOTT-2. These SNPs were used for building a maximum likelihood phylogenetic tree (Figure 1a). The phylogenetic tree branching pattern agreed with the assignment of isolates by van Ingen et al's groupspecific SNP signatures [15]. Our samples have a wide genetic variability (Figure 1a). Sample M10, identified as M. intracellulare in all previous analyses, is most similar to other M. intracellulare genomes. Perhaps surprisingly, the same is true for sample M2 obtained from clinical sputum. Four of our samples, M6 -M9, were obtained from clinical samples clustered in one similar, yet not identical, group. The largest group of our samples, comprised of 12 samples, clustered together with isolates assigned by van Ingen et al [15] to group 1.
To explore the diversity within group 1, a separate maximum likelihood phylogenetic tree was built based on these isolates (Figure 1b). This tree too revealed a close agreement with SNP signatures. Most importantly, six isolates from HCUs clustered together with isolates from the outbreak subgroup described by van Ingen et al [15]. These isolates also belong to this subgroup based on their signature SNPs.

Validation of the results
To validate our procedures, we downloaded the raw genomic sequences of some isolates analyzed by van Ingen et al [15]], representing the full genetic variability in that study, and applied the same methods used on our isolates, including taxon annotation of short reads (not shown), de novo assembly followed by metagenomic binning and SNP identification. We used group-specific SNP signatures to assign these genomes back to their phylogenetic groups, and the original groups and subgroups were retrieved. Isolates that were part of the same branch in Ingen et al's phylogenetic analyses, exhibited similar clustering when analyzed with this study pipeline and used in phylogenetic trees (Figure 1).

Discussion
The aim of this study was to characterize the M. chimaera strains in Israel, in both HCUs and clinical samples, and find out whether the global outbreak strain is present among them. The mixed nature of samples in this study made it difficult to characterize the strains comprising them, and led us to use an un-orthodox path of bioinformatics analysis. The relative similarity between Mycobacterium genomes made the use of mapping to a reference genome inappropriate, since reads from one species of bacterium in the sample could be cross-mapped to a genome of a different bacterium. In addition, the mix of intra-species and inter-species variation may interfere with the identification of strains. In contrast, the use of assembled contigs, which are much longer, enabled the separation of sequences into bins representing the different bacteria. The agreement in genome size and GC content of the bins we annotated as M. chimaera, with those characteristics of known genomes of M. chimaera, supports this annotation. Moreover, our confidence in our results relies on the fact that we re-captured the results of van Ingen et al [15] for some of the isolates analyzed by them, using our unique analysis.
Many of the samples taken from HCU water contained Mycobacterium gordonae. Interestingly, other groups also found M. gordonae in cultures taken from HCU devices [17]. M. gordonae is considered a rapidly growing NTM, while M. chimaera is considered a slow-growing NTM. The culture medium used to grow Mycobacterium favors the growth of both species, and as M. gordonae grows faster, it is no surprise that it constituted the majority of sequences in some of our samples.
The global outbreak strain was found in some of the LivaNova/Sorin T3 HCUs in Israel but not in patients.
As far as we know, no case of M. chimaera infection was diagnosed in a patient who had undergone cardiothoracic surgery in Israel. This fortunate lack of infected patients, despite the presence of the outbreak strain in local HCU devices, maybe due to several factors. The location and orientation of the HCU has a major effect on the chance of infection. If the infected devices were placed outside the operation room, or even inside it but in an orientation such that the airflow direction is away from the surgery bed, this could diminish the infection risk [13]. Even in medical centers where cardio-thoracic surgery caused infections, these infections were very rare [24,25]. Lastly, several years can pass after the surgery before the infection manifests clinically [10,14].
Studies similar to ours were conducted globally [e.g. [11,[16][17][18]]. Our findings are in agreement with other studies, confirming the single strain of M. chimaera related to the outbreak and its common source.
Our study included nine Mycobacterium chimaera clinical isolates from lungs (sputum or thoracic biopsy). Lung NTM infections are increasing worldwide [26][27][28]. In a previous study, 28% of MAC infections identified in human pulmonary samples were caused by Mycobacterium chimaera [28]. The importance of this pathogen to human health is therefore beyond the cardio-thoracic surgery related outbreak, mostly regarding lung infection (for example [29][30][31]). Our knowledge of the population structure, mode of infection and virulence mechanism of this emerging pathogen is still lacking.

Bacterial isolation
Sputum and chest biopsy samples were decontaminated for 30 min at room temperature with a 1:1 volume of 4% NaOH or 1:4 volume of 4% H2SO4, respectively. DDW was added to stop the process and the samples were centrifuged at 3000g for 20 min and re-suspended in 5 mL of the supernatant. 0.5 mL of the processed sample was inoculated on solid Löwenstein-Jensen (LJ) medium and incubated at 30 °C until observation of growth (up to 8 weeks). 1 liter from each water sample was centrifuged at 3,000 g for 20 min, the supernatant discarded and about 4 mL of the remaining was decontaminated using 1:4 volume of 4% H2SO4 with gentle agitation for 15 minutes. Decontamination stop, inoculation and incubation of the water samples were identical to the treatment of the clinical samples. Acid fast staining confirmed positive cultures [32].

DNA extraction and species identification by Hain Lifescience assay kit
Crude DNA was extracted from the positive cultures by suspending a loop-full of bacteria from LJ medium in 300 µL water and heat inactivation for 45 min at 95°C, followed by 15 min of sonication in an ultrasonic bath and centrifuging at 13,000 rpm for 5 min. A 5-µL aliquot of the supernatant was used for molecular identification using the GenoType NTM-DR assay that was performed according to the manufacturer's instructions.

Genomic DNA isolation
Two different protocols were used for the isolation of genomic DNA. Manual RFLP-grade DNA extraction was done as previously described [33]. An automated DNA extraction was done by suspending a confluent portion of bacteria from LJ media in 400 µL TE, heat inactivated for 30 minutes at 90ºC followed by incubation with 1 mg/ml lysozyme at 37ºC overnight. 400 µL of the bacterial lysate was transferred to the automated MagNA Pure Compact system for DNA extraction according to the manufacturer's instructions (Roche Life Science).

Whole genome sequencing
For WGS the samples were send to Hylabs LTD, Israel. Libraries were prepared using the NEB Ultra DNA library prep kit. Ten of the samples were sequenced on Illumina HiSeq instrument and 13 of the samples were sequenced on Illumina MiSeq instrument. Both sequencing generated 2x150 bp paired end reads.

Metagenomic taxon annotation of reads
Annotating each short-read sequence to a taxon was done with Kaiju [34]. In essence, each read was compared to a database of publically available DNA sequences to assign a taxon to it. When a highresolution taxon level assignment was not possible, a lower resolution level was used. (For example, when it could not be decided to which species a read belongs, Kaiju would try to decipher to which genus it belongs. If a genus level annotation could not be made, a family level annotation would be tried, and so on).

Metagenomic binning
Short reads were de novo assembled into contigs using SPAdes v3.11.1 [35]. The Assemblies were uploaded to the Pathosystems Resource Integration Center (PATRIC, https://www.patricbrc.org/) [36], and PATRIC 'Metagenomic Binning' service was applied. To identify the species of each retrieved assembly bin, we used PATRIC service 'Similar Genome Finder', which implements the Mash tool [37]. The Mash algorithm represents each genome by a sketch, containing 1000 k-mers from this genome. The higher the similarity between two genomes, the more k-mers their sketches have in common.

SNP calling and genotype calling
The dnadiff command in the MUMmer suit [38] was used to align each genome assembly to the genome assembly of M. chimaera strain DSM-44623 (RefSeq accession NZ_CP015278.1), and find variant positions. For each sample, the output of this command was a list of variant positions, relative to the reference. SNPs were filtered if another SNP existed within a window of 12 bp in the same genome. We constructed a dataset of all the SNPs identified from all the isolates in this study, and from a selection of 40 isolates analyzed by van Ingen et al [15].
Since dnadiff only reports variant positions, it did not enable us to distinguish between cases where the sample genome is identical to the reference at a certain locus, to cases where this locus is not covered in the sample genome.
To get the genotype in each SNP locus in the dataset, in each genome assembly where this locus is covered, we applied the following procedure ( Figure 2): A) We created a 'dummy' reference genome, in which the original bases in the reference genome of strain DSM-44623 were changed in all SNP loci. All other loci in the 'dummy' reference genome are identical to the original reference genome. B) For each isolate, its assembly bin was aligned to the reference genome and SNPs were identified using dnadiff command in the MUMmer suit. For each SNP in the dataset, if the isolate differs from the reference genome in this base position, this base position appears in the output with the genotype call of the isolate. However, both SNP positions in which the isolate is identical to the reference genome, and SNP positions in regions where the two genomes did not align, would not appear in the output. To differentiate between the two possibilities, we applied the next step: C) Similar to the previous step, each isolate was aligned to the 'dummy' reference genome, and SNPs were identified. Since the 'dummy' genome differs from the true reference genome in all the SNPs in the dataset, for each SNP in the dataset where the isolate base is identical to the true reference genome, it differs from the 'dummy' reference genome and this base call appears in the output of dnadiff. D) Genotype call for each isolate from all dataset SNP loci was integrated from both B and C.

SNP based phylogenetic tree
After merging isolates of identical genotype, the SNPs were concatenated into a DNA sequence. A maximum likelihood phylogenetic tree was calculated using RAxML [39] with General Time Reversible model of nucleotide substitution under the Gamma model of rate heterogeneity with ascertainment bias correction, followed by 100 bootstrap iterations.

Group specific SNP signatures
Van Ingen et al identified SNPs signatures specific to some of their genotype groups and sub-groups and included them in table S2 in their paper [15]. We searched our genomes for these signatures. We compared the bases in table S2 from van Ingen et al to the genome sequences strains DSM-44623 and ZUERICH-1 (RefSeq accessions NZ_CP015278.1 and NZ_CP015272.1, respectively). We noticed a mistake in one of the SNPs in the table, the reference allele in position 4050336 is C, not G, and the table was corrected accordingly. SNP genotypes were called as described above. Isolates were identified which contain groupspecific SNP signatures. Figure 2. The process of genotype call. The dnadiff command in the MUMmer suit was used for genome alignment and SNP call between each isolate assembled genome and a reference genome. The SNPs found in all the samples were used to build one dataset of SNPs. In order to differentiate SNP loci were the isolate genome is identical to the reference genome, from loci not covered or not properly aligned in the isolate, each isolate was also compared to the 'dummy' reference genome. (a) The 'dummy' reference genome was created by changing the base in each of the dataset loci. (b) Positions from the dataset in which the isolate is different from the reference genome were identified when aligning the isolate genome to the reference genomes and the isolate base call in these positions was recorded. Position with a question mark are those where the isolate is either identical to the reference genome, or the two genomes did not align. (c) The next step was repeated using the 'dummy' reference genome. (d) The genotype calls from (b) and (c) are integrated so a final genotype call is given to all SNPs in the dataset were the isolate genome align with the reference genome.

Conclusions
The global outbreak strain was found in some of the LivaNova/Sorin T3 HCUs in Israel but not in patients. The use of metagenomic binning enabled strain identification from mixed cultures. The characterization of clinical M. chimaera isolates in Israel is important for our ability to surveil this emerging pathogen. : Table S1: The metagenomic bins and their quality.