Preprint
Article

This version is not peer-reviewed.

A Modern Framework for Identifying Novel Environmental Legionella Species

Submitted: 25 May 2026

Posted: 27 May 2026


Abstract
Environmental surveillance of Legionella bacteria relies on standardized culture-based methods that enable reliable detection; however, accurate species-level identification remains challenging. Phenotypic, serological, and proteomic methods, including latex agglutination and MALDI-TOF mass spectrometry, enable rapid screening but often lack sufficient resolution to distinguish closely related species. This study proposes a structured framework for the identification and taxonomic characterization of environmental Legionella isolates. Environmental isolates were obtained using standard culture methods and initially identified by MALDI-TOF MS, followed by molecular characterization using PCR, 16S rRNA gene sequencing, and whole-genome sequencing. Phylogenomic analyses were conducted using overall genomic relatedness indices, including Average Amino Acid Identity (AAI) and Percentage of Conserved Proteins (POCP), and a phylogenomic tree was constructed based on 400 universal marker genes. The framework was applied to four environmental isolates (PATHC032, PATHC035, PATHC038, and PATHC039), including one previously described species, Legionella sheltonii. Results showed AAI values ranging from 94.3% to 96.6% and POCP values from 86.5% to 91.0%, supporting their genomic distinctiveness from any validly described Legionella species, which was confirmed by phylogenomic analysis. These findings demonstrate that the proposed framework enables consistent species-level identification and provides guidelines for recognizing candidate novel Legionella species through the integration of environmental surveillance and genome-based taxonomy.

1. Introduction

The genus Legionella, the sole genus within the family Legionellaceae, currently comprises 67 validly described species, with new taxa continuing to be identified as environmental surveillance and genome-based taxonomy advance [1]. Since its first recognition during the investigation of a pneumonia outbreak in 1976, Legionella has been established as an environmentally ubiquitous bacterium of major public health relevance. Members of the genus are Gram-negative, strictly aerobic rods that require L-cysteine and iron salts for growth and are primarily associated with aquatic environments.
In natural and engineered water systems, Legionella species persist and replicate predominantly within free-living amoebae, an intracellular niche that enhances their ecological fitness and tolerance to environmental stressors such as disinfectants, temperature fluctuations, and nutrient limitation. Approximately one third of described Legionella species are considered opportunistic human pathogens capable of causing legionellosis, a spectrum of respiratory diseases ranging from Pontiac fever to severe Legionnaires’ disease. Infection typically occurs through inhalation of contaminated aerosols, linking disease risk directly to the presence and proliferation of Legionella in water systems. Consequently, proactive monitoring and effective control of Legionella in engineered environments remain essential components of public health prevention strategies.
Engineered water systems, including cooling circuits, potable water installations, and other complex aquatic infrastructures, provide favourable conditions for Legionella growth through biofilm formation, stable temperatures, and stagnant water. Routine environmental surveillance programs rely largely on standardized culture-based methods to detect and enumerate Legionella; however, these approaches offer limited resolution beyond genus-level identification. Phenotypic, serological, and proteomic methods, such as latex agglutination tests and MALDI-TOF mass spectrometry, enable rapid screening but frequently fail to reliably distinguish closely related species, particularly among non-Legionella pneumophila environmental isolates.
Recent genome-based studies, including our previous work, have demonstrated that environmental Legionella isolates may represent previously unrecognized species when assessed using whole genome sequencing and genome-wide relatedness metrics [2,3,4]. Despite the increasing accessibility of genomic tools, their application in routine environmental surveillance remains largely unstructured, and clear decision-making frameworks guiding their stepwise implementation are lacking.
The aim of this study was to develop a structured framework for the identification and taxonomic assessment of novel environmental Legionella species, similar in approach to the NOVA study [5]. The framework integrates standardized culture-based detection with proteomic, molecular, and genome-resolved analyses, providing a transparent and scalable pathway for species-level identification and recognition of candidate novel taxa. By aligning advanced genomic approaches with routine surveillance practice, this approach aims to enhance both the consistency of Legionella identification and the systematic exploration of environmental Legionella diversity.

2. Materials and Methods

A framework was developed and used for the identification and characterization of potentially novel environmental Legionella species (Figure 1).
Environmental water samples were collected from engineered water systems as part of a routine Legionella surveillance programme.
The study focused exclusively on environmental Legionella isolates recovered from water systems and did not include clinical samples or bacterial genera other than Legionella.
Laboratory culture of samples followed ISO 11731:2017 (Water quality – Enumeration of Legionella) and the procedures described in ‘Procedures for the Recovery of Legionella from the Environment’ (January 2005, USDHHS, Public Health Service, CDC, Atlanta, GA) [6,7]. The samples were cultivated on buffered charcoal yeast extract (BCYE) agar and glycine-vancomycin-polymyxin-cycloheximide (GVPC) agar and incubated at 36 °C with 3% CO2 for 10 days. Presumptive Legionella colonies were confirmed by subculture on BCYE agar with L-cysteine (BCYE Cys+) and without L-cysteine (BCYE Cys-), in accordance with ISO 11731, in an accredited microbiological laboratory (HAA 1550).
Species identification of bacterial isolates was performed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) (Bruker Daltonics, Bremen, Germany) with the full extraction protocol, as previously described [8,9]. Briefly, a loopful of a bacterial colony was suspended in 300 µl of deionized water and vortexed, after which 900 µl of absolute ethanol (Kemika, Croatia) was added. The suspension was centrifuged at 13,000 rpm for 2 min, the supernatant was decanted, and the resulting pellet was resuspended in equal volumes of 70% formic acid (Sigma Aldrich, Germany) and 100% acetonitrile (Fisher Chemical, Spain), followed by a second centrifugation at 13,000 rpm for 2 min. The supernatant was spotted onto a 96-spot polished steel target plate (Bruker Daltonik, Germany), air-dried at room temperature, and overlaid with 1 µl of a saturated solution of α-cyano-4-hydroxycinnamic acid (10 mg/ml; Bruker Daltonik, Germany) prepared in 50% acetonitrile and 2.5% trifluoroacetic acid. Mass spectra were acquired in positive linear ion mode within a mass range of 2–20 kDa. Spectral analysis and identification were performed using MBT Compass HT software version 5.0 (Bruker Daltonik) by comparison with the Bruker Biotyper reference database version 11. Species identification was interpreted based on log score values, where scores of 2.00–3.00 indicated high-confidence identification, scores of 1.70–1.99 indicated low-confidence identification, and scores below 1.70 were considered unreliable. Isolates yielding low-confidence identification scores were selected for further molecular and genomic analysis within the framework, beginning with partial 16S rRNA gene PCR and sequence analysis.
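The score interpretation described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the handling of scores below 1.70 in the follow-up decision is an assumption, since the text states only that low-confidence isolates progress to 16S rRNA gene analysis.

```python
def classify_maldi_score(log_score: float) -> str:
    """Map a log score to the confidence categories given in the text."""
    if not 0.0 <= log_score <= 3.0:
        raise ValueError("log score expected in the range 0.00-3.00")
    if log_score >= 2.00:
        return "high-confidence"
    if log_score >= 1.70:
        return "low-confidence"
    return "unreliable"


def needs_molecular_follow_up(log_score: float) -> bool:
    """Return True for low-confidence identifications.

    Assumption: only the low-confidence category triggers follow-up here;
    how unreliable scores are handled is not specified in the text.
    """
    return classify_maldi_score(log_score) == "low-confidence"


if __name__ == "__main__":
    # Hypothetical scores, not measured values from this study.
    for score in (2.31, 1.85, 1.42):
        print(score, classify_maldi_score(score), needs_molecular_follow_up(score))
```

Keeping the category boundaries in one function makes the progression criterion explicit and easy to audit when thresholds are revised.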
PCR was carried out to amplify the 16S rRNA gene as described by Yong et al. [10]. PCR products were visualized by gel electrophoresis on a 1% agarose gel stained with Xpert Green DNA stain (GRISP Lda., Porto, Portugal), purified, and sent to Macrogen Inc. for Sanger sequencing.
The obtained sequences were compared against 16S rRNA gene nucleotide databases available through the National Center for Biotechnology Information (NCBI) using the BLAST network service [11]. Sequence similarity was assessed by comparison with the closest validly published bacterial species. Bacterial species were considered validly described only if they were listed as validly published in the List of Prokaryotic Names with Standing in Nomenclature (LPSN), maintained by the German Collection of Microorganisms and Cell Cultures [1].
DNA extraction, sequencing, de novo hybrid assembly and genome annotation of Legionella strains were performed following the methods outlined in our previous studies [2,10]. Genomic DNA was extracted from a single bacterial colony grown on BCYE agar. The colony was resuspended in phosphate-buffered saline (PBS), washed three times, and DNA was isolated using the DNeasy Blood and Tissue Kit (Qiagen, Germany) with RNase treatment, following the manufacturer’s instructions. Genomic libraries for short-read sequencing were prepared using the Illumina Nextera DNA library preparation kit and sequenced on the Illumina MiSeq platform to generate 150 bp paired-end reads. Long-read sequencing was performed using the MinION Nanopore system (Oxford Nanopore Technologies, United Kingdom), with libraries prepared using the rapid 96 barcoding kit and sequenced on an R9.4.1 flow cell until sufficient data yield was obtained. Basecalling was carried out in real time using Guppy v5.0.11 with the high-accuracy model, and reads were exported in FASTQ format. Quality control and trimming of Illumina reads were performed using fastp (v0.20.1), while adapter trimming and quality assessment of Nanopore reads were conducted using Porechop (v0.2.4) and NanoPlot (v1.28.2), respectively. Hybrid de novo genome assembly was performed using Unicycler (v0.4.1) with default settings. Assembly quality was evaluated using QUAST (v5.2.0), and quality control reports were summarized with MultiQC. Functional genome annotation was performed using Prokka (Galaxy v1.14.6). Genome completeness and contamination were assessed using CheckM (v1.0.18, KBase), yielding values within acceptable thresholds for high-quality genome assemblies.
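One of the contiguity statistics commonly reported by assembly evaluation tools such as QUAST is N50. As a minimal illustration of what that statistic measures, the Python sketch below computes N50 from a list of contig lengths. The function name and the example lengths are hypothetical and are not taken from the assemblies in this study.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length of the contig at which the cumulative length of
    contigs, sorted from longest to shortest, first reaches at least
    half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A closed hybrid assembly of a single chromosome would report an N50 equal to the chromosome length, which is why the statistic is a convenient first check on contiguity.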
In our previous study, several genome-based typing analyses were performed, including determination of average nucleotide identity (ANI) and calculation of genome-to-genome distances by in silico DNA–DNA hybridization against Legionella type strains [2]. These analyses revealed four novel species candidates: PATHC032, PATHC035, PATHC038, and PATHC039. One of these, PATHC038, has since been validly described as Legionella sheltonii [3]. This study expands on those findings by applying phylogenomic analyses, including the protein-based overall genomic relatedness indices (OGRIs) average amino acid identity (AAI) and percentage of conserved proteins (POCP), to further resolve species boundaries for the remaining three isolates: PATHC032, PATHC035, and PATHC039 [12]. These metrics were applied in combination to evaluate taxonomic distinctiveness, particularly in cases where nucleotide-based thresholds alone may be inconclusive.
Whole-genome sequence data were uploaded to the Type (Strain) Genome Server (TYGS) for a whole-genome-based taxonomic analysis [13]. The analysis was performed using the current TYGS pipeline, including recently implemented methodological updates and features [14,15]. Taxonomic nomenclature, synonymy, and associated literature were obtained through the TYGS sister database, the List of Prokaryotic Names with Standing in Nomenclature (LPSN) [14,15].
The identification of closely related type strains was carried out using two complementary approaches. First, all submitted genomes were compared against type strain genomes available in the TYGS database using the MASH algorithm, which provides a rapid estimate of intergenomic relatedness [16]. For each query genome, the ten type strains with the smallest MASH distances were selected. Second, an additional set of closely related type strains was identified based on 16S rRNA gene sequences. These sequences were extracted from the query genomes using RNAmmer [17] and compared against 16S rRNA gene sequences of all type strains in the TYGS database using BLAST [18].
The 50 best-matching type strains, ranked according to bitscore, were retained for each genome, and precise intergenomic distances were subsequently calculated using the Genome BLAST Distance Phylogeny (GBDP) approach with the coverage algorithm and distance formula d5 [19]. From these results, the ten closest type strain genomes for each of the submitted genomes were selected.
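To make the first selection step more concrete, the Python sketch below shows how a Mash-style distance follows from a Jaccard index and how the closest type strains can then be ranked. It is a simplified illustration, not the TYGS implementation: it computes an exact Jaccard index over full k-mer sets rather than a MinHash sketch, the distance is capped at 1, and the k-mer size, strain labels and distance values used in the example are assumptions chosen for readability.

```python
import math


def kmer_set(sequence: str, k: int) -> set[str]:
    """Collect all k-mers of a sequence (exact set, no MinHash sketching)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), capped at 1."""
    if j <= 0.0:
        return 1.0
    return min(1.0, math.log((1.0 + j) / (2.0 * j)) / k)


def closest_type_strains(distances: dict[str, float], n: int = 10) -> list[str]:
    """Return the n type strains with the smallest distances (ties by name)."""
    ranked = sorted(distances.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical distances between one query genome and three type strains.
    example = {"type_strain_A": 0.10, "type_strain_B": 0.05, "type_strain_C": 0.20}
    print(closest_type_strains(example, n=2))
```

Because the Mash estimate is fast but approximate, it serves here only to shortlist candidates; the precise intergenomic distances are computed afterwards with GBDP, as described above.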
For phylogenomic inference, pairwise genome comparisons among all selected genomes were conducted using GBDP with the trimming algorithm and distance formula d5 [19]. One hundred distance replicates were calculated for each comparison. Digital DNA–DNA hybridization (dDDH) values and confidence intervals were estimated using the recommended settings of the GGDC 4.0 [14,19].
Phylogenetic relationships were inferred from the resulting intergenomic distances using a balanced minimum evolution tree generated with FASTME v2.1.6.1, including SPR post-processing [20]. Branch support was estimated from 100 pseudo-bootstrap replicates. The tree was rooted at the midpoint and visualized using PhyD3 [21,22].
Species-level clustering was performed using a type-based approach with a 70% dDDH threshold around each type strain [13], while subspecies clustering was determined using a 79% dDDH threshold [23].
The genomes and proteomes of the isolates were evaluated alongside 36 Legionella type strains, selected based on the availability and quality of publicly accessible genomic and proteomic data and representing the major phylogenetic diversity within the genus. For phylogenomic analyses, we ensured that the closest matches identified by MALDI-TOF MS were included, enabling comparison of the proposed novel species with their closest relatives. The genome sequence data of reference type strains were obtained from NCBI (Table S1). AAI was calculated using the EzAAI pipeline [24]. POCP was calculated using the POCP-nf pipeline [25,26]. The de novo phylogenomic analyses were conducted with proteomic data using PhyloPhlAn 3.0 [27]. To reconstruct the phylogenomic tree, we used the supermatrix approach on 400 universal marker genes with 100 bootstrap replicates [28], with the diversity parameter set to "low" and other parameters set to their default values. The resulting tree was edited graphically using iTOL [29].
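For readers unfamiliar with POCP, the Python sketch below illustrates the commonly cited definition: the number of conserved proteins in each of the two genomes, summed and divided by the total number of proteins in both genomes. The hit criteria shown (E-value below 1e-5, identity above 40%, aligned region above 50% of the query) follow the original POCP proposal as we understand it; the POCP-nf pipeline used in this study may differ in implementation details, and the function names and protein counts are hypothetical.

```python
def is_conserved_hit(evalue: float, identity_pct: float, aligned_fraction: float) -> bool:
    """Apply the commonly cited POCP criteria to a single protein match.

    Assumed thresholds: E-value < 1e-5, sequence identity > 40 %,
    and an aligned region covering > 50 % of the query protein.
    """
    return evalue < 1e-5 and identity_pct > 40.0 and aligned_fraction > 0.5


def pocp(conserved_1: int, total_1: int, conserved_2: int, total_2: int) -> float:
    """POCP = (C1 + C2) / (T1 + T2) * 100.

    C1 and C2 are the conserved protein counts of genome 1 and genome 2,
    and T1 and T2 are the total protein counts of each genome.
    """
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("total protein counts must be positive")
    if not (0 <= conserved_1 <= total_1 and 0 <= conserved_2 <= total_2):
        raise ValueError("conserved counts must lie between 0 and the totals")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


if __name__ == "__main__":
    # Hypothetical protein counts, not values from this study.
    print(round(pocp(2700, 3000, 2650, 3100), 2))
```

Because POCP counts shared proteins rather than averaging their identity, it can remain comparatively low between genomes whose shared proteins are nevertheless highly similar, which is why it complements AAI rather than duplicating it.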
Isolates showing ANI values below 95% and dDDH values below 70%, while maintaining AAI values above 60–65% and POCP values above 50%, and forming distinct clusters, were considered candidates for novel taxa within the genus Legionella. Additionally, as a supplementary novel approach, we considered POCP values of about 90% or lower as indicative of significant species-level divergence.
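The combined criteria stated above can be summarized as a single decision rule. The Python sketch below is one possible encoding and should be read as an illustration under stated assumptions: the class and function names are hypothetical, all values are taken relative to the closest validly described type strain, the genus-level AAI cut-off is exposed as a parameter because the text gives a range of 60–65%, and the example values are invented rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relatedness:
    """Indices (in percent) against the closest validly described type strain."""
    ani: float
    ddh: float
    aai: float
    pocp: float


def is_candidate_novel_species(
    m: Relatedness,
    forms_distinct_cluster: bool,
    aai_genus_min: float = 65.0,  # assumption: upper end of the 60-65 % range
) -> bool:
    """Apply the framework criteria for a candidate novel Legionella species."""
    below_species_thresholds = m.ani < 95.0 and m.ddh < 70.0
    within_genus = m.aai > aai_genus_min and m.pocp > 50.0
    return below_species_thresholds and within_genus and forms_distinct_cluster


def pocp_supports_divergence(pocp_value: float, cutoff: float = 90.0) -> bool:
    """Supplementary indicator: POCP of about 90 % or lower."""
    return pocp_value <= cutoff


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = Relatedness(ani=92.0, ddh=45.0, aai=94.0, pocp=88.0)
    print(is_candidate_novel_species(example, forms_distinct_cluster=True))
    print(pocp_supports_divergence(example.pocp))
```

Keeping the supplementary POCP indicator separate from the main rule reflects its status in the text as an additional line of evidence rather than a required criterion.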

3. Results

All isolates were successfully cultured under conventional aerobic conditions in accordance with ISO 11731:2017 and confirmed as members of the genus Legionella during the initial identification steps of the workflow. Preliminary identification using MALDI-TOF mass spectrometry enabled rapid genus-level assignment for all isolates.
Partial 16S rRNA gene sequencing confirmed affiliation with the genus Legionella for all isolates but did not provide sufficient resolution to reliably distinguish closely related species (Table 1). Consequently, these isolates progressed to genome-based analysis within the proposed guidelines.
Whole genome sequencing followed by comparative genomic analyses revealed clear differentiation among the four isolates. All isolates, except PATHC039, which approached the ANI threshold, exhibited levels of genomic divergence that fell outside accepted species delineation thresholds when compared with their closest validly described relatives (Table 2).
To put the observed genomic divergence into the phylogenetic context, we conducted a whole-genome-based TYGS analysis. The results, particularly TYGS species clustering, suggest that isolates PATHC032, PATHC035, and PATHC039 likely represent novel species (Figure 2). The TYGS analysis showed that none of the isolates clustered with any currently described Legionella species. Instead, the isolates formed distinct and well-separated clusters, displaying levels of genomic divergence from their closest relatives comparable to those observed between already established Legionella species, supporting their classification as novel taxa within the genus (Figure 2, Supplementary File 1).
Protein-based OGRIs further supported these findings. AAI values ranged from 94.3% to 96.6%, closely matching ANI values, while POCP values ranged from 86.5% to 91.0%, indicating a substantial fraction of proteins lacking detectable homologs between genomes, consistent with distinct species-level divergence. Protein-based OGRIs analyses indicated that the three unresolved isolates were distinct from all currently validly described Legionella species, while maintaining relatedness consistent with placement within the genus (Table 2, Figure 3).
To contextualize the observed proteome-level divergence, we reconstructed a whole-proteome phylogenomic tree of the available Legionella type strains (Figure 4). Even at this more conservative level, the total amount of inferred evolutionary change separating the proposed novel species from their closest relatives is relatively large, comparable to some other pairs of sister species, e.g. L. rubrilucens and L. taurinensis, and L. gormanii and L. qingyii.
The overall topology of the PhyloPhlAn (proteomic) and TYGS (genomic) trees was largely congruent at the level of shallow and intermediate nodes, with both trees showing highly similar groups of species. However, some differences in deeper branching patterns were observed between the two approaches, suggesting that phylogenetic relationships among more distantly related Legionella lineages may still require further resolution through expanded taxon sampling and additional genome-based analyses. Regardless, similar compositions of shallow nodes in both analyses support the robustness of the phylogenomics approach for separating species, even if they do not recover identical phylogenies at deeper timescales.
These results, together with genome-wide comparisons, suggest that the three isolates represent separate Legionella species that have not yet been validly described.
Overall, the application of our proposed framework enabled clear separation of a known Legionella species from three environmentally derived isolates representing putative novel species, demonstrating the effectiveness of the stepwise approach for resolving taxonomic uncertainty in environmental Legionella surveillance.

4. Discussion

Routine environmental surveillance is essential for monitoring the presence of Legionella in engineered water systems; however, species-level identification remains challenging when relying exclusively on conventional laboratory methods. In this study, we present a structured framework that integrates standardized culture-based detection with genome-resolved analyses to improve taxonomic resolution of environmental Legionella isolates. Application of this workflow highlights both the limitations of routinely used identification tools and the added value of a stepwise genomic approach within environmental microbiology.
Culture-based isolation in accordance with ISO 11731 remains indispensable for the detection of Legionella in water samples and provides the foundation for downstream analyses. Consistent with previous reports, phenotypic confirmation and MALDI-TOF MS enabled rapid species-level identification of all isolates recovered in this study.
Despite yielding high-confidence MALDI-TOF MS identification scores (≥2.0), isolates were shown by genome-based analyses to be genomically distinct from their assigned reference species. This apparent discrepancy reflects intrinsic limitations of proteomic identification when applied to environmental Legionella isolates and does not represent a contradiction between methods.
MALDI-TOF MS identification relies on spectral similarity of a limited set of the dominant ribosomal protein profiles and provides the best possible match within the constraints of the reference database [29,30]. Consequently, a high-confidence score indicates strong similarity to the closest available reference spectrum rather than definitive species-level relatedness [30,31]. The limitations of MALDI-TOF MS are further exacerbated by the incomplete representation of environmental Legionella diversity in current reference databases, which are predominantly populated by clinically relevant species and type strains. Environmental isolates representing previously uncharacterized lineages are therefore forced to cluster with the most similar available reference, resulting in apparently high-confidence identifications that mask underlying genomic divergence. In contrast, whole genome–based metrics, including ANI, dDDH, AAI and POCP, evaluate relatedness across the entire genome or proteome and are internationally accepted standards for prokaryotic species delineation [12]. The consistently subthreshold values obtained for these indices provide robust evidence that the investigated isolates represent distinct genomic lineages within the genus Legionella, despite their proteomic similarity to known species. These findings highlight the necessity of genome-resolved analyses for accurate species-level assessment and support the use of MALDI-TOF MS as a rapid screening tool rather than a definitive taxonomic method in environmental Legionella surveillance.
Partial 16S rRNA gene sequencing provided additional confirmation of genus affiliation, but as expected, offered limited discriminatory power among closely related Legionella species. The high degree of sequence conservation within the genus restricts the utility of this marker for species delineation, particularly in environmental isolates that may represent previously uncharacterized taxa. Within our proposed framework, 16S rRNA gene analysis therefore functions primarily as a screening and decision-making step rather than a definitive identification tool.
Whole genome sequencing constituted the critical resolution step of the NOVA algorithm, enabling comprehensive assessment of genomic relatedness across multiple levels [5]. Nucleotide-based comparisons using ANI and dDDH demonstrated that all investigated isolates fell below established species delineation thresholds relative to their closest validly described relatives. Notably, the ANI value for isolate PATHC039 approached the ANI threshold, a scenario that complicates species assignment as genomic databases expand and phylogenetically proximate taxa are described. These findings illustrate the limitations of relying on single genomic metrics and support the use of complementary approaches for robust taxonomic interpretation.
To address this complexity, protein-based overall genome relatedness indices (OGRIs), including AAI and POCP, were applied in combination with phylogenomic analysis based on conserved marker genes. AAI values between PATHC032 and PATHC039 are close to the species threshold, suggesting they may represent closely related taxa or variants of L. pneumophila. The discrepancy between identity-based metrics (ANI, AAI) and gene-content similarity suggests that Legionella species share a conserved core gene set while maintaining species-specific adaptations, reflected in relatively low pairwise POCP values. These methods provided consistent and independent evidence for the genomic distinctiveness of isolates PATHC032, PATHC035, and PATHC039. While all isolates clearly belonged to the genus Legionella, their AAI and POCP values, together with their phylogenomic placement, indicate that isolates PATHC032, PATHC035, and PATHC039 represent distinct taxa. The concordance of nucleotide-based and protein-based OGRIs and phylogenomic results strengthens confidence in species-level inference and demonstrates the utility of integrated genomic analyses in environmental microbiology.
Importantly, our proposed framework is designed as an extension of the NOVA algorithm. By defining transparent criteria for progression to genome-based analyses, such as low-confidence MALDI-TOF MS identification, the workflow enables targeted use of sequencing resources while maintaining compatibility with existing monitoring programs.
The recovery of genomically distinct Legionella isolates from diverse engineered water systems and geographical settings highlights the substantial and likely underestimated diversity of environmental Legionella. Although the pathogenic potential of these putative novel species remains unknown, their identification contributes to a more comprehensive understanding of Legionella ecology and evolution in engineered environments. Systematic recognition of such taxa is an important prerequisite for future studies addressing virulence-associated traits, host interactions and public health relevance.
In summary, our proposed guidelines, similar to the conclusions of the NOVA study, provide a reproducible and scalable framework for improving the taxonomic resolution of environmental Legionella isolates. By integrating standardized culture-based methods with genome-resolved analyses, this approach supports accurate species-level placement and facilitates the systematic detection of novel Legionella taxa. As genomic technologies become increasingly accessible, structured workflows such as the one described here may play an important role in advancing environmental microbiology and water safety research.

5. Conclusions

The proposed framework bridges routine environmental monitoring and high-resolution microbial systematics. By combining standardized isolation methods with genome-based taxonomy, it enables reliable detection, accurate taxonomic placement, and identification of novel Legionella species. This framework may provide a valuable basis for future research into the virulence potential and pathogenic characteristics of environmental Legionella isolates.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Table S1: NCBI reference sequences and genome assemblies accession numbers of Legionella type strains used for phylogenomic analyses in this study, Supplementary File 1: TYGS job results.

Author Contributions

Sample preparation and data analysis, R.S.K., K.V. and M.K.; data curation/analysis and preparing figures and tables, N.K., K.V. and M.K.; design of experiments, collection and isolation of bacterial strains, and performing culture experiments, R.S.K., K.V. and M.K.; MALDI-TOF MS analysis, S.K.; writing—original draft preparation, R.S.K., K.V., N.K. and M.K.; conceptualization, supervision and revision of the manuscript, M.S., I.M., G.K., D.F., J.S., T.D.L. and B.G.S.; funding acquisition, R.S.K., M.S., I.M. and T.D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the European Union – ‘NextGenerationEU’ grant (NPOO.C3.2.R2-I1.05.0010) and the Novo Nordisk Foundation grant NNF24SA0100980 to IM and the European Regional Development Fund, PK.1.1.10.0007, DATACROSS (T.D.-L.).

Data Availability Statement

Data is contained within the article or supplementary material. The original contributions presented in this study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript, the authors used Google Gemini for language editing, grammar correction, and text formatting assistance. The authors critically reviewed and edited all AI-generated content and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MALDI-TOF: Matrix-assisted laser desorption/ionization time-of-flight
BCYE: Buffered charcoal yeast extract
GVPC: Glycine-vancomycin-polymyxin-cycloheximide
ANI: Average nucleotide identity
dDDH: Digital DNA–DNA hybridization
AAI: Average amino acid identity
POCP: Percentage of conserved proteins
NCBI: National Center for Biotechnology Information
TYGS: Type (Strain) Genome Server

References

  1. Parte, A.C.; Sardà Carbasse, J.; Meier-Kolthoff, J.P.; Reimer, L.C.; Göker, M. List of Prokaryotic Names with Standing in Nomenclature (LPSN) Moves to the DSMZ. Int. J. Syst. Evol. Microbiol. 2020, 70, 5607–5612. [CrossRef]
  2. Svetlicic, E.; Jaén-Luchoro, D.; Klobucar, R.S.; Jers, C.; Kazazic, S.; Franjevic, D.; Klobucar, G.; Shelton, B.G.; Mijakovic, I. Genomic Characterization and Assessment of Pathogenic Potential of Legionella spp. Isolates from Environmental Monitoring. Front. Microbiol. 2023, 13. [CrossRef]
  3. Sauerborn Klobucar, R.; Kovačević, M.; Svetlicic, E.; Kasalo, N.; Vasari, K.; Santic, M.; Antonic, M.; Posta, A.; Ivančić Šantek, M.; Stopić, K.; et al. Legionella sheltonii sp. nov., a Novel Species Isolated on a Cruise Ship during Routine Monitoring. Int. J. Syst. Evol. Microbiol. 2025, 75. [CrossRef]
  4. Cristino, S.; Pascale, M.R.; Marino, F.; Derelitto, C.; Salaris, S.; Orsini, M.; Squarzoni, S.; Grottola, A.; Girolamini, L. Characterization of a Novel Species of Legionella Isolated from a Healthcare Facility: Legionella resiliens sp. nov. Pathogens 2024, 13, 250. [CrossRef]
  5. Muigg, V.; Seth-Smith, H.M.B.; Adam, K.-M.; Weisser, M.; Hinić, V.; Blaich, A.; Roloff, T.; Heininger, U.; Schmid, H.; Kohler, M.; et al. Novel Organism Verification and Analysis (NOVA) Study: Identification of 35 Clinical Isolates Representing Potentially Novel Bacterial Taxa Using a Pipeline Based on Whole Genome Sequencing. BMC Microbiol. 2024, 24, 14. [CrossRef]
  6. iTeh, Inc. ISO 11731:2017 - Water Quality Legionella Enumeration Methods. Available online: https://standards.iteh.ai/catalog/standards/iso/4d5f1cc4-844f-4fe6-a26d-d3011c32633c/iso-11731-2017 (accessed on 22 March 2026).
  7. Procedures for the Recovery of Legionella from the Environment. Available online: https://stacks.cdc.gov (accessed on 22 March 2026).
  8. Pečur Kazazić, S.; Topić Popović, N.; Strunjak-Perović, I.; Florio, D.; Fioravanti, M.; Babić, S.; Čož-Rakovac, R. Fish Photobacteriosis—The Importance of Rapid and Accurate Identification of Photobacterium damselae subsp. piscicida. J. Fish Dis. 2019, 42, 1201–1209. [CrossRef]
  9. Pascale, M.R.; Mazzotta, M.; Salaris, S.; Girolamini, L.; Grottola, A.; Simone, M.L.; Cordovana, M.; Bisognin, F.; Dal Monte, P.; Bucci Sabattini, M.A.; et al. Evaluation of MALDI–TOF Mass Spectrometry in Diagnostic and Environmental Surveillance of Legionella Species: A Comparison With Culture and Mip-Gene Sequencing Technique. Front. Microbiol. 2020, 11. [CrossRef]
  10. Yong, S.F.; Tan, S.H.; Wee, J.; Tee, J.J.; Sansom, F.M.; Newton, H.J.; Hartland, E.L. Molecular Detection of Legionella: Moving on From Mip. Front. Microbiol. 2010, 1. [CrossRef]
  11. BLAST: Basic Local Alignment Search Tool. Available online: https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 22 March 2026).
  12. Riesco, R.; Trujillo, M.E. Update on the Proposed Minimal Standards for the Use of Genome Data for the Taxonomy of Prokaryotes. Int. J. Syst. Evol. Microbiol. 2024, 74, 006300. [CrossRef]
  13. Meier-Kolthoff, J.P.; Göker, M. TYGS Is an Automated High-Throughput Platform for State-of-the-Art Genome-Based Taxonomy. Nat. Commun. 2019, 10, 2182. [CrossRef]
  14. Meier-Kolthoff, J.P.; Carbasse, J.S.; Peinado-Olarte, R.L.; Göker, M. TYGS and LPSN: A Database Tandem for Fast and Reliable Genome-Based Classification and Nomenclature of Prokaryotes. Nucleic Acids Res. 2022, 50, D801–D807. [CrossRef]
  15. Freese, H.M.; Meier-Kolthoff, J.P.; Sardà Carbasse, J.; Afolayan, A.O.; Göker, M. TYGS and LPSN in 2025: A Global Core Biodata Resource for Genome-Based Classification and Nomenclature of Prokaryotes within DSMZ Digital Diversity. Nucleic Acids Res. 2026, 54, D884–D891. [CrossRef]
  16. Ondov, B.D.; Treangen, T.J.; Melsted, P.; Mallonee, A.B.; Bergman, N.H.; Koren, S.; Phillippy, A.M. Mash: Fast Genome and Metagenome Distance Estimation Using MinHash. Genome Biol. 2016, 17, 132. [CrossRef]
  17. Lagesen, K.; Hallin, P.; Rødland, E.A.; Stærfeldt, H.-H.; Rognes, T.; Ussery, D.W. RNAmmer: Consistent and Rapid Annotation of Ribosomal RNA Genes. Nucleic Acids Res. 2007, 35, 3100–3108. [CrossRef]
  18. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T.L. BLAST+: Architecture and Applications. BMC Bioinformatics 2009, 10, 421. [CrossRef]
  19. Meier-Kolthoff, J.P.; Auch, A.F.; Klenk, H.-P.; Göker, M. Genome Sequence-Based Species Delimitation with Confidence Intervals and Improved Distance Functions. BMC Bioinformatics 2013, 14, 60. [CrossRef]
  20. Lefort, V.; Desper, R.; Gascuel, O. FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program. Mol. Biol. Evol. 2015, 32, 2798–2800. [CrossRef]
  21. Farris, J.S. Estimating Phylogenetic Trees from Distance Matrices. Am. Nat. 1972, 106, 645–668. [CrossRef]
  22. Kreft, Ł.; Botzki, A.; Coppens, F.; Vandepoele, K.; Van Bel, M. PhyD3: A Phylogenetic Tree Viewer with Extended phyloXML Support for Functional Genomics Data Visualization. Bioinformatics 2017, 33, 2946–2947. [CrossRef]
  23. Meier-Kolthoff, J.P.; Hahnke, R.L.; Petersen, J.; Scheuner, C.; Michael, V.; Fiebig, A.; Rohde, C.; Rohde, M.; Fartmann, B.; Goodwin, L.A.; et al. Complete Genome Sequence of DSM 30083T, the Type Strain (U5/41T) of Escherichia coli, and a Proposal for Delineating Subspecies in Microbial Taxonomy. Stand. Genomic Sci. 2014, 9, 2. [CrossRef]
  24. Kim, D.; Park, S.; Chun, J. Introducing EzAAI: A Pipeline for High Throughput Calculations of Prokaryotic Average Amino Acid Identity. J. Microbiol. 2021, 59, 476–480. [CrossRef]
  25. Qin, Q.-L.; Xie, B.-B.; Zhang, X.-Y.; Chen, X.-L.; Zhou, B.-C.; Zhou, J.; Oren, A.; Zhang, Y.-Z. A Proposed Genus Boundary for the Prokaryotes Based on Genomic Insights. J. Bacteriol. 2014, 196, 2210–2215. [CrossRef]
  26. Hölzer, M. POCP-Nf: An Automatic Nextflow Pipeline for Calculating the Percentage of Conserved Proteins in Bacterial Taxonomy. Bioinformatics 2024, 40, btae175. [CrossRef]
  27. Asnicar, F.; Thomas, A.M.; Beghini, F.; Mengoni, C.; Manara, S.; Manghi, P.; Zhu, Q.; Bolzan, M.; Cumbo, F.; May, U.; et al. Precise Phylogenetic Analysis of Microbial Isolates and Genomes from Metagenomes Using PhyloPhlAn 3.0. Nat. Commun. 2020, 11, 2500. [CrossRef]
  28. Segata, N.; Börnigen, D.; Morgan, X.C.; Huttenhower, C. PhyloPhlAn Is a New Method for Improved Phylogenetic and Taxonomic Placement of Microbes. Nat. Commun. 2013, 4, 2304. [CrossRef]
  29. Letunic, I.; Bork, P. Interactive Tree Of Life (iTOL) v5: An Online Tool for Phylogenetic Tree Display and Annotation. Nucleic Acids Res. 2021, 49, W293–W296. [CrossRef]
  30. Kassim, A.; Pflüger, V.; Premji, Z.; Daubenberger, C.; Revathi, G. Comparison of Biomarker Based Matrix Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) and Conventional Methods in the Identification of Clinically Relevant Bacteria and Yeast. BMC Microbiol. 2017, 17, 128. [CrossRef]
  31. Cuénod, A.; Foucault, F.; Pflüger, V.; Egli, A. Factors Associated With MALDI-TOF Mass Spectral Quality of Species Identification in Clinical Routine Diagnostics. Front. Cell. Infect. Microbiol. 2021, 11. [CrossRef]
  32. Karger, A.; Stock, R.; Ziller, M.; Elschner, M.C.; Bettin, B.; Melzer, F.; Maier, T.; Kostrzewa, M.; Scholz, H.C.; Neubauer, H.; et al. Rapid Identification of Burkholderia mallei and Burkholderia pseudomallei by Intact Cell Matrix-Assisted Laser Desorption/Ionisation Mass Spectrometric Typing. BMC Microbiol. 2012, 12, 229. [CrossRef]
Figure 1. Framework for identification of novel environmental Legionella species.
Figure 2. TYGS GBDP tree illustrating the relationships between the genomes of Legionella isolates PATHC032, PATHC035, and PATHC039 and the reference genomes. Figure was generated by TYGS [13]. Tree inferred with FastME 2.1.6.1 [20] from GBDP distances calculated from genome sequences. The branch lengths are scaled in terms of GBDP distance formula d5. The numbers above branches are GBDP pseudo-bootstrap support values > 60 % from 100 replications, with an average branch support of 57.1 %. The tree was rooted at the midpoint [21].
Figure 3. Heatmap of pairwise AAI (a) and POCP (b) values of isolates PATHC032, PATHC039, PATHC035, and 36 Legionella type strains. Purple color indicates higher genomic similarity.
Figure 4. Phylogenomic tree of 36 Legionella type strains and three uncharacterized isolates. The tree was constructed in PhyloPhlAn 3.0 [27] using the supermatrix approach on 400 universal marker genes. The diversity parameter was set to "low" and all other parameters were left at their default values. The tree was rooted at the midpoint.
Table 1. List of 4 environmental Legionella isolates representing novel taxa. These results have been adapted from our previous study [2].
| Sample ID number | MALDI Biotyper identification | MALDI score | GenBank accession number (whole-genome sequence) | GenBank accession number (16S rRNA gene) | Isolation source |
|---|---|---|---|---|---|
| PATHC032 | L. pneumophila | 2.19 | GCF_026191185.1 | PZ416333 | Hot water tap |
| PATHC039 | L. pneumophila | 2.22 | GCF_026191275.1 | PZ416335 | Shower |
| PATHC035 | L. cherrii | 2.3 | GCF_026191115.1 | PZ416334 | Sink |
| PATHC038 (L. sheltonii) | L. cherrii | 1.84 | GCF_026191355.1 | PQ120583 | Hose bib |
Table 2. Summary of the in silico novel-species tests for the isolates identified as new species candidates. The data for nucleotide-based OGRIs (fastANI, ANIb, dDDH) were taken from our previous study [2].
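| Isolate name | Closest hit | fastANI (%) | ANIb (%) | dDDH (%) | AAI (%) | POCP (%) |
|---|---|---|---|---|---|---|
| PATHC032 | L. pneumophila | 93.51 | 93.09 | 52.3 | 94.3 | 90.4 |
| PATHC039 | L. pneumophila | 95.97 | 95.80 | 68.1 | 96.6 | 87.9 |
| PATHC035 | L. cherrii | 94.3 | 93.97 | 56.2 | 95.5 | 91.0 |
| PATHC038 (L. sheltonii) | L. cherrii | 93.77 | 93.94 | 54.7 | 95.1 | 86.5 |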
* The Closest hit column indicates the most closely related species as determined by fastANI, ANIb, and digital DNA–DNA hybridization (dDDH). For ANI-based comparisons, similarity below 95% indicates a novel species, while the corresponding dDDH threshold is 70% [2]. Thresholds for genus delineation: AAI > 60–65%; POCP > 50%.
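The thresholds in the footnote can be applied to the Table 2 values with a few lines of Python. The following is a minimal illustrative sketch, not the pipeline used in this study: the helper name `assess`, the dictionary layout, and the choice of 60% as the lower bound of the 60–65% AAI range are assumptions made for the example. All numeric values are copied from Table 2.

```python
# Thresholds as stated in the Table 2 footnote (all values in percent).
ANI_SPECIES_THRESHOLD = 95.0   # ANI below this suggests a novel species
DDDH_SPECIES_THRESHOLD = 70.0  # dDDH below this suggests a novel species
AAI_GENUS_THRESHOLD = 60.0     # assumption: lower bound of the 60-65% range
POCP_GENUS_THRESHOLD = 50.0    # POCP above this suggests the same genus

# Values copied from Table 2 (percent, each isolate against its closest hit).
TABLE_2 = {
    "PATHC032": {"fastANI": 93.51, "ANIb": 93.09, "dDDH": 52.3, "AAI": 94.3, "POCP": 90.4},
    "PATHC039": {"fastANI": 95.97, "ANIb": 95.80, "dDDH": 68.1, "AAI": 96.6, "POCP": 87.9},
    "PATHC035": {"fastANI": 94.3, "ANIb": 93.97, "dDDH": 56.2, "AAI": 95.5, "POCP": 91.0},
    "PATHC038": {"fastANI": 93.77, "ANIb": 93.94, "dDDH": 54.7, "AAI": 95.1, "POCP": 86.5},
}


def assess(values):
    """Return one boolean per criterion for a single isolate (hypothetical helper)."""
    return {
        "fastANI_below_species": values["fastANI"] < ANI_SPECIES_THRESHOLD,
        "ANIb_below_species": values["ANIb"] < ANI_SPECIES_THRESHOLD,
        "dDDH_below_species": values["dDDH"] < DDDH_SPECIES_THRESHOLD,
        "same_genus_as_hit": (
            values["AAI"] > AAI_GENUS_THRESHOLD
            and values["POCP"] > POCP_GENUS_THRESHOLD
        ),
    }


# Evaluate every isolate and print the per-criterion flags.
results = {name: assess(values) for name, values in TABLE_2.items()}

for name, flags in results.items():
    print(name, flags)
```

The sketch reports each criterion separately rather than collapsing them into a single verdict, because the indices need not agree for a given isolate and the final taxonomic decision also rests on the phylogenomic analysis.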
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.