Preprint
Article

This version is not peer-reviewed.

Bioinformatic Identification of CRISPR-Cas Systems in Leptospira: An Update on Their Distribution Across 77 Species

A peer-reviewed article of this preprint also exists.

Submitted:

11 September 2025

Posted:

12 September 2025


Abstract
Leptospirosis is a globally distributed zoonotic disease caused by pathogenic bacteria of the Leptospira genus. Genome editing in Leptospira has been difficult to perform. To date, the functionality of the CRISPR-Cas system has been demonstrated in species such as Leptospira interrogans. However, the CRISPR-Cas systems present in most of the 77 species remain unknown. Therefore, the objective of this study was to identify the CRISPR-Cas systems present in the genomes of the Leptospira genus using bioinformatics tools. Methods: The following bioinformatics workflow was used: genomes were downloaded from the NCBI database; Cas proteins were detected using the CRISPRCasFinder and RAST web servers; Cas proteins were functionally analyzed (InterProScan, ProtParam, Swiss Model, AlphaFold3, Swiss PDB Viewer, and PyMOL); conservation patterns were detected (MEGA12 and SeqLogos); spacers were identified (Actinobacteriophage database and BLAST); and bacteriophages were detected (PHASTER and PHASTEST). Results: Cas proteins (Cas1-Cas9 and Cas12) were detected in 36/77 species of the Leptospira genus. The proteins were classified into class 1 and class 2 systems, and types I, II, and V. Direct repeats and spacers were detected in 19 species. The direct repeats presented two nucleotide conservation motifs. Using the spacer sequences, 270 different bacteriophages were identified. Three intact bacteriophages were detected in the genomes of four Leptospira species. Two saprophytic species have complete CRISPR-Cas systems. Conclusions: The presence of Cas proteins, direct repeats, and spacer sequences homologous to bacteriophage genomes suggests a functional CRISPR-Cas system in at least 19 species.

1. Introduction

Leptospirosis is a globally distributed zoonosis caused by pathogenic bacteria of the Leptospira genus [1]. This zoonotic disease affects both wild and domestic animals. The bacteria can survive in moist soil and environmental water sources for long periods, and humans are considered accidental hosts of the bacteria [2]. Leptospirosis is considered a neglected disease that is mainly reported in tropical regions of developing countries [3]. Outbreaks of disease have been associated with animal handling, flooding, environmental disasters, extreme climate change, water sports, and prolonged exposure to contaminated environmental water sources and wet soils [4]. Additionally, in endemic regions, severe forms such as Weil’s disease and severe pulmonary hemorrhage syndrome have emerged as the leading cause of fatal cases [5]. Epidemiological studies estimate that ~1.03 million cases and 58,900 deaths occur due to leptospirosis worldwide annually [6]. At present, the Leptospira genus comprises 77 genomic species, which are distributed into pathogenic subgroups (p1 and p2) and saprophytic subgroups (s1 and s2) according to genomic classification [7,8,9,10,11,12].
Genome editing in the Leptospira genus remains a significant challenge because of the bacteria's slow growth in culture and demanding nutritional requirements [13]. Various approaches, including random transposon mutagenesis [14,15,16,17], suicide plasmids [18], allelic exchange [19], and shuttle vectors [20], have been applied in Leptospira. However, these methods remain inefficient, costly, and technically demanding [13]. This limitation hampers the study of hypothetical proteins, pathogenicity mechanisms, and virulence factors. It also restricts the identification of essential proteins for drug and vaccine development. Thus, generating knockout mutant strains in Leptospira continues to be a major obstacle.
However, the CRISPR-Cas system (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated (Cas) proteins) has emerged as a highly specific, efficient, versatile, and cost-effective genome editing tool [21,22,23]. Additionally, multiple classes, types, subtypes, variants, and molecular targets of CRISPR-Cas systems have been described in other bacterial species [24,25,26]. The CRISPR-Cas system functions as an adaptive immune mechanism in prokaryotes, preventing infection by phages and plasmids. Immunological memory is stored as short DNA fragments (spacers) integrated into the chromosome and regularly separated by direct repeats, forming CRISPR arrays adjacent to a leader sequence. These regions are close to the genes that encode Cas proteins [27,28,29]. Cas proteins detect, cleave, and integrate foreign DNA from phages or plasmids, providing molecular targets for defense during future infections [30,31,32]. The CRISPR-Cas system operates in three functional steps. (i) Adaptation (also called insertion or acquisition): foreign genetic material from bacteriophages or plasmids is recognized, cut into small fragments, and incorporated into the CRISPR array near the leader sequence. (ii) Expression: the CRISPR array locus is transcribed into immature RNA precursors (pre-crRNA), which are then processed into mature RNAs (crRNA). (iii) Interference: mature crRNAs bind to multiprotein effector complexes or single effector proteins, which recognize an identical or similar sequence in the genome of invading viruses or plasmids and cleave and inactivate it [33,34]. This system has been emulated, modified, and adapted as a genetic editing tool in prokaryotic and eukaryotic cells [35,36].
Therefore, exploring the presence and diversity of these endogenous systems in the genomes of the Leptospira genus is worthwhile, as it could substantially advance our understanding of Leptospira biology and potentially support the development of new treatments for leptospirosis.
Gene silencing based on the CRISPR-Cas system in the genus Leptospira has progressed considerably in recent years, beginning with the detection of subtypes IB and IC in pathogenic species [37,38,39]. These subtypes have been shown to be transcriptionally active [40], to form interference complexes for which different protospacer adjacent motifs (PAMs) have been evaluated to optimize cutting sites [41], and to process immature RNAs (pre-crRNA) [42]. Additionally, the Cas1, Cas2, Cas4, Cas5, and Cas6 proteins have been biologically and functionally characterized [37,38,39,40,42,43,44]. Initial gene silencing attempts failed because Cas9-induced double-strand breaks were lethal to Leptospira cells [13,45,46]. Therefore, techniques such as CRISPR interference (CRISPRi) [13], CRISPR-Cas9 combined with non-homologous end-joining (NHEJ) components [45], and CRISPR-prime editing [46] were developed, and these succeeded in creating knockout mutants of multiple proteins in pathogenic and saprophytic species. Interestingly, the spacer sequences detected in the genomes of Leptospira interrogans correspond to mobile genetic elements, indicating that the system is functional and important in the adaptive immune defense of the bacteria [47]. In parallel, components of the CRISPR-Cas system have been applied as diagnostic tools (a CRISPR dFnCas9-based quantitative lateral flow immunoassay, and a CRISPR/Cas12a platform combined with isothermal amplification) [48]. Additionally, the nucleotide variability of CRISPR-Cas systems in the Leptospira genus has been used as a taxonomic tool for the identification and differentiation of species and serovars in 18 Leptospira species [47].
However, there is little information about the presence, diversity, and functionality of the CRISPR-Cas machinery (Cas proteins, spacers, direct repeats, leader sequence, CRISPR arrays, crRNA, and protospacer adjacent motif (PAM)) across all species of the Leptospira genus. Therefore, this study aimed to identify CRISPR-Cas systems across all 77 recognized Leptospira species. This knowledge could have a significant impact on our understanding of Leptospira biology, helping to optimize the gene editing process and to clarify the biological function of the thousands of proteins that remain hypothetical in the Leptospira genus.

2. Materials and Methods

2.1. Downloading Reference Genomes

A reference genome for each of the 77 Leptospira species was downloaded from the NCBI Taxonomy database (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=171, accessed on 15 July 2025). The genomes were grouped according to species subgroup (p1, p2, s1, and s2). For each genome, the sequences of the two chromosomes (three in some saprophytic species) were concatenated with the Linux cat command (Ubuntu 24.04 LTS) to obtain a single continuous linear sequence. Genomic characteristics (species name, clade, accession number, genome size (Mb), GC content (%), genes, proteins, and non-coding sequences) were identified using Prokka v1.14.5 (https://github.com/tseemann/prokka) [49]. These characteristics were compared between species to determine the average, highest, and lowest values.
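The concatenation and the basic genome statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used the Linux cat command and Prokka, not this code, and the function names (parse_fasta, concatenate_replicons, genome_summary) are hypothetical helpers introduced here.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_replicons(records: List[Tuple[str, str]]) -> str:
    """Join all replicon sequences end to end, in file order."""
    return "".join(seq for _, seq in records)


def genome_summary(sequence: str) -> Dict[str, float]:
    """Genome size in Mb and GC content (%) over unambiguous bases."""
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    return {
        "size_mb": len(sequence) / 1e6,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


# Toy two-replicon input (hypothetical sequences, not real Leptospira data).
toy = ">chromosome_I\nATGC\nGGCC\n>chromosome_II\nATAT\n"
genome = concatenate_replicons(parse_fasta(toy))
summary = genome_summary(genome)
```

Joining replicons produces an artificial junction between them, so any feature predicted across that boundary should be interpreted with care.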

2.2. Cas Proteins Detection

The concatenated sequences of each of the 77 species of the Leptospira genus were analyzed using the RAST (Rapid Annotations using Subsystem Technology) web server (https://rast.nmpdr.org/rast.cgi, accessed on 15 July 2025) to identify genes encoding Cas proteins, spacers, and direct repeats of the CRISPR-Cas system. Each sequence was analyzed independently, and the results for the 77 species were consolidated in a database using Excel (Microsoft 365). The results were analyzed, and the genes were grouped into the categories Cas protein, spacer, or direct repeat. RAST determines gene functions and metabolic pathways by comparison with existing annotated genomes. Its results are based on manually curated subsystems and the protein families derived from them; these curated assertions are the basis for the metabolic reconstructions and gene functions maintained within the SEED [50,51,52].
The same concatenated sequences were also analyzed using the CRISPRCasFinder web server (https://crisprcas.i2bc.paris-saclay.fr, accessed on 15 July 2025) to identify genes related to the CRISPR-Cas system. Again, each sequence was analyzed independently, the results for the 77 species were consolidated in an Excel database, and the genes were grouped into the categories Cas protein, spacer, or direct repeat. CRISPRCasFinder enables the simple detection of CRISPR arrays and Cas genes in user-supplied sequence data. It is an update of the CRISPRFinder program with increased specificity and CRISPR targeting, and it uses MacSyFinder to identify Cas genes and the CRISPR-Cas type and subtype [53].
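To make the relationship between direct repeats and spacers concrete, the following minimal Python sketch splits a CRISPR array into its spacers given a known repeat. It is a toy illustration only: RAST and CRISPRCasFinder use far more tolerant detection procedures, and the function name, repeat, and spacer sequences below are hypothetical.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Assumes exact, non-degenerate repeats; real arrays often carry
    mutated repeats that this simple search would miss.
    """
    spacers: List[str] = []
    start = array_seq.find(repeat)
    while start != -1:
        next_start = array_seq.find(repeat, start + len(repeat))
        if next_start == -1:
            break  # no further repeat, so no further spacer
        spacers.append(array_seq[start + len(repeat):next_start])
        start = next_start
    return spacers


# Hypothetical array: leader + (repeat + spacer) units + terminal repeat.
REPEAT = "GTTGCA"
ARRAY = "AAA" + REPEAT + "TTTACG" + REPEAT + "CCGGAT" + REPEAT + "GG"
SPACERS = extract_spacers(ARRAY, REPEAT)
```

An array with n repeats yields n - 1 spacers, which is why repeat and spacer counts for the same locus differ by one.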

2.3. Analysis of the Biological Function of Cas Proteins

Using the sequences of the genes associated with the CRISPR-Cas system and the BLAST algorithm version 1.4.0 (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 15 July 2025) of the NCBI database, the sequences of the encoded proteins were identified and downloaded [54]. The protein sequences were then analyzed with InterProScan (https://www.ebi.ac.uk/interpro/search/sequence/, accessed on 15 July 2025) to identify protein domains, biological processes, molecular functions, and cellular components [55]. Finally, the biological information available for each protein in the Panther GO Terms (https://pantherdb.org/, accessed on 15 July 2025) and GeneCards (https://www.genecards.org/, accessed on 15 July 2025) databases was consulted. In this way, information on the different Cas proteins was collected. The physicochemical parameters of the proteins (number of amino acids, isoelectric point, molecular weight, half-life, and instability index) were calculated using the ProtParam web server (https://web.expasy.org/protparam/, accessed on 15 July 2025). Three-dimensional protein structures were modeled using the SwissModel web server (https://swissmodel.expasy.org, accessed on 15 July 2025) [56] and the AlphaFold3 Server (https://alphafoldserver.com/about, accessed on 15 July 2025) [57]. Structures were visualized using Swiss-PdbViewer version 4.1 (http://spdbv.unil.ch, accessed on 15 July 2025) [58] and PyMOL version 2.6 (https://www.pymol.org/, accessed on 15 July 2025) [59].
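As an illustration of how one of these parameters is obtained, the molecular weight of a protein can be approximated as the sum of the average masses of its residues plus one water molecule. The sketch below is an assumption-laden stand-in and not the ProtParam implementation; it uses standard average residue masses and a hypothetical function name.

```python
# Average residue masses in Daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # one water is added for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    sequence = protein.strip().upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


# Tiny worked example: the dipeptide Gly-Ala.
GLY_ALA_DA = molecular_weight("GA")
```

Dividing the result by 1000 gives the value in kDa, the unit used for some of the proteins in Section 3.3.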

2.4. Classification of Cas Proteins into Functional Stages

Detected Cas proteins were classified into three functional stages according to the roles currently described for each protein in the literature: adaptation (spacer integration), expression (pre-crRNA processing), and interference (participation in the effector complex or target cleavage).
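This classification amounts to a lookup from protein to stage, followed by a check of which stages are covered in each species. A minimal Python sketch is given below; the stage assignments follow those reported for each protein in Section 3.3, while the function names and the example protein lists are hypothetical.

```python
from typing import Dict, Iterable, List

STAGES = ("adaptation", "expression", "interference")

# Stage assignments as reported for each protein in Section 3.3.
STAGE_BY_PROTEIN: Dict[str, str] = {
    "Cas1": "adaptation", "Cas2": "adaptation", "Cas4": "adaptation",
    "Cas6": "expression",
    "Cas3": "interference", "Cas5": "interference", "Cas7": "interference",
    "Cas8": "interference", "Cas9": "interference", "Cas12": "interference",
}


def stage_coverage(proteins: Iterable[str]) -> Dict[str, List[str]]:
    """Group the detected proteins by functional stage."""
    coverage: Dict[str, List[str]] = {stage: [] for stage in STAGES}
    for protein in proteins:
        stage = STAGE_BY_PROTEIN.get(protein)
        if stage is not None:  # proteins outside the map are ignored
            coverage[stage].append(protein)
    return coverage


def missing_stages(proteins: Iterable[str]) -> List[str]:
    """Stages for which no detected protein is assigned."""
    coverage = stage_coverage(proteins)
    return [stage for stage in STAGES if not coverage[stage]]


# Hypothetical inputs: a Cas3-only genome and a genome lacking Cas6.
CAS3_ONLY = missing_stages(["Cas3"])
NO_CAS6 = missing_stages(["Cas1", "Cas2", "Cas3", "Cas5", "Cas7", "Cas8"])
```

A one-stage-per-protein map is a simplification: it does not capture dual roles, such as Cas12a processing its own guide RNAs or Cas5 possibly substituting for Cas6, so a flagged stage indicates a candidate gap to inspect, not a confirmed one.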

2.5. Conservation Patterns of Direct Repeats in the Leptospira Genus

A search for direct repeats was performed in the 77 species of the Leptospira genus. The number of direct repeat sequences found in each subgroup was: p1 (332 sequences), p2 (48 sequences), s1 (26 sequences), and s2 (18 sequences). A custom database was compiled, and the sequences were aligned using MEGA version 12 (https://www.megasoftware.net/) [60]. The alignment was subsequently visualized using the SeqLogos web server (http://imed.med.ucm.es/Tools/seqlogo.html, accessed on 15 July 2025) to identify the nucleotide frequencies at each position of the direct repeat sequences [61].
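The quantities that a sequence logo displays can be computed directly from an alignment: per-position nucleotide frequencies, the most frequent base, and the information content in bits. The Python sketch below assumes ungapped, equal-length DNA sequences and omits small-sample correction; it is not the SeqLogos implementation, and the example sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def position_frequencies(aligned: List[str]) -> List[Dict[str, float]]:
    """Relative frequency of each base in every alignment column."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    frequencies = []
    for column in zip(*aligned):
        counts = Counter(column)
        frequencies.append({b: n / len(column) for b, n in counts.items()})
    return frequencies


def consensus(frequencies: List[Dict[str, float]]) -> str:
    """Most frequent base per column (ties go to the first base seen)."""
    return "".join(max(col, key=col.get) for col in frequencies)


def information_content(column: Dict[str, float]) -> float:
    """Bits of information for one DNA column: 2 minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


# Hypothetical aligned repeats (not taken from the study's data set).
REPEATS = ["GTTTG", "GTTAG", "GTTTG", "GTCTG"]
FREQS = position_frequencies(REPEATS)
CONSENSUS = consensus(FREQS)
```

Fully conserved columns reach 2 bits, while a column with all four bases at equal frequency scores 0, which is why conserved motifs stand out as tall stacks in a logo.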

2.6. Bioinformatic Identification of the Immunological Memory (Spacer)

The actinobacteriophage database at phagesDB.org (https://phagesdb.org/, accessed on 15 July 2025) was used to identify the bacteria’s immunological memory against different bacteriophages [62]. For this analysis, spacer sequences detected in the 77 genomes using RAST (Rapid Annotations using Subsystem Technology) and CRISPRCasFinder web servers were used.
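Conceptually, assigning a spacer to a bacteriophage means locating the spacer, or its reverse complement, in a phage genome with few or no mismatches. The Python sketch below uses a naive ungapped sliding-window comparison as a simplified stand-in for BLAST-style searching; it is not the procedure used by phagesDB.org, and the function names and sequences are hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_match(spacer: str, target: str) -> Optional[Tuple[int, int, str]]:
    """Best ungapped placement of `spacer` in `target` on either strand.

    Returns (mismatches, position, strand), or None when the target is
    shorter than the spacer. Ties keep the first placement found, with
    the "+" strand searched before the "-" strand.
    """
    best: Optional[Tuple[int, int, str]] = None
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for pos in range(len(target) - len(query) + 1):
            window = target[pos:pos + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if best is None or mismatches < best[0]:
                best = (mismatches, pos, strand)
    return best


# Hypothetical phage fragment and spacer.
PHAGE = "TTTTACGGATCCAAAA"
FORWARD_HIT = best_match("ACGGATCC", PHAGE)
REVERSE_HIT = best_match(reverse_complement("ACGGATCC"), PHAGE)
```

This exhaustive comparison scales with spacer length times genome length, which is acceptable for a toy example but is the reason indexed tools such as BLAST are used for database-scale searches.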

2.7. Identification of Unique Effector Proteins

The genomes of the 77 Leptospira species were scanned using the BLAST algorithm version 1.4.0 (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 15 July 2025) of the NCBI database to detect the Cas9, Cas12, and Cas13 proteins, which act as single effector proteins in CRISPR-Cas systems [54].

2.8. Bioinformatic Detection of Intact Bacteriophages in Genomes

The 77 genomes of the Leptospira species were scanned using two web servers. PHASTER (Phage Search Tool Enhanced Release) (https://phaster.ca/, accessed on 15 July 2025) is a significant upgrade to the popular PHAST web server for the rapid identification and annotation of prophage sequences within bacterial genomes and plasmids. PHASTEST (Phage Search Tool with Enhanced Sequence Translation) (https://phastest.ca/, accessed on 15 July 2025) is a web server designed to support the rapid identification, annotation, and visualization of prophage sequences within bacterial genomes and plasmids [63,64,65].
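PHASTER labels each predicted prophage region by a completeness score: regions scoring above 90 are reported as intact, 70 to 90 as questionable, and below 70 as incomplete. The Python sketch below shows how such scores could be turned into a per-species summary; the treatment of the exact boundary values, the function names, and the example scores are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def completeness_label(score: float) -> str:
    """Map a region score to a completeness category.

    Boundary handling (70 and 90 counted as questionable) is an assumption.
    """
    if score > 90:
        return "intact"
    if score >= 70:
        return "questionable"
    return "incomplete"


def best_label_per_species(scores: Dict[str, List[float]]) -> Dict[str, str]:
    """Label each species by its highest-scoring region, or 'none'."""
    labels: Dict[str, str] = {}
    for species, region_scores in scores.items():
        if region_scores:
            labels[species] = completeness_label(max(region_scores))
        else:
            labels[species] = "none"
    return labels


# Hypothetical scores for four placeholder species.
EXAMPLE_SCORES = {
    "species_A": [120.0, 40.0],
    "species_B": [80.0],
    "species_C": [30.0, 10.0],
    "species_D": [],
}
LABELS = best_label_per_species(EXAMPLE_SCORES)
TALLY = Counter(LABELS.values())
```

Labeling a species by its best region means a genome with one intact and one incomplete region is counted once, as intact; a per-region tally would be needed to report both.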

3. Results

3.1. Downloading Reference Genomes

In this study, we analyzed 77 reference genomes of the Leptospira genus, distributed among the species subgroups p1 (21 species), p2 (22 species), s1 (29 species), and s2 (5 species). Regarding assembly level, 14 genomes were assembled at the chromosome level, 8 at the scaffold level, and 55 at the contig level. Genome size ranged from 3.7 to 4.9 Mb; the largest genome was that of L. ainazelensis (4.9 Mb), and the smallest were those of L. fletcheri and L. fluminis (3.7 Mb). GC content ranged from 35% to 47.5%. Gene content ranged from 3409 to 4789 genes, with the highest number in L. adleri (4789 genes) and the lowest in L. fletcheri (3409 genes). The number of proteins ranged from 3326 to 4626, with the most in L. adleri (4626 proteins) and the fewest in L. fletcheri (3326 proteins). Finally, the number of pseudogenes ranged from 39 to 352, with the most in L. mayottensis (352 pseudogenes) and the fewest in L. ryugenii and L. ellinghausenii (39 pseudogenes) (Table 1).

3.2. Cas Proteins Detection

The primary objective of this research was to identify proteins belonging to the CRISPR-Cas system in the reference genomes of the Leptospira genus and to classify them by class, type, subtype, genetic variant, and molecular target. Cas proteins were detected bioinformatically in 36 of the 77 species analyzed. The number of species containing Cas proteins in each subgroup was: p1 (15 species), p2 (11 species), s1 (7 species), and s2 (3 species). Of these 36 species, 19 had multiple Cas proteins and 17 had only the Cas3 protein (although 3 of these 17 species had both Cas3 and the Cas3a variant). By class, 32 species were classified as class 1 and 4 species as class 2 (L. gorisiae, L. fletcheri, L. inadai, and L. ilyithenensis); however, L. gorisiae and L. inadai carry proteins from both classes. By type, proteins belonging to types I, II, III, and V were detected. By subtype, proteins belonging to subtypes IA, IB, IC, IE, IIB, and U were detected. The variants detected were Cas3a, Cas5a2, Cas5b, Cas5c, Cas7b, Cas7c, Cas8c, and Cas8a1a3. Regarding the molecular target, 35 species have proteins that cut DNA molecules, and only one species (L. haakeii, csm2_TypeIIIA) has a protein with the ability to cut RNA molecules. This study reports, for the first time, the presence of CRISPR-Cas systems in two saprophytic species, L. ilyithenensis and L. ryugenii, both belonging to the s2 subgroup (Table 2).
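The per-subgroup counts can be put on a common scale by dividing them by the subgroup sizes given in Section 3.1. The short Python sketch below performs this derived calculation; the percentages it produces are simple arithmetic on the counts reported in the text and are not values stated by the authors, and the function name is hypothetical.

```python
from typing import Dict

# Species per subgroup (Section 3.1) and species with Cas proteins (Section 3.2).
SPECIES_PER_SUBGROUP: Dict[str, int] = {"p1": 21, "p2": 22, "s1": 29, "s2": 5}
SPECIES_WITH_CAS: Dict[str, int] = {"p1": 15, "p2": 11, "s1": 7, "s2": 3}


def prevalence(with_feature: Dict[str, int], totals: Dict[str, int]) -> Dict[str, float]:
    """Percentage of species in each subgroup that carry the feature."""
    return {group: 100.0 * with_feature[group] / totals[group] for group in totals}


PREVALENCE = prevalence(SPECIES_WITH_CAS, SPECIES_PER_SUBGROUP)
OVERALL = 100.0 * sum(SPECIES_WITH_CAS.values()) / sum(SPECIES_PER_SUBGROUP.values())
```

With these counts, Cas proteins are present in about 71% of p1, 50% of p2, 24% of s1, and 60% of s2 species, and in about 47% of all species; the s2 figure rests on only five species and should be read with that in mind.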

3.3. Analysis of the Biological Function of Cas Proteins

Currently, 13 Cas proteins (Cas1–Cas13) have been described as components of CRISPR-Cas systems. Analysis of the 77 Leptospira genomes revealed 10 of these 13 proteins (Cas1, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, and Cas12). Cas10, Cas11, and Cas13 were not detected in the analyzed genomes. The number of species carrying each protein was: Cas1 (18 species), Cas2 (19 species), Cas3 (35 species), Cas4 (10 species), Cas5 (17 species), Cas6 (14 species), Cas7 (17 species), Cas8 (5 species), Cas9 (1 species), and Cas12 (3 species). Cas3 was the most frequent protein among the 36 Leptospira species with Cas proteins encoded in their genomes. The least frequent proteins were the single effector proteins Cas9 and Cas12.
To investigate the biological functions, structural features, and functional domains of Cas proteins, representative proteins from selected Leptospira species were analyzed. Cas1 from L. interrogans functions as a DNA nuclease involved in the adaptation stage. It mediates spacer integration into the CRISPR array. It is associated with defense against bacteriophages and plasmids, as well as the maintenance of CRISPR repeat elements. Cas1 consists of 254 amino acids, with a predicted molecular weight of 29.8 kDa and an isoelectric point of 9.51. Its estimated half-life in bacteria exceeds 10 h. The instability index (43.37) classifies it as unstable. Structurally, it contains 8 β-sheets, 18 α-helices, and a Cas1_I-II-III domain that may mediate DNA integration into the CRISPR array (Figure 1).
Cas2 (Leptospira interrogans) functions as an RNA nuclease and participates in the adaptation stage. It facilitates the integration of spacers into the CRISPR array. It is associated with defense against bacteriophages and plasmids, as well as the maintenance of the repeat elements of the CRISPR array. Cas2 is composed of 90 amino acids, with a predicted molecular weight of 10.3 kDa and an isoelectric point of 8.49. Its estimated bacterial half-life exceeds 10 h, and its instability index (31.57) classifies it as stable. Structurally, it contains 10 β-sheets, 4 α-helices, and an SSF:TTP0101/SSO1404-like domain. Cas2 proteins have been characterized as either endoribonucleases (ssRNA) or endodeoxyribonucleases (dsDNA), depending on the CRISPR system (Figure 1).
Cas3 (Leptospira interrogans) is a protein with a biological function as a DNA nuclease and helicase and participates in the functional stage of interference (target cleavage). It is related to the biological processes of defense response against bacteriophages and plasmids, ATP binding, hydrolase activity, and DNA binding. The protein is composed of 377 amino acids, has an isoelectric point of 8.59, a molecular weight of 43082.83 Daltons, a bioinformatically calculated half-life for bacteria of more than 10 hours, and has an instability index of 37.43, which classifies it as stable. At the structural level, it is composed of 6 β-sheets and 21 α-helices. Cas3 proteins have motifs characteristic of helicases from superfamily 2 and contain a DEAD/DEAH box region and a conserved C-terminal domain. The Cas3-type HD domain has nuclease activity against ssDNA and ssRNA (Figure 1).
Cas4 (Leptospira interrogans) is a protein with a biological function as a DNA nuclease and participates in the functional stage of Adaptation (Spacer Integration). It is related to the biological processes of defense response against bacteriophages and plasmids. The protein is composed of 142 amino acids, has an isoelectric point of 8.95, a molecular weight of 16606.38 Daltons, a bioinformatically calculated half-life of more than 10 hours, and has an instability index of 50.59, which classifies it as unstable. At the structural level, it is composed of 6 β-sheets and 6 α-helices. Cas4 endonuclease activity contributes to spacer acquisition by processing foreign DNA ends prior to integration. Cas4 is a 5’ to 3’ single-stranded DNA exonuclease (Figure 1).
Cas5 (Leptospira interrogans) participates in the functional stage of Interference (Effector Complex). It is related to the biological processes of defense response against bacteriophages and plasmids, Maintenance of CRISPR repeat elements, and RNA binding. The protein is composed of 234 amino acids, has an isoelectric point of 7.72, a molecular weight of 27128.07 Daltons, a bioinformatically calculated half-life of more than 10 hours, and has an instability index of 46.99, which classifies it as unstable. At the structural level, it is composed of 10 β-sheets and 3 α-helices. Cas5 helps process or stabilize pre-crRNA into individual crRNA units. Cas5 and Cas6 are also required for optimal crRNA processing and/or stability (Figure 1).
Cas6 (Leptospira santarosai) participates in the functional step of expression (pre-crRNA processing). It is related to the biological processes of defense response against bacteriophages and plasmids. The protein is composed of 203 amino acids, has an isoelectric point of 9.81, a molecular weight of 22721.75 Daltons, a bioinformatically calculated half-life of more than 10 hours, and has an instability index of 25.34, which classifies it as stable. At the structural level, it is composed of 16 β-sheets and 5 α-helices. Members of this protein family are found associated with several different CRISPR/Cas system subtypes. Cas6 proteins share the ability to recognize and cleave a single phosphodiester bond in a short-repeated sequence of the pre-crRNA transcript (Figure 1).
Cas7 (Leptospira interrogans) participates in the interference stage as part of the effector complex. It is related to the biological processes of defense response against bacteriophages and plasmids, RNA recognition, and crRNA binding. The protein is composed of 279 amino acids, has an isoelectric point of 6.68, a molecular weight of 31146.15 Daltons, a bioinformatically calculated half-life of more than 10 hours, and an instability index of 38.92, which classifies it as stable. At the structural level, it is composed of 18 β-sheets and 6 α-helices. Cas7-11 cuts single-stranded RNA (ssRNA) and can self-process pre-crRNA (guide RNA). The Cas7 (DevR) protein also has a role in fruiting body development, sporulation, and aggregation (Figure 1).
Cas8 (Leptospira santarosai) participates in the interference stage as part of the effector complex. It is related to the biological processes of defense response against bacteriophages and plasmids and forms the large subunit of the Cascade complex. The protein is composed of 532 amino acids, has an isoelectric point of 9.51, a molecular weight of 61404.33 Daltons, a bioinformatically calculated half-life of more than 10 hours, and an instability index of 31.72, which classifies it as stable. At the structural level, it is composed of 23 β-sheets and 19 α-helices. In Myxococcus xanthus, Cas8a1 is also known as DevT (developmental protein T), which stimulates the synthesis of a signal transduction protein required for fruiting body morphogenesis (under starvation conditions, rod-shaped cells form fruiting bodies within which they differentiate into spherical spores) (Figure 1).
Cas9 (Leptospira fletcheri) is a DNA nuclease and participates in the interference stage (target cleavage). It is related to the biological processes of defense response against bacteriophages and plasmids. The protein is composed of 1469 amino acids, has an isoelectric point of 9.43, a molecular weight of 170367.32 Daltons, a bioinformatically calculated half-life of more than 10 hours, and an instability index of 43.69, which classifies it as unstable. At the structural level, it is composed of 42 β-sheets and 51 α-helices. Cas9 is inactive in the absence of its two guide RNAs (gRNAs). Cas9 recognizes a short motif adjacent to the target sequence (the PAM, or protospacer adjacent motif), which helps distinguish self from non-self because targets within the bacterial CRISPR locus do not have PAMs. PAM recognition is also required for catalytic activity. Cas9 cuts target DNA when it is combined with its gRNAs (Figure 1).
Cas12 (Leptospira inadai) is a DNA nuclease and participates in the functional step Interference (Target Cleavage). It is related to the biological processes of defense response against bacteriophages and plasmids. The protein is composed of 1250 amino acids, has an isoelectric point of 9.26, a molecular weight of 148356.40 Daltons, a bioinformatically calculated half-life of more than 2 minutes, and has an instability index of 30.29, which classifies it as stable. At the structural level, it is composed of 33 β-sheets and 50 α-helices. The CRISPR-associated protein Cas12a (Cpf1) possesses two distinct nuclease activities: endoribonuclease activity for processing its own guide RNAs and RNA-guided DNase activity for target DNA cleavage. Cpf1, also known as CRISPR-associated endonuclease Cas12a, is an RNA-guided endonuclease of the type V CRISPR-Cas system. Cpf1 adopts a bilobed architecture consisting of a helical recognition lobe (REC) and a nuclease lobe (NUC), with the small CRISPR RNAs (crRNAs)-target DNA heteroduplex bound to the positively charged, central channel between the two lobes (Figure 1).

3.4. Classification of Cas Proteins into Functional Stages

The process by which the CRISPR-Cas system degrades foreign genetic material from viruses can be divided into three functional stages: adaptation (spacer integration), expression (pre-crRNA processing), and interference (effector complex formation and target cleavage). Among the 36 species that have genes encoding Cas proteins, 19 had multiple Cas proteins and 17 had only the Cas3 protein. The 17 species with only Cas3 therefore appear to have a partial or incomplete CRISPR-Cas system.
The purpose of this analysis was thus to establish the distribution of Cas proteins across the three functional stages in the 19 Leptospira species that possess multiple Cas proteins, and to identify species with a complete CRISPR-Cas system. Fifteen species had a Class 1, Type I system; one species (L. fletcheri) had a Class 2, Type II system; one species (L. ilyithenensis) had a Class 2, Type V system; and two species (L. gorisiae and L. inadai) had a combination of Class 2, Type V and Class 1, Type I systems. L. kirschneri, L. interrogans, and L. noguchii lack the Cas6 protein, which is necessary for the expression stage (pre-crRNA processing), although the Cas5 protein could be supplying this function. Also notable are the absence of Cas1 in L. borgpetersenii and of Cas8 in L. adleri, L. alstonii, and L. inadai, and the presence in L. haakeii of csm2, a protein of the Class 1, Type IIIA system (Table 3).

3.5. Conservation Patterns of the Direct Repeats in the Leptospira Genus

A search for direct repeats was performed in the 77 species of the Leptospira genus. A total of 434 different direct repeat sequences, ranging from 23 to 39 nucleotides in length, were found across the four species subgroups. When the sequences were aligned and conservation patterns were identified from the nucleotide frequencies, a central conservation pattern was found in subgroup p1 (GTTTGAACTCCCACAAGTT), in subgroup p2 (CTGATCCCCACACACGTGGGGATTAA), and in subgroup s1 (CCAGATCTGCAAGTGGATCTGC), whereas in subgroup s2 a conservation pattern was found throughout the entire sequence (CTCACCACGCATATGGGGTTCAACCAT). Additionally, the 434 sequences from the four subgroups were compared with each other, but no shared conservation pattern was found (Figure 3).

3.6. Bioinformatic Identification of the Immunological Memory (Spacer Sequences)

This analysis was performed on the 19 species carrying spacer sequences to assess the functionality of the CRISPR-Cas systems detected. The presence of spacer sequences within the CRISPR arrays (leader sequence, direct repeats, and spacer sequences) is evidence of the functionality of these systems, because bacteria incorporate spacer sequences as a form of immunological memory against bacteriophages or plasmids. All 19 Leptospira species with multiple Cas proteins had spacer sequences and direct repeats. Among these 19 species, which have a complete and functional CRISPR-Cas system (Cas proteins, spacer sequences, and direct repeats), between 12 and 58 spacers per species were detected. Leptospira ryugenii, with 58 spacer sequences, had the widest repertoire of immunological memory against bacteriophages, while L. ilyithenensis, with 12 spacer sequences, had the narrowest. A total of 617 spacer sequences were detected among the 19 Leptospira species, but only 323 were identified as belonging to different bacteriophages in the actinobacteriophage database. A further 213 spacer sequences could not be identified and represent bacteriophages not reported in the database. The bacteriophages matched by these spacer sequences are associated with 12 bacterial genera (Gordonia, Arthrobacter, Streptomyces, Rhodococcus, Propionibacterium, Microbacterium, Mycobacterium, Curtobacterium, Brevibacterium, Corynebacterium, Tsukamurella, and Rothia). It should be noted that each of these 12 bacterial genera is represented by multiple spacer sequences found in various Leptospira species (Supplementary Table 1).

3.7. Identification of Unique Effector Proteins

Given the difficulty of establishing a gene editing system based on CRISPR-Cas systems composed of multiple effector proteins, a search was conducted for single-protein effector systems such as Cas9, Cas12, and Cas13. One species had the Cas9 protein (Class 2, Type II: L. fletcheri), and three species had the Cas12 protein (Class 2, Type V: L. ilyithenensis, L. gorisiae, and L. inadai). No species were found with the Cas13 protein. The Cas9 and Cas12 proteins were structurally modeled, and a functional domain analysis was performed to understand the structure and function of these endogenous proteins. In the future, they could be used as gene editing systems in the Leptospira genus, avoiding the toxicity caused by exogenous proteins and, in theory, providing more efficient gene editing (Figure 1).

3.8. Bioinformatic Detection of Intact Bacteriophages in Genomes

Bioinformatic detection of bacteriophages was performed in the genome of the 77 species of the Leptospira genus. Three intact bacteriophages were found in four Leptospira species (L. mayottensis, L. weilii, L. bandrabouensis, and L. ellinghausenii), two questionable or almost complete bacteriophages in two Leptospira species (L. weilii, and L. fluminis), 58 species with incomplete bacteriophages, and 13 species without the presence of bacteriophages (Figure 4).

4. Discussion

Research on the Leptospira genus has made great progress in recent years, with 77 species described and 1,157 genomes sequenced (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=171, accessed on 5 August 2025). Each species has a reference genome and multiple genomes from different strains. These genomes are a valuable source of genetic information for computational biology studies, in which different biological processes of the bacteria can be explored. The major problem with the genomes of the Leptospira genus, and of many prokaryotes, is that a large portion of the proteins encoded in their genomes are hypothetical (predicted from an ORF but without experimental validation), so their function in the biology of the bacteria has not been established. For example, in Leptospira interrogans serovar Lai, only 11.5% of the proteins have been characterized [66,67]. According to the functional annotation of the proteins in the 77 genomes of the Leptospira genus, hypothetical proteins accounted for between 41.4% and 57.9% of the annotated proteins, with L. ainlahdjerensis having the highest proportion (57.9%) and L. harrisiae the lowest (41.4%). This result highlights the significant knowledge gap regarding the biological functions of proteins in the Leptospira genus, underscoring the urgent need for a gene editing tool (Figure 5).
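As an illustration of how a percentage of hypothetical proteins can be derived from a functional annotation, the following minimal Python sketch counts product descriptions labeled as hypothetical. It assumes that such products carry the literal phrase "hypothetical protein", which is a common annotation convention but is not confirmed here as the criterion applied in this study; the function name and the toy list are hypothetical.

```python
def hypothetical_fraction(products):
    """Percentage of annotated products labeled as hypothetical proteins."""
    if not products:
        raise ValueError("no annotated products supplied")
    # Assumption: hypothetical products contain the phrase 'hypothetical protein'.
    n_hyp = sum(1 for p in products if "hypothetical protein" in p.lower())
    return 100.0 * n_hyp / len(products)


# Toy annotation list (made up; not taken from any Leptospira genome).
toy_products = [
    "hypothetical protein",
    "DNA polymerase III subunit alpha",
    "Hypothetical protein",
    "CRISPR-associated endonuclease Cas1",
]
print(f"{hypothetical_fraction(toy_products):.1f}%")  # 50.0%
```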
Additionally, the gene editing process in the Leptospira genus has been difficult to standardize, which has hampered the functional analysis of thousands of proteins and their assignment to the bacteria’s biological processes [13]. However, the CRISPR-Cas methodology is emerging as a promising tool for gene silencing in Leptospira, yielding knockout mutants of multiple proteins [13,45,46]. Therefore, it is important to explore endogenous CRISPR-Cas systems in the 77 currently described species to discover the diversity of CRISPR-Cas systems, their component machinery (Cas proteins, direct repeats, and spacers), the species’ immunological memory, and intact bacteriophages that managed to evade the immune system. Regarding genome availability, a reference genome was obtained for each of the 77 Leptospira species, which provided a comprehensive view of the presence of Cas proteins in the Leptospira genus (Table 1). Similar work identified CRISPR-Cas systems in 18 Leptospira species, but with the aim of evaluating the taxonomic utility of CRISPR-Cas arrays for identifying species and serovars of the genus [47].
We identified 166 Cas proteins in 36/77 species of the Leptospira genus (46.75%); these proteins were identified as Cas1-Cas9 and Cas12. The proteins were classified into class 1 and class 2 systems, and types I, II, and V. Additionally, proteins belonging to the subtypes (IA, IB, IC, IE, IIB, and IIIA), and the variants (Cas3a, Cas5a, Cas5a2, Cas5b, Cas5c, Cas7b, Cas7c, Cas8a1a3, and Cas8c) were detected. All proteins had DNA as the molecular target for cleavage, except for the csm2_TypeIIIA protein in Leptospira haakeii, which can cleave both DNA and RNA molecules. The Csm complex comprises five Cas proteins (Csm1–Csm5) and a crRNA, and degrades invading DNA and RNA [68].
Currently, the clade of pathogenic species comprises 43 species, divided into subgroup p1 (21 species) and subgroup p2 (22 species). We identified 27 pathogenic species that possess Cas proteins; therefore, not all pathogenic species have a CRISPR-Cas system. The saprophytic clade comprises 34 species (29 in subgroup s1 and 5 in subgroup s2). Among these, two species encode complete CRISPR-Cas systems, while 10 contain only the Cas3a_TypeI protein. From this result, we can conclude that some saprophytic species have CRISPR-Cas systems, contrary to what was reported by other authors [47], who did not analyze the recently described species (L. ilyithenensis and L. ryugenii). Additionally, saprophytic species with a CRISPR-Cas system can support experimental trials without the risk of infection in research laboratories. It is striking to find the Cas3a_TypeI protein as the sole representative of the CRISPR-Cas system in 17 Leptospira species (9 pathogens and 8 saprophytes). This suggests either that these species are acquiring the system or that these vestigial proteins have been conserved in their genomes to perform different functions in other biological processes. According to the results obtained, the Leptospira genus has a wide diversity of Cas proteins in 36 species that could be used as gene editing tools.
Currently, 13 types of Cas proteins have been described. After analyzing the genomes of the Leptospira genus, we found 10 of the 13 described Cas proteins (76.92%), so we can conclude that the genus possesses a high percentage of Cas proteins. Eight of these proteins belong to multiple effector systems (Cas1-Cas8), and two proteins belong to single-protein effector systems (Cas9 and Cas12a). This finding is significant because it describes for the first time the presence of endogenous Cas9 and Cas12a proteins in the Leptospira genus. These proteins can be used as gene editing tools without generating toxicity in Leptospira cells, and with the advantage of using a single protein. Several studies have reported gene silencing in species of the Leptospira genus using the Cas9 protein. However, these proteins are genetically modified to generate a single cut in the target DNA and originate from other bacterial species [46].
A functional bioinformatics analysis of the 10 types of Cas proteins detected (Cas1-Cas9 and Cas12a) was performed to determine their physicochemical characteristics, their biological function, the biological process in which they participate, their three-dimensional structure, and their functional domains, that is, the sites in the protein structure through which they perform their biological functions. The aim was to understand their participation in the processes of the CRISPR-Cas system and to assign each protein to one of the three functional states: adaptation, expression, and interference. The analysis indicated that the functionality of Cas proteins in the Leptospira genus is equivalent to that of other bacterial species (Figure 1).
Another important step in characterizing the CRISPR-Cas system is assigning proteins to the system’s functional states according to their biological function. The first state is termed adaptation; in this step, foreign DNA or RNA is recognized, the viral genome is cleaved, and fragments of the viral genome are stored in the bacterial genome as spacer sequences. Typically, this functional state involves the Cas1, Cas2, and Cas4 proteins, although Cas4 may not be present in most types of systems. Additionally, these proteins are absent in type III, V, and VI systems [22]. Among the 19 species that contain Cas proteins, we found 10 species with Cas1, Cas2, and Cas4, 8 species with Cas1 and Cas2, and one species with only Cas2. According to our results, the functional state of adaptation would be complete in 19 species. Although L. borgpetersenii does not have the Cas2 protein, its system appears to be functional, since it presents spacer sequences that were identified as belonging to bacteriophages.
The functional state of expression consists of the transcription of the CRISPR array locus into immature RNA precursors (pre-crRNA) and their subsequent processing into mature RNA (crRNA) [22]. In this functional state, the presence of the Cas6 or Cas5 protein in the class 1 type I system and of RNase III in the class 2 type II system is important. Additionally, in class 2 type V and VI systems, the single effector proteins Cas12 and Cas13 perform this function [22]. According to our results, all 19 species possess a fully functional expression stage (Table 3).
The interference stage involves the binding of mature RNAs (crRNA) to multiprotein effector complexes or single effector proteins, whose function is to recognize an identical or similar sequence in the genome of invading viruses or plasmids in order to cleave and inactivate them [33,34]. In class 1 type I systems, the presence of the Cas5, Cas7, and Cas8 proteins is important to form the effector complex along with the Cas3 protein, which is responsible for cleaving DNA. In class 2 type II, V, and VI systems, the presence of the Cas9, Cas12, and Cas13 proteins, respectively, is important to form the effector complex. According to the results obtained, all species have their respective complete effector complexes (Table 3).
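The stage-by-stage check described in the three paragraphs above can be sketched in Python as follows. This is a simplified reading of the text, not the procedure actually applied in this study: the rule table, the function name, and the example protein sets are assumptions. Adaptation requirements for type V are left out of the sketch, and Cas4 is treated as optional.

```python
# Simplified stage requirements paraphrased from the text (an assumption,
# not the authors' exact criteria). Each stage lists alternative protein
# sets; a stage is considered covered if ANY alternative is fully present.
STAGE_RULES = {
    "I": {
        "adaptation": [{"Cas1", "Cas2"}],
        "expression": [{"Cas6"}, {"Cas5"}],
        "interference": [{"Cas3", "Cas5", "Cas7", "Cas8"}],
    },
    "II": {
        "adaptation": [{"Cas1", "Cas2"}],
        "expression": [{"RNase III"}],
        "interference": [{"Cas9"}],
    },
    "V": {
        "expression": [{"Cas12"}],
        "interference": [{"Cas12"}],
    },
}


def stage_report(system_type, proteins):
    """Map each stage of the given system type to True if it is covered."""
    present = set(proteins)
    rules = STAGE_RULES[system_type]
    return {
        stage: any(alternative <= present for alternative in alternatives)
        for stage, alternatives in rules.items()
    }


# Hypothetical protein inventories, for illustration only.
full_type_i = ["Cas1", "Cas2", "Cas3", "Cas5", "Cas6", "Cas7", "Cas8"]
cas3_only = ["Cas3"]

print(stage_report("I", full_type_i))
print(stage_report("I", cas3_only))
```

Under these assumed rules, a genome carrying only Cas3 fails all three stages, which matches the distinction drawn in the text between species with complete systems and species with Cas3a as the sole Cas protein.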
Once we verified that the adaptation, expression, and interference states were complete, we proceeded to assign the classes, types, and subtypes according to the taxonomic classification of CRISPR-Cas systems proposed by Makarova et al. in 2020 [36]. In our study, seventeen species were classified as class 1 type I, one species as class 2 type II, and three species as class 2 type V. Regarding the subtypes, the following results were found: IB (L. santarosai, and L. kmetyi), IC (L. kirschneri, L. interrogans, and L. noguchii), IE (L. alexanderi, L. borgpetersenii, L. mayottensis, L. stimsonii, L. weilii, L. gorisiae, L. fainei, L. koniambonensis, and L. ryugenii), IIB (L. fletcheri), VA (L. gorisiae, L. inadai, and L. ilyithenensis), and IF2 (L. adleri, L. alstonii, and L. inadai). It is worth highlighting that the species L. inadai (Class 1 type IF2, and Class 2 type V) and L. gorisiae (Class 1 type IE, and Class 2 type V) each have two CRISPR-Cas systems in their genomes (Table 3). To date, subtypes IB and IC have been described in the Leptospira genus [37,38,39], so this would be the first report of subtypes IE, IIB, VA, and IF2 in the Leptospira genus. These findings contribute significantly to our understanding of the diversity and functionality of CRISPR-Cas systems in the Leptospira genus.
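The subtype assignments listed above can be cross-checked with a short Python sketch that tabulates them per species. The species lists are copied from the paragraph above; the variable and function names are illustrative assumptions.

```python
from collections import defaultdict

# Subtype -> species, as listed in the text above.
SUBTYPES = {
    "IB": ["L. santarosai", "L. kmetyi"],
    "IC": ["L. kirschneri", "L. interrogans", "L. noguchii"],
    "IE": ["L. alexanderi", "L. borgpetersenii", "L. mayottensis",
           "L. stimsonii", "L. weilii", "L. gorisiae", "L. fainei",
           "L. koniambonensis", "L. ryugenii"],
    "IF2": ["L. adleri", "L. alstonii", "L. inadai"],
    "IIB": ["L. fletcheri"],
    "VA": ["L. gorisiae", "L. inadai", "L. ilyithenensis"],
}

# Subtype -> system type, written out explicitly to avoid string parsing.
TYPE_OF = {"IB": "I", "IC": "I", "IE": "I", "IF2": "I", "IIB": "II", "VA": "V"}


def systems_by_species(subtypes):
    """Invert the table: species -> list of subtypes found in its genome."""
    by_species = defaultdict(list)
    for subtype, species_list in subtypes.items():
        for species in species_list:
            by_species[species].append(subtype)
    return dict(by_species)


by_species = systems_by_species(SUBTYPES)
dual = sorted(sp for sp, found in by_species.items() if len(found) > 1)
per_type = defaultdict(set)
for subtype, species_list in SUBTYPES.items():
    per_type[TYPE_OF[subtype]].update(species_list)

print(len(by_species))  # 19
print(dual)             # ['L. gorisiae', 'L. inadai']
print({t: len(s) for t, s in sorted(per_type.items())})  # {'I': 17, 'II': 1, 'V': 3}
```

The tally reproduces the counts in the text: 17 type I, 1 type II, and 3 type V assignments over 19 distinct species, with two species carrying two systems each.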
Another fundamental component of the CRISPR-Cas system is the direct repeats (DR), which separate the spacer sequences (segments of DNA from viruses). These sequences are typically identical within a given array. Direct repeats can be similar between related species, but also very different between distant species. The average direct repeat size is 32 nucleotides, but it can vary between 21 and 47 base pairs [22]. In the analysis of the 77 species of the Leptospira genus, a total of 434 direct repeats were identified, with a size range between 23 and 55 base pairs. Conservation patterns were observed among the species within each of the four subgroups, but not across the 77 species of the Leptospira genus. This is consistent with the literature, where genetically closely related species retain similar direct repeats, while genetically distant species use completely different direct repeats [22] (Figure 3).
Spacer sequences are nucleotide sequences of fixed size that are highly variable in sequence, as they derive from different bacteriophages and plasmids. Their size ranges from 20 to 72 base pairs [22]. Regarding the spacer sequences that represent the bacteria’s immunological memory against bacteriophage infection, we found 617 spacer sequences. Of these, 404 sequences were identified as belonging to bacteriophages in the Actinobacteriophage database, representing 323 unique bacteriophages. The remaining 81 matched sequences correspond to repeated spacer sequences or to spacers that recognize a different fragment of the genome of the same bacteriophage. On the other hand, 213 sequences could not be identified, reflecting the many bacteriophages that have not been reported in the database. Regarding the hosts in which the bacteriophages have been reported, we found 12 different bacterial genera (Supplementary Table 1). The above results support the presence and functionality of the CRISPR-Cas system in 19 species of the Leptospira genus. At the structural level, we identified the presence of genes encoding Cas proteins, direct repeats, and spacer sequences. At the functional level, the spacer sequences were identified as corresponding to bacteriophages; therefore, it can be inferred that the Leptospira species were in contact with the bacteriophages, avoided infection, and generated an immunological memory against them. However, experimental verification of the system’s functionality is necessary. These findings have significant implications for potential applications in bacteriophage therapy, where the CRISPR-Cas system could be harnessed to develop targeted treatments for bacterial infections.
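The spacer counts reported above are internally consistent, as the following small Python sketch shows. The three input numbers come from the text; the function name and the dictionary keys are illustrative assumptions.

```python
def spacer_tally(total, matched, unique_phages):
    """Derive unmatched and redundant-match counts from the reported totals."""
    if not (0 <= unique_phages <= matched <= total):
        raise ValueError("expected 0 <= unique_phages <= matched <= total")
    unmatched = total - matched
    return {
        "unmatched": unmatched,
        # Matched spacers beyond one per phage: repeated spacers or spacers
        # targeting another fragment of the same bacteriophage genome.
        "redundant_matches": matched - unique_phages,
        "matched_pct": 100.0 * matched / total,
        "unmatched_pct": 100.0 * unmatched / total,
    }


tally = spacer_tally(total=617, matched=404, unique_phages=323)
print(tally["unmatched"], tally["redundant_matches"])  # 213 81
```

The derived values (213 unidentified spacers and 81 redundant matches) agree with the figures given in the paragraph above.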
CRISPR-Cas systems, with their ability to cleave bacteriophages or plasmids, offer promising avenues for gene editing. Single effector systems stand out for their simplicity, which makes them suitable for application in multiple bacterial species [22]. Our research has identified three species with single-protein effector systems: L. fletcheri, with the Class 2 type II system (Cas9 as the cleavage protein), and L. inadai and L. ilyithenensis, with the Class 2 type V system (Cas12a as the cleavage protein). This discovery holds great promise for the genus Leptospira, as it identifies endogenous single effector proteins. These proteins could allow gene editing to be optimized in the future while avoiding the toxicity of exogenous proteins from other bacterial species.
Finally, CRISPR-Cas systems are not infallible. Therefore, it is important to detect bacteriophages in bacterial genomes that have successfully evaded the CRISPR-Cas defense system and inserted themselves into the genome. These findings are important for using bacteriophages as an alternative gene editing method [69,70,71] and for developing new drugs using bacteriophages as antibacterial therapy for infections caused by multi-antibiotic-resistant bacteria in humans and animals [72,73,74,75,76]. In this research, we found the following bacteriophages infecting four species of the Leptospira genus: L. mayottensis (PHAGE_Pseudo_phi3_NC_030940), L. weilii (PHAGE_Paenib_Tripp_NC_028930), L. bandrabouensis, and L. ellinghausenii (PHAGE_Leptos_LE1_NC_048892). The only phages that have been isolated, purified, and phenotypically characterized in the Leptospira genus are LE1, LE3, and LE4. These bacteriophages were described infecting the saprophytic species Leptospira biflexa in the 1990s [77], and the LE1 genome was subsequently sequenced [78]. Additionally, in 2018, the sequencing and proteomic analysis of phages LE3 and LE4 were reported [79]. Unfortunately, there are few studies on bacteriophages in the Leptospira genus. Therefore, this study reports two new bacteriophages infecting the genus Leptospira (PHAGE_Pseudo_phi3_NC_030940 and PHAGE_Paenib_Tripp_NC_028930), and these bacteriophages will be important in the future for genetically editing species of the Leptospira genus and the development of a new generation of antibacterial drugs using bacteriophages.

5. Conclusions

The presence of Cas proteins, direct repeats, and spacer sequences homologous to bacteriophage genomes suggests a functional CRISPR-Cas system in 19 Leptospira species. Furthermore, the endogenous proteins Cas9 and Cas12a could provide a more precise gene editing tool in the Leptospira genus, since they are single effector proteins. The discovery of saprophytic species with complete and functional CRISPR-Cas systems suggests the recent acquisition of this system in the subgroup of saprophytic species (s2), and saprophytic species with a CRISPR-Cas system can support experimental trials without the risk of infection in research laboratories. Additionally, the three intact bacteriophages detected in four species of the Leptospira genus are promising candidates for gene editing of the bacteria, due to their ability to evade the CRISPR-Cas system, and for the development of drugs for the treatment of leptospirosis, using bacteriophages as a mechanism to selectively lyse bacteria in human and animal infections.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Table S1: Identification of bacteriophages using spacer sequences.

Author Contributions

Conceptualization, R.G.P.S., J.E.S.F., S.P., A.M.C.L., J.M.M.G., RU., L.F.L.R., F.P.M; Methodology, R.G.P.S, S.P., F.P.M; Software, R.G.P.S, J.G.R., S.P., F.P.M; Validation, R.G.P.S; Formal analysis, R.G.P.S, J.G.R.; Investigation, R.G.P.S., J.G.R.; M.T.C., E.S.F.; Resources, R.G.P.S., M.T.C., E.S.F., S.P.; Data curation, R.G.P.S., S.P.; Writing—original draft, R.G.P.S., M.T.C., E.S.F., S.P., A.M.C.L., J.M.M.G., RU., L.F.L.R., F.P.M; Writing—review and editing, R.G.P.S., M.T.C., E.S.F., S.P., F.P.M; Supervision, R.G.P.S., J.E.S.F.; Project administration, R.G.P.S., J.E.S.F.; Funding acquisition, R.G.P.S., J.E.S.F., F.P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the Science, Technology and Innovation Direction of the CES University, Autonomous University of Yucatan, San Martin University, Colombian Institute of Tropical Medicine (ICMT), and Northern Arizona University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The reference genomic sequences used in this research are available in the NCBI-Genome database under the following accession numbers:
GCA_002811985.1, GCA_016918785.1, GCA_016919175.1, GCA_000243815.3, GCA_000347175.1, GCA_002811925.1, GCA_003516145.1, GCA_002811955.2, GCA_004770155.1, GCA_000243695.3, GCA_002073495.2, GCA_003722295.1, GCA_000306675.3, GCA_000306255.2, GCA_022267325.1, GCA_000313175.2, GCA_003545875.1, GCA_001729245.1, GCA_006874765.1, GCA_003545925.1, GCA_004770105.1, GCA_000243715.3, GCA_004770895.1, GCA_000306235.2, GCA_004769195.1, GCA_004771275.1, GCA_002812225.1, GCA_002811475.1, GCA_000243675.3, GCA_003112675.1, GCA_004769555.1, GCA_004770615.1, GCA_000526875.1, GCA_002812205.1, GCA_002811875.1, GCA_002811765.1, GCA_004769615.1, GCA_004769405.1, GCA_004770055.1, GCA_002150035.1, GCA_004770635.1, GCA_016918735.1, GCA_004770555.1, GCA_000017685.1, GCA_004770145.1, GCA_004770625.1, GCA_004769295.1, GCA_016919165.1, GCA_004770265.1, GCA_003114815.1, GCA_002811945.1, GCA_004769775.1, GCA_004769235.1, GCA_004769665.1, GCA_002812085.1, GCA_004368965.1, GCA_004770045.1, GCA_004770475.1, GCA_004770765.1, GCA_004769575.1, GCA_000332495.2, GCA_004770365.1, GCA_000332515.2, GCA_004769275.1, GCA_004770995.1, GCA_004771005.1, GCA_003114835.3, GCA_004770745.1, GCA_003114855.1, GCA_026151345.1, GCA_026151335.1, GCA_026151395.1

Acknowledgments

The authors thank CES University, San Martin University, Autonomous University of Yucatan, Colombian Institute of Tropical Medicine (ICMT), and Northern Arizona University for the financial support given to the research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Adler B, De La Peña Moctezuma A. Leptospira and leptospirosis. Vet Microbiol. 2010 Jan;140(3–4):287–96. [CrossRef]
  2. Haake DA. Spirochaetal lipoproteins and pathogenesis. Microbiol Read Engl. 2000 July;146(Pt 7):1491–504. [CrossRef]
  3. Levett PN. Leptospirosis. Clin Microbiol Rev. 2001 Apr;14(2):296–326.
  4. Vijayachari P, Sugunan AP, Shriram AN. Leptospirosis: an emerging global public health problem. J Biosci. 2008 Nov;33(4):557–69. [CrossRef]
  5. McBride AJA, Athanazio DA, Reis MG, Ko AI. Leptospirosis. Curr Opin Infect Dis. 2005 Oct;18(5):376–86.
  6. Costa F, Hagan JE, Calcagno J, Kane M, Torgerson P, Martinez-Silveira MS, et al. Global Morbidity and Mortality of Leptospirosis: A Systematic Review. PLoS Negl Trop Dis. 2015 Sept 17;9(9):e0003898.
  7. Vincent AT, Schiettekatte O, Goarant C, Neela VK, Bernet E, Thibeaux R, et al. Revisiting the taxonomy and evolution of pathogenicity of the genus Leptospira through the prism of genomics. PLoS Negl Trop Dis. 2019 May;13(5):e0007270. [CrossRef]
  8. Cerqueira GM, Picardeau M. A century of Leptospira strain typing. Infect Genet Evol J Mol Epidemiol Evol Genet Infect Dis. 2009 Sept;9(5):760–8. [CrossRef]
  9. Fernandes LGV, Stone NE, Roe CC, Goris MGA, van der Linden H, Sahl JW, et al. Leptospira sanjuanensis sp. nov., a pathogenic species of the genus Leptospira isolated from soil in Puerto Rico. Int J Syst Evol Microbiol. 2022 Oct;72(10). [CrossRef]
  10. Korba AA, Lounici H, Kainiu M, Vincent AT, Mariet JF, Veyrier FJ, et al. Leptospira ainlahdjerensis sp. nov., Leptospira ainazelensis sp. nov., Leptospira abararensis sp. nov. and Leptospira chreensis sp. nov., four new species isolated from water sources in Algeria. Int J Syst Evol Microbiol. 2021 Dec;71(12). [CrossRef]
  11. Hamond C, Tibbs-Cortes B, Fernandes LGV, LeCount K, Putz EJ, Anderson T, et al. Leptospira gorisiae sp. nov, L. cinconiae sp. nov, L. mgodei sp. nov, L. milleri sp. nov and L. iowaensis sp. nov: five new species isolated from water sources in the Midwestern United States. Int J Syst Evol Microbiol. 2025;75(1):006595. [CrossRef]
  12. Dos Santos Ribeiro P, Carvalho NB, Aburjaile F, Sousa T, Veríssimo G, Gomes T, et al. Environmental Biofilms from an Urban Community in Salvador, Brazil, Shelter Previously Uncharacterized Saprophytic Leptospira. Microb Ecol. 2023 Nov;86(4):2488–501. [CrossRef]
  13. Fernandes LGV, Hornsby RL, Nascimento ALTO, Nally JE. Genetic manipulation of pathogenic Leptospira: CRISPR interference (CRISPRi)-mediated gene silencing and rapid mutant recovery at 37 °C. Sci Rep. 2021 Jan 19;11(1):1768. [CrossRef]
  14. Pappas CJ, Xu H, Motaleb MA. Creating a Library of Random Transposon Mutants in Leptospira. In: Koizumi N, Picardeau M, editors. Leptospira spp: Methods and Protocols [Internet]. New York, NY: Springer US; 2020 [cited 2025 Aug 12]. p. 77–96. Available from: . [CrossRef]
  15. Bourhy P, Louvel H, Saint Girons I, Picardeau M. Random Insertional Mutagenesis of Leptospira interrogans, the Agent of Leptospirosis, Using a mariner Transposon. J Bacteriol. 2005 May;187(9):3255–8. [CrossRef]
  16. Murray GL, Morel V, Cerqueira GM, Croda J, Srikram A, Henry R, et al. Genome-wide transposon mutagenesis in pathogenic Leptospira species. Infect Immun. 2009 Feb;77(2):810–6. [CrossRef]
  17. Lourdault K, Matsunaga J, Evangelista KV, Haake DA. High-throughput Parallel Sequencing to Measure Fitness of Leptospira interrogans Transposon Insertion Mutants During Golden Syrian Hamster Infection. J Vis Exp JoVE. 2017 Dec 18;(130):56442.
  18. Liao S, Sun A, Ojcius DM, Wu S, Zhao J, Yan J. Inactivation of the fliY gene encoding a flagellar motor switch protein attenuates mobility and virulence of Leptospira interrogans strain Lai. BMC Microbiol. 2009 Dec 9;9(1):253. [CrossRef]
  19. Croda J, Figueira CP, Wunder EA, Santos CS, Reis MG, Ko AI, et al. Targeted mutagenesis in pathogenic Leptospira species: disruption of the LigB gene does not affect virulence in animal models of leptospirosis. Infect Immun. 2008 Dec;76(12):5826–33. [CrossRef]
  20. Pappas CJ, Benaroudj N, Picardeau M. A Replicative Plasmid Vector Allows Efficient Complementation of Pathogenic Leptospira Strains. Appl Environ Microbiol. 2015 May;81(9):3176–81. [CrossRef]
  21. Pacesa M, Pelea O, Jinek M. Past, present, and future of CRISPR genome editing technologies. Cell. 2024 Feb 29;187(5):1076–100. [CrossRef]
  22. Nidhi S, Anand U, Oleksak P, Tripathi P, Lal JA, Thomas G, et al. Novel CRISPR–Cas Systems: An Updated Review of the Current Achievements, Applications, and Future Research Perspectives. Int J Mol Sci. 2021 Mar 24;22(7):3327. [CrossRef]
  23. Chehelgerdi M, Chehelgerdi M, Khorramian-Ghahfarokhi M, Shafieizadeh M, Mahmoudi E, Eskandari F, et al. Correction: Comprehensive review of CRISPR-based gene editing: mechanisms, challenges, and applications in cancer therapy. Mol Cancer. 2024 Feb 27;23(1):43.
  24. Hillary VE, Ceasar SA. A Review on the Mechanism and Applications of CRISPR/Cas9/Cas12/Cas13/Cas14 Proteins Utilized for Genome Engineering. Mol Biotechnol. 2023 Mar;65(3):311–25. [CrossRef]
  25. Sternberg SH, Richter H, Charpentier E, Qimron U. Adaptation in CRISPR-Cas Systems. Mol Cell. 2016 Mar;61(6):797–808.
  26. Hille F, Charpentier E. CRISPR-Cas: biology, mechanisms and relevance. Philos Trans R Soc B Biol Sci. 2016 Nov 5;371(1707):20150496. [CrossRef]
  27. Jiang F, Doudna JA. The structural biology of CRISPR-Cas systems. Curr Opin Struct Biol. 2015 Feb;30:100–11. [CrossRef]
  28. Koonin EV, Makarova KS. Origins and evolution of CRISPR-Cas systems. Philos Trans R Soc B Biol Sci. 2019 May 13;374(1772):20180087. [CrossRef]
  29. Alkhnbashi OS, Meier T, Mitrofanov A, Backofen R, Voß B. CRISPR-Cas bioinformatics. Methods. 2020 Feb;172:3–11. [CrossRef]
  30. Koonin EV, Makarova KS, Zhang F. Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol. 2017 June;37:67–78. [CrossRef]
  31. Li C, Chu W, Gill RA, Sang S, Shi Y, Hu X, et al. Computational Tools and Resources for CRISPR/Cas Genome Editing. Genomics Proteomics Bioinformatics. 2023 Feb 1;21(1):108–26. [CrossRef]
  32. Mojica FJM, Montoliu L. On the Origin of CRISPR-Cas Technology: From Prokaryotes to Mammals. Trends Microbiol. 2016 Oct;24(10):811–20. [CrossRef]
  33. Strich JR, Chertow DS. CRISPR-Cas Biology and Its Application to Infectious Diseases. Kraft CS, editor. J Clin Microbiol. 2019 Apr;57(4):e01307-18.
  34. Bhatia S, Pooja, Yadav SK. CRISPR-Cas for genome editing: Classification, mechanism, designing and applications. Int J Biol Macromol. 2023 May;238:124054. [CrossRef]
  35. Mougiakos I, Bosma EF, De Vos WM, Van Kranenburg R, Van Der Oost J. Next Generation Prokaryotic Engineering: The CRISPR-Cas Toolkit. Trends Biotechnol. 2016 July;34(7):575–87.
  36. Makarova KS, Wolf YI, Iranzo J, Shmakov SA, Alkhnbashi OS, Brouns SJJ, et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat Rev Microbiol. 2020 Feb;18(2):67–83. [CrossRef]
  37. Dixit B, Ghosh KK, Fernandes G, Kumar P, Gogoi P, Kumar M. Dual nuclease activity of a Cas2 protein in CRISPR –Cas subtype I-B of Leptospira interrogans. FEBS Lett. 2016 Apr;590(7):1002–16.
  38. Anand V, Prabhakaran HS, Gogoi P, Kanaujia SP, Kumar M. Structural and functional characterization of Cas2 of CRISPR-Cas subtype I-C lacking the CRISPR component. Front Mol Biosci. 2022 Sept 12;9:988569. [CrossRef]
  39. Anand V, Prabhakaran HS, Prakash A, Hussain MS, Kumar M. Differential processing of CRISPR RNA by LinCas5c and LinCas6 of Leptospira. Biochim Biophys Acta BBA - Gen Subj. 2023 Dec;1867(12):130469. [CrossRef]
  40. Prakash A, Kumar M. Transcriptional analysis of CRISPR I-B arrays of Leptospira interrogans serovar Lai and its processing by Cas6. Front Microbiol. 2022 July 29;13:960559. [CrossRef]
  41. Hussain MS, Anand V, Kumar M. Functional PAM sequence for DNA interference by CRISPR-Cas I-B system of Leptospira interrogans and the role of LinCas11b encoded within lincas8b. Int J Biol Macromol. 2023 May;237:124086. [CrossRef]
  42. Prakash A, Kumar M. Characterizing the transcripts of Leptospira CRISPR I-B array and its processing with endoribonuclease LinCas6. Int J Biol Macromol. 2021 July;182:785–95. [CrossRef]
  43. Dixit B, Prakash A, Kumar P, Gogoi P, Kumar M. The core Cas1 protein of CRISPR-Cas I-B in Leptospira shows metal-tunable nuclease activity. Curr Res Microb Sci. 2021 Dec;2:100059. [CrossRef]
  44. Dixit B, Anand V, Hussain MdS, Kumar M. The CRISPR-associated Cas4 protein from Leptospira interrogans demonstrate versatile nuclease activity. Curr Res Microb Sci. 2021 Dec;2:100040. [CrossRef]
  45. Fernandes LGV, Nascimento ALTO. A Novel Breakthrough in Leptospira spp. Mutagenesis: Knockout by Combination of CRISPR/Cas9 and Non-homologous End-Joining Systems. Front Microbiol. 2022 May 26;13:915382. [CrossRef]
  46. Fernandes LGV, Hamond C, Tibbs-Cortes BW, Putz EJ, Olsen SC, Palmer MV, et al. CRISPR-prime editing, a versatile genetic tool to create specific mutations with a single nucleotide resolution in Leptospira. Norris SJ, editor. mBio. 2024 Sept 11;15(9):e01516-24. [CrossRef]
  47. Xiao G, Yi Y, Che R, Zhang Q, Imran M, Khan A, et al. Characterization of CRISPR-Cas systems in Leptospira reveals potential application of CRISPR in genotyping of Leptospira interrogans. APMIS. 2019 Apr;127(4):202–16. [CrossRef]
  48. Natarajan S, Joseph J, Vinayagamurthy B, Estrela P. A Lateral Flow Assay for the Detection of Leptospira lipL32 Gene Using CRISPR Technology. Sensors. 2023 July 20;23(14):6544. [CrossRef]
  49. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 July 15;30(14):2068–9. [CrossRef]
  50. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008 Dec;9(1):75.
  51. Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015 Feb 10;5(1):8365. [CrossRef]
  52. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014 Jan;42(D1):D206–14. [CrossRef]
  53. Couvin D, Bernheim A, Toffano-Nioche C, Touchon M, Michalik J, Néron B, et al. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018 July 2;46(W1):W246–51. [CrossRef]
  54. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct;215(3):403–10. [CrossRef]
  55. Zdobnov EM, Apweiler R. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001 Sept 1;17(9):847–8. [CrossRef]
  56. Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018 July 2;46(W1):W296–303. [CrossRef]
  57. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024 June 13;630(8016):493–500. [CrossRef]
  58. Johansson MU, Zoete V, Michielin O, Guex N. Defining and searching for structural motifs using DeepView/Swiss-PdbViewer. BMC Bioinformatics. 2012 Dec;13(1):173.
  59. DeLano WL, Scientific D, Carlos S. PyMOL: An Open-Source Molecular Graphics Tool.
  60. Kumar S, Stecher G, Suleski M, Sanderford M, Sharma S, Tamura K. MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing. Battistuzzi FU, editor. Mol Biol Evol. 2024 Dec 6;41(12):msae263. [CrossRef]
  61. Garcia-Boronat M, Diez-Rivero CM, Reinherz EL, Reche PA. PVS: a web server for protein sequence variability analysis tuned to facilitate conserved epitope discovery. Nucleic Acids Res. 2008 May 19;36(Web Server):W35–41. [CrossRef]
  62. Russell DA, Hatfull GF. PhagesDB: the actinobacteriophage database. Wren J, editor. Bioinformatics. 2017 Mar 1;33(5):784–6. [CrossRef]
  63. Wishart DS, Han S, Saha S, Oler E, Peters H, Grant JR, et al. PHASTEST: faster than PHASTER, better than PHAST. Nucleic Acids Res. 2023 July 5;51(W1):W443–50. [CrossRef]
  64. Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 2016 July 8;44(W1):W16–21.
  65. Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. PHAST: A Fast Phage Search Tool. Nucleic Acids Res. 2011 July 1;39(suppl):W347–52. [CrossRef]
  66. Samaha Th S. In-Silico Characterisation of a Hypothetical Protein (LA_1016) of Leptospira Interrogans Serovar Lai Strain 56601. Austin J Proteomics Bioinform & Genomics. 2017;4(2).
  67. Ijaq J, Chandrasekharan M, Poddar R, Bethi N, Sundararajan VS. Annotation and curation of uncharacterized proteins- challenges. Front Genet [Internet]. 2015 Mar 31 [cited 2025 Aug 12];6. Available from: http://www.frontiersin.org/Bioinformatics_and_Computational_Biology/10.3389/fgene.2015.00119/abstract. [CrossRef]
  68. Takeshita D, Sato M, Inanaga H, Numata T. Crystal Structures of Csm2 and Csm3 in the Type III-A CRISPR–Cas Effector Complex. J Mol Biol. 2019 Feb;431(4):748–63. [CrossRef]
  69. Campbell A. The future of bacteriophage biology. Nat Rev Genet. 2003 June 1;4(6):471–7. [CrossRef]
  70. Hussain W, Yang X, Ullah M, Wang H, Aziz A, Xu F, et al. Genetic engineering of bacteriophages: Key concepts, strategies, and applications. Biotechnol Adv. 2023 May;64:108116. [CrossRef]
  71. Jia HJ, Jia PP, Yin S, Bu LK, Yang G, Pei DS. Engineering bacteriophages for enhanced host range and efficacy: insights from bacteriophage-bacteria interactions. Front Microbiol. 2023 May 31;14:1172635. [CrossRef]
  72. Chen Y, Batra H, Dong J, Chen C, Rao VB, Tao P. Genetic Engineering of Bacteriophages Against Infectious Diseases. Front Microbiol. 2019 May 3;10:954. [CrossRef]
  73. Segundo-Arizmendi N, Arellano-Maciel D, Rivera-Ramírez A, Piña-González AM, López-Leal G, Hernández-Baltazar E. Bacteriophages: A Challenge for Antimicrobial Therapy. Microorganisms. 2025 Jan 7;13(1):100. [CrossRef]
  74. Girma A. Bacteriophages as an alternative strategy for the treatment of drug resistant bacterial infections: Current approaches and future perspectives. Cell Surf. 2025 Dec;14:100149. [CrossRef]
  75. Subramanian A. Emerging roles of bacteriophage-based therapeutics in combating antibiotic resistance. Front Microbiol. 2024 July 5;15:1384164. [CrossRef]
  76. Summers WC. Bacteriophage Therapy. Annu Rev Microbiol. 2001 Oct;55(1):437–51.
  77. Girons IS, Margarita D, Amouriaux P, Baranton G. First isolation of bacteriophages for a spirochaete: Potential genetic tools for Leptospira. Res Microbiol. 1990 Nov;141(9):1131–8. [CrossRef]
  78. Bourhy P, Frangeul L, Couvé E, Glaser P, Saint Girons I, Picardeau M. Complete Nucleotide Sequence of the LE1 Prophage from the Spirochete Leptospira biflexa and Characterization of Its Replication and Partition Functions. J Bacteriol. 2005 June 15;187(12):3931–40. [CrossRef]
  79. Schiettekatte O, Vincent AT, Malosse C, Lechat P, Chamot-Rooke J, Veyrier FJ, et al. Characterization of LE3 and LE4, the only lytic phages known to infect the spirochete Leptospira. Sci Rep. 2018 Aug 6;8(1):11781. [CrossRef]
Figure 1. The ten Cas proteins detected across the genomes of the 77 Leptospira species analyzed. Their biological functions and biological processes were described using InterPro. The three-dimensional structures of Cas1, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, and Cas12 were modeled with SWISS-MODEL and AlphaFold3. The functional domains of each protein and their respective biological functions are also shown.
Figure 3. Conservation patterns of direct repeats in the Leptospira genus. The figure shows the nucleotide alignment of spacer sequences from 30 Leptospira species (species with Cas proteins, direct repeats, and spacer sequences) and the conservation patterns among them. Colors in the alignment distinguish the four nucleotides (adenine in black; thymine, guanine, and cytosine in green). In the conservation patterns, the height of each letter is proportional to that nucleotide's frequency at the position, measured in bits.
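The letter heights described in the Figure 3 caption can be illustrated with a short calculation. The sketch below assumes the standard sequence-logo definition (information content of a column = 2 bits minus its Shannon entropy, with each letter's height equal to its frequency times that information content) and omits the small-sample correction; whether the logo tool used in this study applies that correction is not stated. The function name `logo_heights` and the toy repeat sequences are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def logo_heights(aligned_seqs, alphabet="ACGT"):
    """Return one dict per alignment column mapping each letter to its height in bits."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    max_bits = math.log2(len(alphabet))  # 2 bits for a four-letter DNA alphabet
    heights = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in aligned_seqs)
        total = sum(counts[base] for base in alphabet)  # gaps and ambiguity codes are ignored
        if total == 0:
            heights.append({base: 0.0 for base in alphabet})
            continue
        freqs = {base: counts[base] / total for base in alphabet}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        information = max_bits - entropy  # column conservation in bits
        heights.append({base: freqs[base] * information for base in alphabet})
    return heights


# Hypothetical aligned direct repeats, for illustration only.
toy_repeats = ["GTTTA", "GTTCA", "GTATA", "GTGGA"]
for position, column in enumerate(logo_heights(toy_repeats), start=1):
    print(position, {base: round(h, 3) for base, h in column.items() if h > 0})
```

A fully conserved column reaches the maximum of 2 bits, while a column in which all four nucleotides are equally frequent carries 0 bits, so its letters would be invisible in the logo.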
Figure 4. Detection of bacteriophages that successfully inserted into the genome of four species of the genus Leptospira (L. mayottensis, L. weilii, L. bandrabouensis, and L. ellinghausenii), evading the CRISPR-Cas system. The figure shows: the regions of the genomes in which the proteins of the bacteriophages were detected (areas highlighted in green), Leptospira species, the region where the phage was found, region length, completeness, score, total number of bacteriophage proteins detected, position of the region where the phage was inserted, bacteriophage identification, and GC percentage. The green regions indicate intact bacteriophages (complete genome detected), the light green regions indicate partial or almost complete bacteriophages (almost the entire genome was detected), and the red regions indicate incomplete bacteriophages (a small region of the genome was detected).
Figure 5. Genomic annotation of the species with the highest and lowest numbers of hypothetical proteins, obtained with the RAST (Rapid Annotation using Subsystem Technology) web server. For each species, the figure shows its name, subgroup, number of annotated (identified) proteins, number of hypothetical proteins, total number of proteins detected in the genome, percentage of hypothetical proteins, and rank by number of hypothetical proteins detected.
Table 1. The table summarizes the 77 Leptospira species analyzed, including their taxonomic subgroup (p1, p2, s1, s2), NCBI accession number, genome size (Mb), GC content (%), and the number of genes, proteins, and non-coding sequences. Color coding indicates subgroup classification: purple for pathogenic p1, green for pathogenic p2, orange for saprophytic s1, and blue for saprophytic s2 species.
Species Subgroup Accession Number Sequencing Level Genome Size (Mb) GC (%) Genes Proteins Non-Coding
L. adleri p1 GCA_002811985.1 Scaffold 4.8 43.5 4789 4626 163
L. ainazelensis p1 GCA_016918785.1 Contig 4.9 42.5 4417 4310 107
L. ainlahdjerensis p1 GCA_016919175.1 Contig 4.8 42.5 4409 4310 99
L. alexanderi p1 GCA_000243815.3 Contig 4.2 40 4582 4541 41
L. alstonii p1 GCA_000347175.1 Contig 4.4 42.5 4423 4380 43
L. barantonii p1 GCA_002811925.1 Contig 4.4 44 4135 4033 102
L. borgpetersenii p1 GCA_003516145.1 Chromosome 4 40 3792 3463 329
L. ellisii p1 GCA_002811955.2 Contig 4.3 48 3998 3896 102
L. gomenensis p1 GCA_004770155.1 Contig 4.3 46 3931 3802 129
L. kirschneri p1 GCA_000243695.3 Contig 4.4 36 4029 3986 43
L. interrogans p1 GCA_002073495.2 Chromosome 4.6 35 4049 3772 277
L. kmetyi p1 GCA_003722295.1 Chromosome 4.4 45 4160 4035 125
L. mayottensis p1 GCA_000306675.3 Chromosome 4.2 39.5 3944 3592 352
L. noguchii p1 GCA_000306255.2 Contig 4.7 35.5 4565 4520 45
L. sanjuanensis p1 GCA_022267325.1 Contig 4.5 45 4152 4052 100
L. santarosai p1 GCA_000313175.2 Chromosome 4 42 4191 4080 111
L. stimsonii p1 GCA_003545875.1 Contig 4.7 42.5 4747 4599 148
L. tipperaryensis p1 GCA_001729245.1 Chromosome 4.6 42.5 4342 4255 87
L. weilii p1 GCA_006874765.1 Chromosome 4.4 41 4312 3965 347
L. yasudae p1 GCA_003545925.1 Contig 4.4 45.5 4286 4162 124
L. gorisiae p1 GCA_040833975.1 Chromosome 4.5 41.5 4104 3939 165
L. andrefontaineae p2 GCA_004770105.1 Contig 4.3 40 3969 3894 75
L. broomii p2 GCA_000243715.3 Contig 4.4 43 4249 4205 44
L. dzoumogneensis p2 GCA_004770895.1 Contig 4.1 41 3814 3724 90
L. fainei p2 GCA_000306235.2 Contig 4.3 43.5 4157 4113 44
L. fletcheri p2 GCA_004769195.1 Contig 3.7 47.5 3409 3326 83
L. fluminis p2 GCA_004771275.1 Contig 3.7 47.5 3427 3342 85
L. haakeii p2 GCA_002812225.1 Scaffold 4.2 40 3920 3814 106
L. hartskeerlii p2 GCA_002811475.1 Scaffold 4.1 40.5 3787 3708 79
L. inadai p2 GCA_000243675.3 Contig 4.5 44.5 4314 4264 50
L. johnsonii p2 GCA_003112675.1 Contig 4.1 41.5 3754 3713 41
L. koniambonensis p2 GCA_004769555.1 Contig 4.3 39 4000 3929 71
L. langatensis p2 GCA_004770615.1 Contig 4.1 45 3751 3671 80
L. licerasiae p2 GCA_000526875.1 Contig 4.2 41 3899 3834 65
L. neocaledonica p2 GCA_002812205.1 Scaffold 4.2 40 3978 3889 89
L. perolatii p2 GCA_002811875.1 Contig 4 42.5 3712 3631 81
L. saintgironsiae p2 GCA_002811765.1 Contig 4.1 39 3830 3736 94
L. sarikeiensis p2 GCA_004769615.1 Contig 4.4 40.5 4026 3928 98
L. selangorensis p2 GCA_004769405.1 Contig 4.2 40 3894 3814 80
L. semungkisensis p2 GCA_004770055.1 Contig 3.9 43 3626 3562 64
L. venezuelensis p2 GCA_002150035.1 Contig 4.3 39 4030 3973 57
L. wolffii p2 GCA_004770635.1 Contig 4.2 46 3851 3771 80
L. cinconiae p2 GCA_040833995.1 Chromosome 4.1 42 3801 3741 60
L. abararensis s1 GCA_016918735.1 Contig 4.2 39 3968 3900 68
L. bandrabouensis s1 GCA_004770555.1 Contig 4.2 37.5 3928 3857 71
L. biflexa s1 GCA_000017685.1 Chromosome 4 39 3775 3726 49
L. bourretii s1 GCA_004770145.1 Contig 4.2 38 3923 3840 83
L. bouyouniensis s1 GCA_004770625.1 Contig 4.1 37 3833 3746 87
L. brenneri s1 GCA_004769295.1 Contig 3.9 38.5 3662 3593 69
L. chreensis s1 GCA_016919165.1 Contig 4.5 40 4176 4086 90
L. congkakensis s1 GCA_004770265.1 Contig 4 38 3712 3647 65
L. ellinghausenii s1 GCA_003114815.1 Contig 4.2 37.5 3960 3921 39
L. harrisiae s1 GCA_002811945.1 Scaffold 3.9 38 3726 3651 75
L. jelokensis s1 GCA_004769775.1 Contig 4.1 39 3833 3755 78
L. kanakyensis s1 GCA_004769235.1 Contig 4.1 38.5 3884 3810 74
L. kemamanensis s1 GCA_004769665.1 Contig 3.8 39 3533 3436 97
L. levettii s1 GCA_002812085.1 Scaffold 3.9 37.5 3655 3579 76
L. meyeri s1 GCA_004368965.1 Contig 4.2 38 4028 3930 98
L. montravelensis s1 GCA_004770045.1 Contig 4 37.5 3760 3686 74
L. mtsangambouensis s1 GCA_004770475.1 Contig 4.1 38 3836 3764 72
L. noumeaensis s1 GCA_004770765.1 Contig 4.1 38.5 3832 3758 74
L. perdikensis s1 GCA_004769575.1 Contig 4 38.5 3740 3680 60
L. terpstrae s1 GCA_000332495.2 Contig 4.1 38 3932 3889 43
L. vanthielii s1 GCA_004770365.1 Contig 4.1 39 3839 3753 86
L. wolbachii s1 GCA_000332515.2 Contig 4.1 39 3956 3912 44
L. yanagawae s1 GCA_004769275.1 Contig 4 38.5 3704 3631 73
L. mgodei s1 GCA_040833985.1 Chromosome 4 39 3807 3752 55
L. milleri s1 GCA_040833955.1 Chromosome 3.9 38.5 3628 3555 73
L. iowaensis s1 GCA_040833965.1 Chromosome 4.1 37 3867 3812 55
L. paudalimensis s1 GCA_026151345.1 Contig 4.1 37.5 3769 3711 58
L. soteropolitanensis s1 GCA_026151335.1 Contig 4.1 37.5 3934 3863 71
L. limi s1 GCA_026151395.1 Contig 3.9 37.5 3679 3619 60
L. idonii s2 GCA_004770995.1 Contig 4.1 41 3797 3724 73
L. ilyithenensis s2 GCA_004771005.1 Contig 4.2 40.5 3950 3849 101
L. kobayashii s2 GCA_003114835.3 Chromosome 4.3 40.5 3945 3902 43
L. ognonensis s2 GCA_004770745.1 Scaffold 4 39.5 3733 3660 73
L. ryugenii s2 GCA_003114855.1 Scaffold 4 40 3698 3659 39
Table 2. Cas proteins detected in the 77 Leptospira species analyzed, with each species' clade (p1, p2, s1, s2) and the class, type, subtype, variant, and native target of its CRISPR-Cas system. Color coding indicates subgroup classification: purple for pathogenic p1, green for pathogenic p2, orange for saprophytic s1, and blue for saprophytic s2 species.
| Species | Clade | Proteins | Class | Type | Subtype | Variant | Native target |
|---|---|---|---|---|---|---|---|
| L. adleri | p1 | cas1_TypeIA, cas4_TypeI-II, cas3_TypeI, cas3a_TypeI, cas2_TypeI-II-III, cas5a2_TypeIA, cas6_TypeIA, cas7b_TypeIB | Class 1 | I, II, III | IA, IB | cas3a, cas5a2, cas7b | DNA |
| L. ainazelensis | p1 | Not detected | - | - | - | - | - |
| L. ainlahdjerensis | p1 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. alexanderi | p1 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas4_TypeI-II, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I, II | IE | - | DNA |
| L. alstonii | p1 | cas1_TypeIA, cas2_TypeI-II-III, cas3_TypeI, cas5a2_TypeIA, cas6_TypeIE, cas7_TypeI, cas4_TypeI-II, cas5_TypeIE | Class 1 | I, II, III | IA, IE | cas5a | DNA |
| L. barantonii | p1 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. borgpetersenii | p1 | cse2_TypeIE, cse1_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeI, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
| L. ellisii | p1 | cas3a_TypeI | Class 1 | I | - | - | DNA |
| L. gomenensis | p1 | Not detected | - | - | - | - | - |
| L. kirschneri | p1 | cas1_TypeIC, cas2_TypeI-II-III, cas3_TypeI, cas4_TypeI-II, cas5_TypeIA, cas5c_TypeIC, cas7c_TypeIC, cas8c_TypeIC | Class 1 | I, II, III | IC, IA | cas5c, cas7c, cas8c | DNA |
| L. interrogans | p1 | cas1_TypeIC, cas2_TypeI-II-III, cas3_TypeI, cas3a_TypeI, cas4_TypeI-II, cas5c_TypeIC, cas7c_TypeIC, cas8c_TypeIC | Class 1 | I, II, III | IC | cas3a | DNA |
| L. kmetyi | p1 | cas1_TypeIA, cas2_TypeI-II-III, cas3_TypeI, cas5a2_TypeIA, cas6_TypeIA, cas7_TypeI, cas8a1a3_TypeIA | Class 1 | I, II, III | IA | cas5a2, cas8a1a3 | DNA |
| L. mayottensis | p1 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
| L. noguchii | p1 | cas2_TypeI-II-III, cas3_TypeI, cas3a_TypeI, cas4_TypeI-II, cas5c_TypeIC, cas7c_TypeIC, cas8c_TypeIC | Class 1 | I, II, III | IC | cas3a, cas5c, cas7c, cas8c | DNA |
| L. sanjuanensis | p1 | Not detected | - | - | - | - | - |
| L. santarosai | p1 | cse1_TypeIE, cas1_TypeIE, cas1_TypeIA, cas2_TypeI-II-III, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas5a2_TypeIA, cas6_TypeIE, cas6_TypeIA, cas7_TypeIE, cas7_TypeI, cas8a1a3_TypeIA | Class 1 | I, II, III | IE, IA | cas5a2, cas8a1a3 | DNA |
| L. stimsonii | p1 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
| L. tipperaryensis | p1 | Not detected | - | - | - | - | - |
| L. weilii | p1 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
| L. yasudae | p1 | Not detected | - | - | - | - | - |
| L. gorisiae | p1 | cse1_TypeIE, cpf1_TypeU (Cas12a), cas1_TypeIE, cas1_TypeU, cas2_TypeIE, cas2_TypeI-II-III, cas3_TypeI, cas4_TypeU, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1, Class 2 | II, V | IE | - | DNA |
| L. andrefontaineae | p2 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. broomii | p2 | Not detected | - | - | - | - | - |
| L. dzoumogneensis | p2 | cas3_TypeI, cas3a_TypeI | Class 1 | I | - | cas3a | DNA |
| L. fainei | p2 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
| L. fletcheri | p2 | cas1_TypeI-II-III, cas2_TypeI-II-III, cas4_TypeI-II, cas9_TypeIIB | Class 2 | II | IIB | - | DNA |
| L. fluminis | p2 | Not detected | - | - | - | - | - |
| L. haakeii | p2 | csm2_TypeIIIA, cas3_TypeI | Class 1 | I, III | IIIA | - | DNA and RNA |
| L. hartskeerlii | p2 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. inadai | p2 | cas1_TypeIB, cas1_TypeU, cpf1_TypeU (Cas12a), cas1_TypeI-II-III, cas2_TypeI-II-III, cas3_TypeI, cas4_TypeI-II, cas4_TypeU, cas4_TypeI-II, cas5b_TypeIB, cas6_TypeI-III, cas7b_TypeIB | Class 1, Class 2 | I, II, II, V | IB | cas5b, cas7b | DNA |
| L. johnsonii | p2 | Not detected | - | - | - | - | - |
| L. koniambonensis | p2 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
| L. langatensis | p2 | Not detected | - | - | - | - | - |
| L. licerasiae | p2 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. neocaledonica | p2 | Not detected | - | - | - | - | - |
| L. perolatii | p2 | Not detected | - | - | - | - | - |
| L. saintgironsiae | p2 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. sarikeiensis | p2 | Not detected | - | - | - | - | - |
| L. selangorensis | p2 | Not detected | - | - | - | - | - |
| L. semungkisensis | p2 | Not detected | - | - | - | - | - |
| L. venezuelensis | p2 | Not detected | - | - | - | - | - |
| L. wolffii | p2 | Not detected | - | - | - | - | - |
| L. cinconiae | p2 | Not detected | - | - | - | - | - |
| L. abararensis | s1 | cas3_TypeI, cas3a_TypeI | Class 1 | I | - | cas3a | DNA |
| L. bandrabouensis | s1 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. biflexa | s1 | Not detected | - | - | - | - | - |
| L. bourretii | s1 | Not detected | - | - | - | - | - |
| L. bouyouniensis | s1 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. brenneri | s1 | Not detected | - | - | - | - | - |
| L. chreensis | s1 | cas3a_TypeI | Class 1 | I | - | - | DNA |
| L. congkakensis | s1 | cas3a_TypeI | Class 1 | I | - | - | DNA |
| L. ellinghausenii | s1 | Not detected | - | - | - | - | - |
| L. harrisiae | s1 | Not detected | - | - | - | - | - |
| L. jelokensis | s1 | Not detected | - | - | - | - | - |
| L. kanakyensis | s1 | Not detected | - | - | - | - | - |
| L. kemamanensis | s1 | cas3_TypeI | Class 1 | I | - | - | DNA |
| L. levettii | s1 | Not detected | - | - | - | - | - |
| L. meyeri | s1 | Not detected | - | - | - | - | - |
| L. montravelensis | s1 | Not detected | - | - | - | - | - |
| L. mtsangambouensis | s1 | Not detected | - | - | - | - | - |
| L. noumeaensis | s1 | Not detected | - | - | - | - | - |
| L. perdikensis | s1 | Not detected | - | - | - | - | - |
| L. terpstrae | s1 | Not detected | - | - | - | - | - |
| L. vanthielii | s1 | cas3a_TypeI | Class 1 | I | - | - | DNA |
| L. wolbachii | s1 | Not detected | - | - | - | - | - |
| L. yanagawae | s1 | Not detected | - | - | - | - | - |
| L. mgodei | s1 | Not detected | - | - | - | - | - |
| L. milleri | s1 | Not detected | - | - | - | - | - |
| L. iowaensis | s1 | Not detected | - | - | - | - | - |
| L. paudalimensis | s1 | Not detected | - | - | - | - | - |
| L. soteropolitanensis | s1 | Not detected | - | - | - | - | - |
| L. limi | s1 | Not detected | - | - | - | - | - |
| L. idonii | s2 | cas3_TypeI, cas3a_TypeI | Class 1 | I | - | cas3a | DNA |
| L. ilyithenensis | s2 | cas1_TypeU, cas2_TypeI-II-III, cas3_TypeI, cas4_TypeU, cpf1_TypeU (Cas12a) | Class 2 | I, II, III, V | - | - | DNA |
| L. kobayashii | s2 | Not detected | - | - | - | - | - |
| L. ognonensis | s2 | Not detected | - | - | - | - | - |
| L. ryugenii | s2 | cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE | Class 1 | I | IE | - | DNA |
Table 3. Distribution of the Cas proteins across the three functional stages of the CRISPR-Cas system, adaptation (spacer integration), expression (pre-crRNA processing), and interference (effector complex formation or target cleavage), in the 19 Leptospira species that possess multiple Cas proteins.
| Species | Adaptation (spacer integration) | Expression (pre-crRNA processing) | Interference: effector complex | Interference: target cleavage | Taxonomic classification |
|---|---|---|---|---|---|
| L. adleri | Cas1_TypeIA, Cas2_TypeI-II-III, Cas4_TypeI-II | Cas6_TypeIA | Cas5a2_TypeIA, Cas7b_TypeIB | Cas3_TypeI, Cas3a_TypeI | Class 1, type IF2 |
| L. alexanderi | Cas1_TypeIE, Cas2_TypeIE, Cas4_TypeI-II | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. alstonii | Cas1_TypeIA, Cas2_TypeI-II-III, Cas4_TypeI-II | Cas6_TypeIE | Cas5a2_TypeIA, Cas5_TypeIE, Cas7_TypeI | Cas3_TypeI | Class 1, type IF2 |
| L. borgpetersenii | Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeI, Cas7_TypeIE, Cse1_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. kirschneri | Cas1_TypeIC, Cas2_TypeI-II-III, Cas4_TypeI-II | - | Cas5_TypeIA, Cas5c_TypeIC, Cas7c_TypeIC, Cas8c_TypeIC | Cas3_TypeI | Class 1, type IC |
| L. interrogans | Cas1_TypeIC, Cas2_TypeI-II-III, Cas4_TypeI-II | - | Cas5c_TypeIC, Cas7c_TypeIC, Cas8c_TypeIC | Cas3_TypeI, Cas3a_TypeI | Class 1, type IC |
| L. mayottensis | Cas1_TypeIE, Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. noguchii | Cas2_TypeI-II-III, Cas4_TypeI-II | - | Cas5c_TypeIC, Cas7c_TypeIC, Cas8c_TypeIC | Cas3_TypeI, Cas3a_TypeI | Class 1, type IC |
| L. santarosai | Cas1_TypeIA, Cas1_TypeIE, Cas2_TypeI-II-III, Cas2_TypeIE | Cas6_TypeIA, Cas6_TypeIE | Cas5_TypeIE, Cas5a2_TypeIA, Cas7_TypeIE, Cas7_TypeI, Cas8a1a3_TypeIA, Cse1_TypeIE | Cas3_TypeI | Class 1, type B |
| L. stimsonii | Cas1_TypeIE, Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. weilii | Cas1_TypeIE, Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. gorisiae | Cas1_TypeU, Cas1_TypeIE, Cas2_TypeI-II-III, Cas2_TypeIE, Cas4_TypeU | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE | Cas12a, Cas3_TypeI | Class 1, type IE; Class 2, type V |
| L. kmetyi | Cas1_TypeIA, Cas2_TypeI-II-III | Cas6_TypeIA | Cas5a2_TypeIA, Cas7_TypeI, Cas8a1a3_TypeIA | Cas3_TypeI | Class 1, type B |
| L. fainei | Cas1_TypeIE, Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. fletcheri | Cas1_TypeI-II-III, Cas2_TypeI-II-III, Cas4_TypeI-II | RNase III | - | Cas9_TypeIIB | Class 2, type IIB |
| L. inadai | Cas1_TypeIB, Cas1_TypeU, Cas1_TypeI-II-III, Cas2_TypeI-II-III, Cas4_TypeI-II, Cas4_TypeU, Cas4_TypeI-II | Cas6_TypeI-III | Cas5b_TypeIB, Cas7b_TypeIB | Cas3_TypeI, Cas12a | Class 1, type IF2; Class 2, type V |
| L. koniambonensis | Cas1_TypeIE, Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
| L. ilyithenensis | Cas1_TypeU, Cas2_TypeI-II-III, Cas4_TypeU | - | - | Cas3_TypeI, Cas12a | Class 2, type V |
| L. ryugenii | Cas1_TypeIE, Cas2_TypeIE | Cas6_TypeIE | Cas5_TypeIE, Cas7_TypeIE, Cse1_TypeIE, Cse2_TypeIE | Cas3_TypeI | Class 1, type IE |
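The stage assignment used in Table 3 can be expressed as a simple lookup. The sketch below is illustrative only: the mapping mirrors the columns in which each Cas family appears in the table, the placement of Cas9 under target cleavage is an assumption made by analogy with Cas12a, and the names `STAGE_BY_FAMILY`, `stage_of`, and `group_by_stage` are hypothetical. The example input is the L. weilii protein list from Table 2.

```python
import re

# Stage of the CRISPR-Cas pathway for each Cas family, following the columns
# of Table 3. Placing Cas9 under "target cleavage" is an assumption.
STAGE_BY_FAMILY = {
    "cas1": "adaptation",
    "cas2": "adaptation",
    "cas4": "adaptation",
    "cas6": "expression",
    "cas5": "effector complex",
    "cas7": "effector complex",
    "cas8": "effector complex",
    "cse1": "effector complex",
    "cse2": "effector complex",
    "cas3": "target cleavage",
    "cas9": "target cleavage",
    "cas12": "target cleavage",
    "cpf1": "target cleavage",  # Cpf1 is the older name for Cas12a
}

# Family prefix = "cas", "cse", or "cpf" followed by its number (cas5a2 -> cas5).
FAMILY_PATTERN = re.compile(r"(cas|cse|cpf)\d+")


def stage_of(protein_name):
    """Map an annotation such as 'cas5a2_TypeIA' to its functional stage."""
    match = FAMILY_PATTERN.match(protein_name.strip().lower())
    if match is None:
        return "unassigned"
    return STAGE_BY_FAMILY.get(match.group(0), "unassigned")


def group_by_stage(protein_names):
    """Group a list of Cas annotations by functional stage."""
    grouped = {}
    for name in protein_names:
        grouped.setdefault(stage_of(name), []).append(name)
    return grouped


# Protein list for L. weilii as given in Table 2.
weilii = ("cse1_TypeIE, cse2_TypeIE, cas1_TypeIE, cas2_TypeIE, "
          "cas3_TypeI, cas5_TypeIE, cas6_TypeIE, cas7_TypeIE").split(", ")
for stage, proteins in group_by_stage(weilii).items():
    print(f"{stage}: {', '.join(proteins)}")
```

Families that do not appear in Table 3, such as Csm2, are returned as "unassigned" rather than guessed.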
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.