Submitted:
28 June 2026
Posted:
29 June 2026
Abstract
Monitoring individual strains in complex microbial communities remains a fundamental challenge in microbial ecology and biotechnology. Here, we present an integrated pipeline for identifying and validating strain-specific loci (SSL) in four biotechnologically relevant strains of the genera Stenotrophomonas, Bacillus, and Pseudomonas. The pipeline applies a two-round specificity-filtering strategy that combines whole-genome comparison with high-sensitivity BLASTn validation of the candidate SSL against the NCBI nucleotide database. SSL count correlated inversely with the average nucleotide identity (ANIb) of the strains used for the analysis, ranging from one locus in B. halotolerans (ANIb = 98.91%) to 15 loci in S. rhizophila (ANIb = 86.49%). All 25 SSL were AT-rich and mainly associated with the accessory genome, with flanking regions enriched in genes of unknown function (34.6%) and mobile genetic elements (19.2%). TaqMan qPCR assays targeting SSL demonstrated high specificity (no target sequences were detected across ten geographically distinct soil samples) and high sensitivity, with limits of detection of 0.01–0.1 pg of genomic DNA. Spike-in experiments in soil yielded method detection limits (MDL) of 850–15,000 CFU/g. All four strains were detected in the wheat rhizosphere seven days after consortium application in a field experiment, validating the pipeline for multi-strain field monitoring.
Keywords:
1. Introduction
2. Materials and Methods
2.1. Bacterial Strains and Culture Conditions
2.2. Identification of Strain-Specific Loci (SSL)
2.3. Genomic Environment Analysis of SSL
2.4. BLAST Analysis of Soil Metagenome Samples
2.5. DNA Manipulation and Polymerase Chain Reaction (PCR)
2.5.1. Sampling, Extraction, and Quantification of DNA
2.5.2. Primer Design, PCR, and Electrophoresis
2.5.3. qPCR for Sensitivity and Specificity Testing
2.6. Assessment of SSL Transcriptional Activity
2.7. Method Detection Limit
- Genome copies per CFU = (qPCR-measured genome equivalents per g soil) / (inoculated CFU per g soil).
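The ratio defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the input values are hypothetical and are not taken from the study.

```python
def genome_copies_per_cfu(genome_equivalents_per_g: float,
                          inoculated_cfu_per_g: float) -> float:
    """Ratio of qPCR-measured genome equivalents to inoculated CFU, both per g soil."""
    if inoculated_cfu_per_g <= 0:
        raise ValueError("inoculated CFU per g soil must be positive")
    return genome_equivalents_per_g / inoculated_cfu_per_g


# Hypothetical spike-in: 1.0e6 CFU/g inoculated, 6.6e5 genome equivalents/g measured.
ratio = genome_copies_per_cfu(6.6e5, 1.0e6)
print(f"Genome copies per CFU: {ratio:.2f}")
```

A ratio below 1 would mean that fewer genome equivalents were recovered by qPCR than cells were inoculated, while a ratio above 1 would indicate more than one detectable genome copy per colony-forming unit.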
2.8. Field Experiment on Winter Wheat
3. Results
3.1. Identification of Strain-Specific Regions in Whole-Genome Sequences
3.2. Analysis of SSL Genomic Environment
3.3. Selection of Target Regions and Primer Design for qPCR
3.4. Transcriptional Activity of Selected SSL
3.5. Sensitivity Assessment
| Sample | 100 pg | 10 pg | 1 pg | 0.1 pg | 0.01 pg |
|---|---|---|---|---|---|
| MGMM118 | 21.23 ± 0.24 | 24.85 ± 0.19 | 28.51 ± 0.28 | 32.12 ± 0.33 | 35.20 ± 1.10 |
| MGMM119 | 19.61 ± 0.06 | 22.99 ± 0.19 | 26.38 ± 0.12 | 29.37 ± 0.36 | 33.41 ± 0.81 |
| MGMM120 | 26.94 ± 0.02 | 30.56 ± 0.12 | 34.24 ± 0.40 | 37.35 ± 0.98 | - |
| MGMM121 | 24.46 ± 0.03 | 27.94 ± 0.06 | 31.44 ± 0.21 | 34.95 ± 0.07 | 37.10 ± 0.22 |
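A dilution series like the one tabulated above is commonly summarized by a standard curve. The sketch below is not the authors' analysis. It assumes the tabulated values are mean Cq values for each genomic DNA input, fits Cq against log10(DNA amount) by ordinary least squares, and derives amplification efficiency from the slope with the usual relation E = 10^(−1/slope) − 1. The function names are illustrative.

```python
import math


def fit_standard_curve(dna_pg, cq):
    """Least-squares fit of Cq against log10(DNA amount); returns (slope, intercept)."""
    x = [math.log10(v) for v in dna_pg]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 corresponds to perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Mean Cq values for MGMM118 from the sensitivity table (assumed to be Cq).
dna_pg = [100, 10, 1, 0.1, 0.01]
cq_mgmm118 = [21.23, 24.85, 28.51, 32.12, 35.20]

slope, intercept = fit_standard_curve(dna_pg, cq_mgmm118)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, efficiency = {efficiency:.1%}")
```

For the MGMM118 row this gives a slope of about −3.52 cycles per tenfold dilution, which corresponds to an efficiency of roughly 92%. The fit uses only the five mean values, so it ignores replicate-level variance.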
| Strain | MDL (CFU/g) | Genome copies per CFU (mean ± SD) |
|---|---|---|
| MGMM118 | 1.4 × 10⁴ | 0.66 ± 0.03 |
| MGMM119 | 6.5 × 10³ | 0.48 ± 0.10 |
| MGMM120 | 1.5 × 10⁴ | 2.12 ± 0.43 |
| MGMM121 | 1.3 × 10⁴ | 0.52 ± 0.10 |
3.6. Specificity Testing
3.7. Search for SSL in Metagenomics Data
3.8. Detection of Strains in the Field Experiment
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| SL | Specific loci |
| preSSL | Presumed strain specific loci |
| SSL | Strain specific loci |
References
- Briczinski, E.P.; Loquasto, J.R.; Barrangou, R.; Dudley, E.G.; Roberts, A.M.; Roberts, R.F. Strain-Specific Genotyping of Bifidobacterium Animalis Subsp. Lactis by Using Single-Nucleotide Polymorphisms, Insertions, and Deletions. Appl. Environ. Microbiol. 2009, 75, 7501–7508.
- Andronov, E.E.; Aksenova, T.S.; Onishchuk, O.P.; Kurchak, O.N.; Safronova, V.I.; Pinaev, A.G.; Evsyukov, I.V.; Provorov, N.A. Strain-Specific Markers of Rhizobia According to Whole Genome Sequencing Data. Microbiology 2025, 94, 29–37.
- Jo, B.-H.; Lee, C.S.; Song, H.-R.; Lee, H.-G.; Oh, H.-M. Development of Novel Microsatellite Markers for Strain-Specific Identification of Chlorella Vulgaris. J. Microbiol. Biotechnol. 2014, 24, 1189–1195.
- Hernández, I.; Sant, C.; Martínez, R.; Fernández, C. Design of Bacterial Strain-Specific qPCR Assays Using NGS Data and Publicly Available Resources and Its Application to Track Biocontrol Strains. Front. Microbiol. 2020, 11, 208.
- Wang, X.; Tian, X.; Li, W.; Yang, Y.; Zhang, S.; Wang, H.; Geng, W.; Zhai, J. An SNP-Based Diagnostic Method for Brucella S2 Vaccine Strain Infections. Front. Vet. Sci. 2025, 12, 1570220.
- Louws, F.; Rademaker, J.; De Bruijn, F. The Three Ds of PCR-Based Genomic Analysis of Phytobacteria: Diversity, Detection, and Disease Diagnosis. Annu. Rev. Phytopathol. 1999, 37, 81–125.
- Chambers, G.K.; MacAvoy, E.S. Microsatellites: Consensus and Controversy. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 2000, 126, 455–476.
- Ghezzi, H.; Fan, Y.M.; Ng, K.M.; Burckhardt, J.C.; Pepin, D.M.; Lin, X.; Ziels, R.M.; Tropini, C. PUPpy: A Primer Design Pipeline for Substrain-Level Microbial Detection and Absolute Quantification. mSphere 2024, 9, e0036024.
- Kodama, Y.; Shumway, M.; Leinonen, R.; on behalf of the International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: Explosive Growth of Sequencing Data. Nucleic Acids Res. 2012, 40, D54–D56.
- Teeling, H.; Meyerdierks, A.; Bauer, M.; Amann, R.; Glöckner, F.O. Application of Tetranucleotide Frequencies for the Assignment of Genomic Fragments. Environ. Microbiol. 2004, 6, 938–947.
- Konstantinidis, K.T.; Tiedje, J.M. Genomic Insights That Advance the Species Definition for Prokaryotes. Proc. Natl. Acad. Sci. 2005, 102, 2567–2572.
- Beran, P.; Stehlíková, D.; Cohen, S.P.; Čurn, V. KEC: Unique Sequence Search by K-Mer Exclusion. Bioinformatics 2021, 37, 3349–3350.
- Brunk, C.F.; Li, J.; Avaniss-Aghajani, E. Analysis of Specific Bacteria from Environmental Samples Using a Quantitative Polymerase Chain Reaction. Curr. Issues Mol. Biol. 2002, 4, 13–18.
- Boodman, C.; Gupta, N.; Cimen, C.; Van Griensven, J.; Cheng, M.P.; Yansouni, C.P.; Bottieau, E. Etiologies of Community-Acquired Febrile Illness Identified by TaqMan Array Card qPCR on Blood Samples: A Systematic Review and Meta-Analysis. J. Clin. Microbiol. 2026, 64, e00101-26.
- King, E.O.; Ward, M.K.; Raney, D.E. Two Simple Media for the Demonstration of Pyocyanin and Fluorescin. J. Lab. Clin. Med. 1954, 44, 301–307.
- Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T.L. BLAST+: Architecture and Applications. BMC Bioinform. 2009, 10, 421.
- Mistry, J.; Chuguransky, S.; Williams, L.; Qureshi, M.; Salazar, G.A.; Sonnhammer, E.L.L.; Tosatto, S.C.E.; Paladin, L.; Raj, S.; Richardson, L.J.; et al. Pfam: The Protein Families Database in 2021. Nucleic Acids Res. 2021, 49, D412–D419.
- Wishart, D.S.; Han, S.; Saha, S.; Oler, E.; Peters, H.; Grant, J.R.; Stothard, P.; Gautam, V. PHASTEST: Faster than PHASTER, Better than PHAST. Nucleic Acids Res. 2023, 51, W443–W450.
- Jiang, H.; Lei, R.; Ding, S.-W.; Zhu, S. Skewer: A Fast and Accurate Adapter Trimmer for next-Generation Sequencing Paired-End Reads. BMC Bioinform. 2014, 15, 182.
- Miftakhov, A.K.; Diabankana, R.G.C.; Frolov, M.; Yusupov, M.M.; Validov, S.Z.; Afordoanyi, D.M. Persistence as a Constituent of a Biocontrol Mechanism (Competition for Nutrients and Niches) in Pseudomonas Putida PCL1760. Microorganisms 2022, 11, 19.
- Vanni, C.; Schechter, M.S.; Acinas, S.G.; Barberán, A.; Buttigieg, P.L.; Casamayor, E.O.; Delmont, T.O.; Duarte, C.M.; Eren, A.M.; Finn, R.D.; et al. Unifying the Known and Unknown Microbial Coding Sequence Space. eLife 2022, 11, e67667.
- Newton, I.L.G.; Bordenstein, S.R. Correlations Between Bacterial Ecology and Mobile DNA. Curr. Microbiol. 2011, 62, 198–208.
- De Lazzari, E.; Grilli, J.; Maslov, S.; Cosentino Lagomarsino, M. Family-Specific Scaling Laws in Bacterial Genomes. Nucleic Acids Res. 2017, 45, 7615–7622.
- Zeng, Q.; Xie, J.; Li, Y.; Gao, T.; Xu, C.; Wang, Q. Comparative Genomic and Functional Analyses of Four Sequenced Bacillus Cereus Genomes Reveal Conservation of Genes Relevant to Plant-Growth-Promoting Traits. Sci. Rep. 2018, 8, 17009.
- Sharma, G.K.; Sharma, R.; Joshi, K.; Qureshi, S.; Mathur, S.; Sinha, S.; Chatterjee, S.; Nunia, V. Advancing Microbial Diagnostics: A Universal Phylogeny Guided Computational Algorithm to Find Unique Sequences for Precise Microorganism Detection. Brief. Bioinform. 2024, 25, bbae545.
- Haubold, B.; Klötzl, F.; Hellberg, L.; Thompson, D.; Cavalar, M. Fur: Find Unique Genomic Regions for Diagnostic PCR. Bioinformatics 2021, 37, 2081–2087.
- Tsifintaris, M.; Koutra, P.; Tsiartas, P.; Repanas, P.; Touliopoulos, S.; Nelios, G.; Anastasiadou, A.; Tamouridou, G.; Nikolaou, A.; Tsochantaridis, I. Erimin: A Pipeline to Identify Bacterial Strain Specific Primers. DNA 2026, 6, 11.
- Vernikos, G.; Medini, D.; Riley, D.R.; Tettelin, H. Ten Years of Pan-Genome Analyses. Curr. Opin. Microbiol. 2015, 23, 148–154.
- SantaLucia, J.; Hicks, D. The Thermodynamics of DNA Structural Motifs. Annu. Rev. Biophys. Biomol. Struct. 2004, 33, 415–440.
- Sakoparnig, T.; Field, C.; Van Nimwegen, E. Whole Genome Phylogenies Reflect the Distributions of Recombination Rates for Many Bacterial Species. eLife 2021, 10, e65366.
- Hershberg, R.; Petrov, D.A. Evidence That Mutation Is Universally Biased towards AT in Bacteria. PLoS Genet. 2010, 6, e1001115.
- Koonin, E.V.; Makarova, K.S.; Wolf, Y.I. Evolutionary Genomics of Defense Systems in Archaea and Bacteria. Annu. Rev. Microbiol. 2017, 71, 233–261.
- Kibby, E.M.; Whiteley, A.T. The Linguistics of Bacterial Conflict Systems Reveal Ancient Origins of Eukaryotic Innate Immunity. J. Bacteriol. 2020, 202, e00507-20.
- Boissinot, S. On the Base Composition of Transposable Elements. Int. J. Mol. Sci. 2022, 23, 4755.
- Plaire, D.; Puaud, S.; Marsolier-Kergoat, M.-C.; Elalouf, J.-M. Comparative Analysis of the Sensitivity of Metagenomic Sequencing and PCR to Detect a Biowarfare Simulant (Bacillus Atrophaeus) in Soil Samples. PLoS ONE 2017, 12, e0177112.
- Pecoraro, V.; Zerulla, K.; Lange, C.; Soppa, J. Quantification of Ploidy in Proteobacteria Revealed the Existence of Monoploid, (Mero-)Oligoploid and Polyploid Species. PLoS ONE 2011, 6, e16392.
- Böttinger, B.; Semmler, F.; Zerulla, K.; Ludt, K.; Soppa, J. Regulated Ploidy of Bacillus Subtilis and Three New Isolates of Bacillus and Paenibacillus. FEMS Microbiol. Lett. 2018, 365.



| Target (strain, SSL or gene) | Primer | Tm, °C | Sequence 3’-5’ | Reporter/Quencher |
|---|---|---|---|---|
| MGMM118 (SSL№10) | USEQ-118-F | 57 | GAGCGCTGTTTCTGTCGAGCG | – |
| | USEQ-118-R | 59 | GCCTGCCAACAAGACCGATAACAG | – |
| | USEQ-118-Z | 64 | ATGAGAGTCGAGATTGGTCCGGCACGCT | FAM/BHQ-1 |
| MGMM119 (SSL№16) | USEQ-119-F | 57 | GCGACTTGTTCCTAGTGTAATATATCAAC | – |
| | USEQ-119-R | 58 | CATGAAATATTCTCATTCTTTAATAGCCATCTC | – |
| | USEQ-119-Z | 63 | ACGAAGGCTGGGTGACAAATAATGAACACCCT | R6G/BHQ-1 |
| MGMM120 (SSL№18) | USEQ-120-F | 58 | TATAGCAAGTAAAACCTTAGATAGAACATCAACC | – |
| | USEQ-120-R | 58 | GGCTTTCACGAAAGATGCGTGAGC | – |
| | USEQ-120-Z | 63 | TAGAGGCATGTCGTCGGAATGGTTGGGAA | TAMRA/BHQ-2 |
| MGMM121 (SSL№25) | USEQ-121-F | 57 | ACACAGTAATCTGATAGATTGGATCTAGG | – |
| | USEQ-121-R | 58 | CATAACGAAGGGCTGGCCGAC | – |
| | USEQ-121-Z | 64 | TGGCAGTGTACAGGCGTCACAGGCATACAA | Cy5/BHQ-3 |
| rsfS gene [20] | rsfS-rt-F | 60 | GGCGAAGAACTGGTCGCAGTGAC | Cy5/BHQ-3 |
| | rsfS-rt-R | 60 | GGCAATGATCATGTAGTCGGTCAGGC | – |
| Universal primers (789–1053 16S region) [13] | U789-F | 45 | GATACCCSSGTAGTCC | – |
| | U1053-R | 45 | CTGACGRCRGCCATGC | – |
| | S. rhizophila group | B. halotolerans group | P. grimontii group | P. viciae group |
|---|---|---|---|---|
| Size (bp) | 4444949.8 | 4119309.2 | 6766996.2 | 6661807.2 |
| Contigs | 4.6 | 4 | 11.2 | 1 |
| GC [%] | 66.94 | 43.8 | 60.5 | 60.5 |
| tRNAs | 69 | 82.2 | 65 | 66.2 |
| rRNAs | 8.4 | 22.4 | 13.4 | 16 |
| CDSs | 4049 | 4056.2 | 6095 | 5746.8 |
| pseudogenes | 84.4 | 29.6 | 10.4 | 11.4 |
| hypotheticals | 351.6 | 134.4 | 156.4 | 167.2 |
| ANIb [%] | 86.49 | 98.9175 | 94.08 | 93.9275 |
| Aligned [%] | 72.9075 | 94.585 | 85.5625 | 86.205 |
| Number of SL | 111 | 96 | 105 | 104 |
| Number of preSSL | 20 | 1 | 12 | 11 |
| Number of SSL | 15 | 1 | 5 | 4 |
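The inverse relationship between ANIb and SSL count reported in the abstract can be checked against the group-level values in the table above. The following is a minimal sketch, not the authors' statistical procedure: it computes a Spearman rank correlation in plain Python and assumes there are no tied values, which holds for these four groups. With only four data points the coefficient is descriptive rather than a significance test.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Group order: S. rhizophila, B. halotolerans, P. grimontii, P. viciae.
anib_percent = [86.49, 98.9175, 94.08, 93.9275]
ssl_count = [15, 1, 5, 4]

rho = spearman_rho(anib_percent, ssl_count)
print(f"Spearman rho = {rho:.2f}")
```

The coefficient comes out at −0.8 rather than −1 because the two Pseudomonas groups have nearly identical ANIb values (94.08% and 93.93%) but SSL counts ordered the other way (5 and 4).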
| Functional Category | Gene product | Count | % of total | Ref % |
|---|---|---|---|---|
| DNA Replication, Repair and Modification | DNA adenine methylase (3); DNA polymerase III subunit delta'; EcoRII-like endonuclease; NUDIX hydrolase; DNA polymerase V subunit D | 7 | 9.0% | <10% [21] |
| Mobile Genetic Elements and Phage-Related | SLATT-5 domain-containing protein; HNH endonuclease-like; NHH-endonuclease-like; Phage-base-V domain-containing protein; ApeA-NTD1 domain-containing protein; RHS repeat-associated protein (2); colicin E3/pyocin S6 family cytotoxin; C4-antisense RNA (3); ogr/Delta-like zinc finger containing protein; Caspase family protein; HEPN domain-containing protein; NYN domain-containing protein | 15 | 19.2% | <5% [22] |
| Regulation and Signaling | Histidine kinase; DNA-binding response regulator; GGDEF domain-containing protein; Type IV pilus assembly protein PilZ; sensor domain-containing diguanylate cyclase; Competence protein J (ComJ); Bacterial nucleoid DNA-binding protein IHF-alpha; RelA-SpoT; SpoVT-AbrB domain-containing protein; Serine/threonine protein kinase; ArsR family transcriptional regulator | 11 | 14.1% | <10% [23] |
| Cellular Metabolism and Biosynthesis | bifunctional aconitate hydratase; Putative metalloprotease with PDZ domain; 4-hydroxy-3-methylbut-2-enyl diphosphate reductase; signal peptidase II; isoleucine--tRNA ligase; pyruvate dehydrogenase; GtrA domain-containing protein; HAD family phosphatase; HAD family hydrolase; Glycosyltransferase; Quinol monooxygenase YgiN; tRNA-Ser(gga); crotonase/enoyl-CoA hydratase family protein; rubredoxin RubB; Arsenical pump membrane protein; M24 family metallopeptidase; dTMP kinase; S49 family peptidase | 18 | 23.1% | >35% [24] |
| Unknown Function or Uncharacterized | hypothetical protein (11); Various DUF domain-containing protein (10); AAA ATPase-like protein (2); sce7726 family protein; beta family protein; P63C domain protein; BIG2 domain-containing protein | 27 | 34.6% | – |
| Strain | Strain GC [%] | Fragments GC [%] | 2kb-flanked SSL GC [%] |
|---|---|---|---|
| S. rhizophila MGMM118 | 67.1 | 46.4 | 56.7 |
| B. halotolerans MGMM119 | 43.9 | 36.2 | 33.3 |
| P. grimontii MGMM120 | 60.7 | 38 | 48.3 |
| P. viciae MGMM121 | 60.7 | 41.9 | 51.3 |
| Sample | Cq (SSL) | Cq (rsfS) | ΔCq = SSL – rsfS | Relative expression, % of rsfS |
|---|---|---|---|---|
| MGMM118 | 24.62 | 17.55 | 7.07 | 0.75 |
| MGMM119 | 32.99 | 25.18 | 7.81 | 0.45 |
| MGMM120 | 27.96 | 23.39 | 4.57 | 4.2 |
| MGMM121 | 23.7 | 25.11 | -1.41 | 266 |
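The relative expression column in the table above is consistent with the common 2^(−ΔCq) transformation expressed as a percentage of the reference gene. The sketch below reproduces that arithmetic under the assumption of 100% amplification efficiency for both assays. Whether the authors applied an efficiency correction is not stated here, and the function name is illustrative.

```python
def relative_expression_percent(cq_target: float, cq_reference: float) -> float:
    """Target expression as a percentage of the reference gene, assuming perfect doubling."""
    delta_cq = cq_target - cq_reference
    return 100.0 * 2.0 ** (-delta_cq)


# (Cq of SSL, Cq of rsfS) pairs taken from the table.
cq_pairs = {
    "MGMM118": (24.62, 17.55),
    "MGMM119": (32.99, 25.18),
    "MGMM120": (27.96, 23.39),
    "MGMM121": (23.70, 25.11),
}

relative = {
    strain: relative_expression_percent(ssl, rsfs)
    for strain, (ssl, rsfs) in cq_pairs.items()
}

for strain, value in relative.items():
    print(f"{strain}: {value:.2f}% of rsfS")
```

The computed values agree with the tabulated ones to within rounding of the Cq inputs. A negative ΔCq, as for MGMM121, means the SSL transcript was detected earlier than rsfS and therefore exceeds 100% of the reference.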
| Sample | Cq (mean ± SD) | DNA per reaction (pg, mean ± SD) | DNA copies per reaction (mean ± SD) | Calculated CFU/g |
|---|---|---|---|---|
| MGMM118 | 23.07 ± 0.12 | 31.19 ± 2.29 | 6780 ± 498 | 3.9 × 10⁶ |
| MGMM119 | 25.71 ± 0.17 | 1.44 ± 0.17 | 321 ± 38 | 2.5 × 10⁵ |
| MGMM120 | 26.86 ± 0.29 | 113.56 ± 22.23 | 16223 ± 3176 | 2.9 × 10⁶ |
| MGMM121 | 24.82 ± 0.09 | 78.30 ± 4.68 | 11029 ± 659 | 8.0 × 10⁶ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).