Submitted: 19 May 2025
Posted: 20 May 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Model Description
2.2. Data Sources and Preparation
2.3. Study Design
2.4. Ethical Considerations and Data Availability
2.5. Procedures
2.5.1. Element 1: Evaluation of the System’s Ability to Distinguish and Recall Trained from New Single Data Points
2.5.2. Element 2: Evaluation of the System’s Ability to Distinguish and Recall Trained from New Complex Datasets
2.5.3. Element 3: Evaluation of the Human-in-the-Loop Component and the Accuracy of Clinical Recommendations
- Were the microbes being treated as pathogens accurately identified?
- Does the antibiotic recommended in OneChoice have activity against the microbe that is presumed to be the pathogen?
- Was the recommended dose accurate?
- Was the recommended duration of treatment accurate?
- Was the preferred therapy the optimal therapy?
- Were there organisms that should have been addressed but were not?
2.6. Data Analysis
3. Results
3.1. Element 1: Evaluation of the System’s Ability to Distinguish and Recall Trained from New Single Data Points
3.2. Element 2: Evaluation of the System’s Ability to Distinguish and Recall Trained from New Complex Datasets
- Precision: TP / (TP + FP) = 519 / (519 + 0) = 1.0 (100%)
- Recall: TP / (TP + FN) = 519 / (519 + 0) = 1.0 (100%)
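These two results follow directly from the confusion counts; a minimal sketch in Python (counts taken from the bullets above, function names ours):

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): share of 'trained' calls that were correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Recall (sensitivity) = TP / (TP + FN): share of trained reports recovered."""
    return tp / (tp + fn)

# Element 2 counts: 519 true positives, no false positives or false negatives.
TP, FP, FN = 519, 0, 0
print(precision(TP, FP), recall(TP, FN))  # 1.0 1.0
```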
3.3. Element 3: Evaluation of the HITL Component and the Accuracy of Clinical Recommendations
4. Discussion
5. Conclusions
Ethical Considerations and Data Availability
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ML | Machine learning |
| CDSS | Clinical decision support systems |
| HITL | Human-in-the-loop |
| AI | Artificial intelligence |
| ASP | Antimicrobial stewardship program |
| AMR | Antimicrobial resistance |
References


| Panel | Name | Description |
|---|---|---|
| Panel 1 | Respiratory | Identifies pathogens responsible for respiratory illnesses from sputum or nasopharyngeal samples |
| Panel 2 | Blood | Identifies pathogens and antimicrobial resistance genes from blood samples |
| Panel 3 | Gastrointestinal (GI) | Identifies causes of GI infections from GI specimens |
| Panel 4 | Meningitis | Identifies pathogens that cause meningitis from cerebral spinal fluid (CSF) samples |
| Panel 5 | Pneumonia | Identifies pathogens responsible for pneumonia from sputum, nasopharyngeal samples, or other respiratory samples |
| Panel 6 | Joint | Identifies pathogens that may cause joint infections from synovial fluid samples |
N data points were collected from BioFire’s six panels; N unique data points remained after removing redundancies across the panels. Brackets were placed around data points to mark them as new to the system and to guard against overfitting.

- Training session 1: All unique data points were entered as a single set of data. System performance: the System identified all data as new.
- Training session 2: The data points were divided into their respective panels per the BioFire manufacturer (respiratory, blood, CNS, joint, etc.). System performance: the System identified all data as new within these panels.
- The data were then divided into randomized groups (K-folds).
- Training session 3: Fold 1 was trained and tested against the data in the remaining untrained folds. System performance: only the data from Fold 1 that had been trained was identified as trained; the other data remained untrained.
- Training sessions 4–7: Training was repeated for Folds 2, 3, 4, and 5 in separate instances. System performance: only data from a trained fold was identified as trained; the other data remained untrained.
- Training session 8: Random untrained data was placed within the five previously trained folds and tested. System performance: only the data trained in the previous sessions was identified as trained. The remaining data was then trained as well.
- Training session 9: All data was entered into the system collectively again, as one set. System performance: the System identified all data points as trained.
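The fold-by-fold procedure above can be sketched as follows. This is a hypothetical stand-in: `train`, `is_trained`, and the equal-size fold split are illustrative, not the System's actual implementation:

```python
import random

trained: set[str] = set()  # hypothetical stand-in for the System's memory

def train(points):
    trained.update(points)

def is_trained(point: str) -> bool:
    return point in trained

def k_folds(points, k=5, seed=0):
    """Randomly partition points into k folds (training sessions 3-7 above)."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    return [pts[i::k] for i in range(k)]

# 111 unique data points, matching the Element 1 totals.
points = [f"dp{i}" for i in range(111)]
folds = k_folds(points)

# Train one fold at a time; only its members should be recognized as trained.
for i, fold in enumerate(folds, start=1):
    train(fold)
    held_out = [p for f in folds[i:] for p in f]
    assert all(is_trained(p) for p in fold)
    assert not any(is_trained(p) for p in held_out)
```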
| Data Set | # Variables | Description of variables |
|---|---|---|
| K fold-1 | 21 | Staphylococcus aureus, Clostridium perfringens, Cryptosporidium, Varicella zoster virus (VZV), Cryptococcus (C. neoformans/C. gattii), Shigella/Enteroinvasive E. coli (EIEC), Neisseria gonorrhoeae, Vibrio (V. parahaemolyticus / V. vulnificus / V. cholerae), Human metapneumovirus, Klebsiella oxytoca, Enterococcus faecalis, Parainfluenza virus 1, Candida parapsilosis, Klebsiella aerogenes, Enterobacter cloacae complex, Haemophilus influenzae, Adenovirus F40/41, Coronavirus 229E, IMP, mcr-1. |
| K fold-2 | 19 | Influenza A virus A/H3, Clostridioides (Clostridium) difficile (toxin A/B), Streptococcus agalactiae, Adenovirus, Bordetella pertussis, Candida krusei, Herpes simplex virus 2 (HSV-2), Serratia marcescens, Cytomegalovirus (CMV), Parainfluenza virus 2, Moraxella catarrhalis, Staphylococcus lugdunensis, Human herpesvirus 6 (HHV-6), Bacteroides fragilis, Campylobacter (C. jejuni / C. coli / C. upsaliensis), Candida albicans, Enteroaggregative E. coli (EAEC), Coronavirus OC43, OXA-48-like |
| K fold-3 | 18 | Human rhinovirus/enterovirus, Vibrio cholerae, Mycoplasma pneumoniae, Influenza B virus, Legionella pneumophila, Chlamydia pneumoniae, Candida tropicalis, KPC, Plesiomonas shigelloides, Shiga-like toxin-producing E. coli (STEC) stx1/stx2, Enteropathogenic E. coli (EPEC), Cyclospora cayetanensis, Enterobacterales, Anaerococcus prevotii/vaginalis, Cutibacterium avidum/granulosum, Parainfluenza virus 3, VIM, NDM |
| K fold-4 | 12 | Streptococcus pyogenes, Enterococcus faecium, Influenza A virus A/H1, Rotavirus A, Staphylococcus epidermidis, Human parechovirus (HPeV), Klebsiella pneumoniae group, Neisseria meningitidis, Candida auris, Bordetella parapertussis, Peptostreptococcus anaerobius, Coronavirus NL63. |
| K fold-5 | 14 | Norovirus GI/GII, Candida glabrata, Escherichia coli, Peptoniphilus, Acinetobacter calcoaceticus-baumannii complex, Streptococcus spp., Pseudomonas aeruginosa, Escherichia coli K1, Herpes simplex virus 1 (HSV-1), E. coli O157, Parainfluenza virus 4, Streptococcus pneumoniae, Coronavirus HKU1, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). |
| Hold-out | 27 | Influenza A virus A/H1-2009, Influenza A virus, Proteus spp., Stenotrophomonas maltophilia, Respiratory syncytial virus, Listeria monocytogenes, Staphylococcus spp., Astrovirus, Sapovirus (I, II, IV, and V), Enterotoxigenic E. coli (ETEC) lt/st, Entamoeba histolytica, Giardia lamblia, Yersinia enterocolitica, Enterovirus (EV), Coronavirus, Citrobacter, Kingella kingae, Morganella morganii, Candida spp., Finegoldia magna, Parvimonas micra, CTX-M, mecA/C, vanA/B, ESBL, Klebsiella pneumoniae group, Salmonella. |
| Category | Description | Required training |
|---|---|---|
| Auto-approve | Fully trained data points and data sets | Completed full training of all data points and data sets (at least two data set training sessions) |
| Auto-match | Partially trained data sets with completely trained data points | One data set training session was completed; at least one more training session is required |
| High confidence | Untrained data sets with trained data points | ≥ 90 percent similarity to previously trained data sets, with data points trained completely |
| New | Untrained data sets and untrained data points | Data sets and points require full training |
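The four confidence categories above amount to a decision rule over the training state of a data set and its data points; a hypothetical sketch (logic inferred from the category descriptions, encoding ours):

```python
def classify(set_state: str, points_trained: bool) -> str:
    """Inferred triage rule; set_state is 'trained', 'partial', or 'untrained'."""
    if set_state == "trained" and points_trained:
        return "Auto-approve"     # fully trained data points and data sets
    if set_state == "partial" and points_trained:
        return "Auto-match"       # partially trained set, fully trained points
    if set_state == "untrained" and points_trained:
        return "High confidence"  # untrained set, trained points
    return "New"                  # untrained set and untrained points

print(classify("partial", True))  # Auto-match
```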
| Fold | True positives (trained data identified as trained) | True negatives (new data identified as new) | False positives (new data identified as trained) | False negatives (trained data identified as new) |
|---|---|---|---|---|
| Fold 1 | 21 | 21 | 0 | 0 |
| Fold 2 | 19 | 19 | 0 | 0 |
| Fold 3 | 18 | 18 | 0 | 0 |
| Fold 4 | 12 | 12 | 0 | 0 |
| Fold 5 | 14 | 14 | 0 | 0 |
| Hold-out | 27 | 27 | 0 | 0 |
| Total | 111 | 111 | 0 | 0 |
| Metric | Formula | Result |
|---|---|---|
| Precision | TP / (TP + FP) | 111 / (111 + 0) = 1.00 (100%) |
| Recall (Sensitivity) | TP / (TP + FN) | 111 / (111 + 0) = 1.00 (100%) |
| F1 score | 2 × Precision × Recall / (Precision + Recall) | 2 × 1 × 1 / (1 + 1) = 1.00 (100%) |
| Positive Predictive Value | TP / (TP + FP) | 111 / (111 + 0) = 1.00 (100%) |
| Negative Predictive Value | TN / (TN + FN) | 111 / (111 + 0) = 1.00 (100%) |
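All five metrics above reduce to ratios over the fold totals (TP = TN = 111, FP = FN = 0); a minimal sketch:

```python
def summary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard confusion-matrix summary metrics."""
    precision = tp / (tp + fp)   # also the positive predictive value
    recall = tp / (tp + fn)      # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    npv = tn / (tn + fn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "ppv": precision, "npv": npv}

# Totals from the fold table: TP = TN = 111, FP = FN = 0.
print(summary_metrics(111, 111, 0, 0))  # every metric evaluates to 1.0
```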
| Classification Outcome | Count | Description |
|---|---|---|
| True Positives (TP) | 519 | Fully trained reports correctly identified as trained |
| True Negatives (TN) | 238 | Negative reports correctly identified as negative |
| False Positives (FP) | 0 | No untrained reports were incorrectly identified as trained |
| False Negatives (FN) | 0 | No trained reports were misclassified as untrained |
| Untrained (correctly flagged) | 644 | Reports requiring training correctly identified as untrained |
| Total reports | 1401 | |
Columns 1–7 give the number of reports by the count of training sessions required.

| Variable | n | % | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|
| Total | 1401 | | | | | | | | |
| Negative | 238 | | | | | | | | |
| Positive | 1163 | | | | | | | | |
| Completely trained data | 519 | 44.63 | | | | | | | |
| Trained a single time but required additional training | 233 | 20.03 | 164 | 61 | 7 | 1 | | | |
| Partially untrained data | 267 | 22.96 | 186 | 41 | 19 | 5 | 5 | | |
| Completely untrained data | 97 | 8.34 | 97 | 63 | 15 | 12 | 4 | 2 | |
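The percentage column appears to be computed against the 1163 positive reports rather than the 1401 total; a quick arithmetic check under that assumption:

```python
positive = 1163  # positive reports; denominator inferred from the table
for n in (519, 233, 267, 97):
    print(f"{n}/{positive} = {100 * n / positive:.2f}%")
# 44.63%, 20.03%, 22.96%, 8.34% -- matching the % column
```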
| Discrepancy | Frequency | % |
|---|---|---|
| Major discrepancy | | |
| No discrepancy | 644 | 100 |
| A known pathogen was NOT addressed | 0 | 0 |
| The recommended antibiotic has NO activity against the microbe detected | 0 | 0 |
| Minor discrepancy | | |
| No discrepancy | 544 | 84.5 |
| An alternative to OneChoice could have been recommended | 11 | 1.7 |
| Among the alternative recommendations, another antibiotic or a combination of antibiotics could have been recommended | 35 | 5.4 |
| Dosing and length of therapy not consistent with FDA guidelines or other literature | 34 | 5.3 |
| Microbes that should have been targeted were NOT addressed | 20 | 3.1 |
| Total | 100 | 15.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).