Investigating the unfavorable factors that interfere with MALDI-TOF-based AI in predicting antibiotic resistance

Abstract: Combining Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) spectral data with artificial intelligence (AI) has been introduced for rapid prediction of antibiotic susceptibility test (AST) results for S. aureus. Based on the AI predictive probability, cases with probabilities between the low and high cut-offs are defined as the “grey zone”. We aimed to investigate the underlying reasons for unconfident (grey zone) or wrong AST predictions. A total of 479 S. aureus isolates were collected and analyzed by MALDI-TOF, and both the AST prediction and the standard AST were obtained in a tertiary medical center. The predictions were categorized into the correct prediction group, the wrong prediction group, and the grey zone group. We analyzed the association between the predictive results and the demographic data, spectral data, and strain types. For MRSA, a larger cefoxitin zone size was found in the wrong prediction group. Multi-locus sequence typing (MLST) of the MRSA isolates in the grey zone group revealed that uncommon strain types accounted for 80%. Among MSSA isolates in the grey zone group, the majority (60%) comprised more than 10 different strain types. In predicting AST with MALDI-TOF-based AI, uncommon strains and high diversity contribute to suboptimal predictive performance.


Introduction
Methicillin-resistant Staphylococcus aureus (MRSA) is a major public health problem because of its resistance to commonly used antibiotics, its varying epidemiology of infection, and the increased morbidity and mortality it causes [1][2][3]. Rapid and correct administration of antibiotics such as vancomycin, teicoplanin, or linezolid is the key to successful treatment [4]. Antibiotic susceptibility testing (AST) is the gold standard for guiding the administration of these anti-infective agents [1,5]. However, this culture-based method can considerably delay the prescription of effective antimicrobial treatment because susceptibility reporting takes an additional 3 to 4 days after specimen collection [1,6]. A rapid assessment of antibiotic resistance can optimize antimicrobial treatment, reduce unnecessary antibiotic use, and help avoid the development of antibiotic resistance. A novel method to accelerate antimicrobial susceptibility testing, which combines large-scale Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) mass spectral data with artificial intelligence, was developed and validated in our previous studies [7,8]. In those studies, Wang et al. collected around 5000 mass spectrometry (MS) spectra of unique S. aureus isolates and identified 200 peaks on the MS spectra that differed markedly between MRSA and methicillin-susceptible S. aureus (MSSA). These peaks served as the marker features for constructing the AST predictive model. Random forest was used as the machine learning classification algorithm because of its prediction performance in independent testing, with an area under the receiver operating characteristic curve (AUC) of 0.845. With the AI model and a mass spectrum, only a few minutes are needed to obtain a preliminary AST result for oxacillin. The studies demonstrated that combining AI with a large-scale dataset of clinical MS spectra could recognize antibiotic-resistant bacterial strains in a much shorter time and could lead to a more favorable clinical outcome.
A correct prediction leads to immediate and appropriate antimicrobial treatment, whereas an incorrect preliminary result may misguide and delay antibiotic administration. To guide clinical decisions accurately, the wrong prediction rate must be lowered. The preliminary AST is determined by the prediction probability calculated by the AI model, which ranges from 0 to 1. The cut-off was set at 0.48: an isolate with a probability lower than 0.48 was predicted as MSSA, and one with a probability higher than 0.48 was predicted as MRSA. In deployment of the AI model [9], we found that wrong predictions frequently occurred in the probability range from 0.40 to 0.48 (see supplement Table 1). Predictions with a probability in this range were therefore not released to clinical settings. This range was defined as the grey zone (see supplement Figure 1), with a probability of 0.40 as the low cut-off and 0.48 as the high cut-off.
The grey zone is a common technique for improving test accuracy in many clinical laboratory tests [10][11][12]. However, the grey zone reduces the benefit of the MALDI-TOF-based AI model because no predictive AST is provided for cases in the grey zone group. Thus, in this study we aimed to investigate the possible factors associated with grey zone predictions and wrong predictions. The investigation would help us find the blind spots of the MALDI-TOF-based AI model.

Scheme of the study
The objective of the study was to identify and further analyze S. aureus isolates in the grey zone group and the wrong prediction group. The scheme of the study is illustrated in Figure 1 and comprises three steps: (1) sample collection and MALDI-TOF spectra measurement, (2) AST prediction with the AI model and the final AST report, and (3) analysis of grey zone and wrong prediction samples. First, samples were collected in the clinical microbiology laboratory of Linkou Chang Gung Memorial Hospital. The cultured bacterial samples were analyzed by MALDI-TOF MS for identification of bacterial species. The preliminary AST was predicted by the AI model from the preprocessed MS spectra. A predictive probability between 0.40 and 0.48 was defined as the "grey zone" [10][11][12] and was not applied to clinical use. Specimen types, MALDI-TOF MS spectra, and phenotypic susceptibility test reports were collected. MLST was also performed for further investigation of the unconfident AST predictions (i.e. grey zone). The cases with wrong AST predictions were analyzed with the same methods as the grey zone group.
Figure 1. The antibiotic susceptibility test (AST) result is predicted by analyzing MALDI-TOF spectra with an artificial intelligence (AI) model. Based on the predictive probability generated by the AI model, cases with a probability less than 0.4 are predicted as susceptible, whereas cases with a probability larger than 0.48 are predicted as resistant. Cases whose probabilities lie between 0.4 and 0.48 are defined as the "grey zone". In addition, cases whose predictive AST differs from the final AST are categorized as "wrong prediction", whereas cases whose prediction matches the final AST are categorized as "correct prediction". We collected "grey zone" and "wrong prediction" cases for further analysis.
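The grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the deployed model: the function names and the example cases are hypothetical, and treating both cut-off values (0.40 and 0.48) as part of the grey zone is an assumption, since the text does not say how exact boundary values are handled.

```python
LOW_CUTOFF = 0.40   # low cut-off stated in the text
HIGH_CUTOFF = 0.48  # high cut-off stated in the text


def preliminary_ast(probability: float) -> str:
    """Map a predictive probability to MSSA, MRSA, or grey zone."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_CUTOFF:
        return "MSSA"
    if probability > HIGH_CUTOFF:
        return "MRSA"
    # Assumption: the two cut-off values themselves fall in the grey zone.
    return "grey zone"


def prediction_group(probability: float, final_ast: str) -> str:
    """Assign a case to the correct, wrong, or grey zone group."""
    predicted = preliminary_ast(probability)
    if predicted == "grey zone":
        return "grey zone"
    return "correct prediction" if predicted == final_ast else "wrong prediction"


# Hypothetical cases: (predictive probability, final AST from the standard method).
cases = [(0.12, "MSSA"), (0.45, "MSSA"), (0.30, "MRSA"), (0.91, "MRSA")]
groups = [prediction_group(p, ast) for p, ast in cases]
print(groups)
```

In this sketch the third case (probability 0.30, final AST of MRSA) corresponds to the situation examined in the study: an MRSA isolate predicted as MSSA.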

Samples and mass spectra collection
The study was approved by the Institutional Review Board of the Chang Gung Medical Foundation (No. 202000694B1). Clinical specimens were collected at the Linkou branch of Chang Gung Memorial Hospital (CGMH) from August to October 2020 and were sent to the CGMH clinical microbiology laboratory. The specimen types included wound, respiratory tract (i.e. sputum, nasopharyngeal swab, bronchoalveolar lavage), blood, tissue, urinary tract, sterile body fluid (i.e. ascites, pleural effusion, synovial fluid, dialysate, cerebrospinal fluid), and others. Cultures were obtained by the routine method of the CGMH clinical microbiology laboratory [7,8]. Single colonies on agar plates were chosen for bacterial species identification. S. aureus was identified according to colony morphology, the coagulase test, and MALDI-TOF MS (Bruker Daltonics GmbH, Bremen, Germany) [13]. Once the MS spectra had been generated and identified as Staphylococcus aureus, they underwent preprocessing and feature extraction [8] in preparation for input to the AST prediction model.

Preliminary AST with AI model and traditional AST
We applied the AST predictive models that we developed and validated in previous studies [7,8]. After the preprocessed S. aureus MS spectra were input to the AI model, the preliminary AST was predicted within one minute. The prediction results were presented with the peak number, the probability, and the preliminary AST. The peaks in mass spectra can represent ribosomal proteins that are specific to species and can serve as biomarkers for species identification [14,15]. The peak number represents the quality of the input MS spectrum. The predictive probability served as the basis of classification, as previously mentioned. Samples with a probability from 0.40 to 0.48 were assigned to the grey zone and were not reported for clinical use. If the probability was > 0.48, the sample was classified as MRSA; if it was < 0.40, the sample was predicted as MSSA. Traditional ASTs, namely the cefoxitin paper disc method and the broth microdilution method, were performed to determine the susceptibility of S. aureus to oxacillin. The broth microdilution method was performed on isolates from blood specimens, while the cefoxitin paper disc method was used for other types of specimens. The interpretation of AST was based on the Clinical & Laboratory Standards Institute (CLSI) guidelines. Both methods are standard CLSI-endorsed methods for determining the susceptibility of S. aureus to oxacillin.

Grey zone and wrong prediction cases analysis
To further understand the cases in the grey zone group and the wrong prediction group, the demographic information, MALDI-TOF MS spectra, predictive results, and traditional AST reports were reviewed. For traditional ASTs, we recorded the minimal inhibitory concentration (MIC) of oxacillin or the inhibition zone diameter of the cefoxitin paper disc. For molecular characterization, we used multi-locus sequence typing (MLST) for strain typing of the S. aureus isolates [16]. The sequence type was assigned based on the allelic profile at the seven loci, via the MLST database [17].
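Conceptually, assigning a sequence type is a lookup from a seven-number allelic profile to a label. The sketch below illustrates only that idea. The locus names are those of the commonly used S. aureus MLST scheme, while the lookup table, the allele numbers, and the labels "ST-A" and "ST-B" are hypothetical placeholders rather than real database entries; an actual assignment queries the MLST database [17].

```python
# Locus order of the S. aureus MLST scheme (seven housekeeping genes).
LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")

# Hypothetical stand-in for the MLST database: allelic profile -> sequence type.
# The allele numbers and labels are placeholders, not real database content.
PROFILE_TO_ST = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 1, 1, 1, 1, 2): "ST-B",
}


def assign_sequence_type(alleles: dict) -> str:
    """Return the sequence type for an isolate's allele numbers.

    `alleles` maps each locus name to its allele number. Profiles that
    are absent from the lookup table are reported as "unassigned".
    """
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise KeyError(f"missing loci: {missing}")
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TO_ST.get(profile, "unassigned")


isolate = {"arcC": 1, "aroE": 1, "glpF": 1, "gmk": 1, "pta": 1, "tpi": 1, "yqiL": 2}
print(assign_sequence_type(isolate))
```

Keying the table on an ordered tuple makes the match exact: a change at any one of the seven loci yields a different profile and therefore a different (or unassigned) sequence type.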

Statistical analysis
Continuous variables were expressed as means and standard deviations, categorical variables as numbers and percentages, and nonparametric variables as medians and interquartile ranges. Student's t test was used for continuous variables, and the chi-square test was used for categorical variables. Analysis of variance (ANOVA) was used to compare means among more than two groups, and a post hoc test (Scheffe) was performed if the result was statistically significant. The Kruskal-Wallis test was used for nonparametric variables with more than two groups, and Dunn's multiple comparison test was performed after a significant Kruskal-Wallis test. The p values were two-sided, and the null hypothesis was rejected if the p value was smaller than or equal to 0.05. All analyses were performed with SPSS version 28 (Statistical Product and Service Solutions).
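For readers unfamiliar with the comparison of means across three groups, the following sketch shows how a one-way ANOVA F statistic is formed from between-group and within-group sums of squares. It is an illustration only: the analyses in the study were run in SPSS, the function name is hypothetical, and the toy numbers are not study data. Converting F to a p value requires the F distribution, which is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (not from the study): three groups of three measurements each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the within-group scatter would explain, which is the condition under which a post hoc test such as Scheffe's is then used to find which pairs of groups differ.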

Results
In the study, we aimed to investigate the factors associated with unconfident prediction (i.e. grey zone) or wrong prediction. First, we tested the association between the demographic information and the predictive results. Table 1 shows the age of the infected patients and the specimen types of all 479 cases. Among the 479 collected samples, 401 cases were in the correct prediction group, 56 cases were in the grey zone group, and 22 cases were in the wrong prediction group. There was no significant difference in age between the groups (p = 0.340). In each group, adults were the majority (58.4% vs. 60.7% vs. 59.1%). Pus was the major specimen type in all three groups (46.4-68.1%). The correct prediction group had significantly more respiratory tract specimens, while the grey zone group contained significantly more sterile body fluid specimens than the other groups (p = 0.006). For MSSA isolates, the correct prediction group had significantly more respiratory tract specimens, and the grey zone group had significantly more blood and sterile body fluid samples (p = 0.008). On the other hand, for MRSA, there was no difference in age or specimen type between the groups.
Second, the initial quality of the mass spectra in the different prediction groups was examined. Figure 2 shows the peak numbers of the MALDI-TOF mass spectra in the different groups. The peak numbers of the spectra were consistently around 120. No difference in peak numbers was detected between the correct prediction group, the grey zone group, and the wrong prediction group (see supplement Table 2). The results indicated that the quality of the MALDI-TOF mass spectra was comparable between the groups.
Figure 2. Peak numbers of the MALDI-TOF mass spectra in the different groups (MSSA with correct prediction, the grey zone, MRSA with correct prediction, MSSA with wrong prediction, and MRSA with wrong prediction) show no significant difference, indicating comparable quality of spectra between groups.
Third, we examined the AST results for oxacillin in the different groups.
Table 2 presents the mean cefoxitin disc diffusion zone size for all the S. aureus isolates. The mean zone size for MRSA was 10.78±3.49 mm in the correct prediction group, 11.4±3.95 mm in the grey zone group, and 14.33±3.01 mm in the wrong prediction group. ANOVA showed a significant difference in zone size (p = 0.004). The post hoc test (Scheffe) showed that the MRSA isolates in the wrong prediction group had a significantly larger zone diameter than the MRSA isolates in the correct prediction group. For MSSA, no significant difference was noted between the three groups. Table 3 shows the MIC of oxacillin determined by the broth microdilution method. There was no significant difference in MIC between the groups.

Table 2. Zone size of cefoxitin for MRSA and MSSA. MRSA with wrong AST prediction (predicted as MSSA) had a significantly larger zone diameter than the correct prediction and grey zone groups, indicating that the phenotype of MRSA with wrong AST prediction is more MSSA-like.
MRSA | Correct prediction (n=190) | Grey zone (n=10) | Wrong prediction (n=6) | p-value
Zone size (mm) | 10.78±3.49 | 11.4±3.95 | 14.33±3.01 | 0.004

Fourth, we analyzed the strain types (MLST) of the S. aureus isolates in the grey zone group (Table 4). The number of MRSA isolates (n=10) was much lower than that of MSSA isolates (n=46). For the MSSA isolates in the grey zone group (n=46), many more strain types were identified, 15 different types in total. MSSA ST15 accounted for the highest percentage (41%) in the group. The remaining 59% of MSSA in the grey zone group was composed of 14 different types. Within the group, 2 MSSA isolates could not be typed. For the MRSA isolates in the grey zone group (n=10), six types were identified. MRSA ST1232 accounted for the highest percentage (40%) in the group, followed by ST59 (20%). One isolate each was identified for ST6954, ST239, ST30, and ST1. The compositions of MRSA and MSSA in the grey zone group were also compared with the molecular epidemiology published in previous studies [18][19][20][21][22][23]. The top 5 types of MSSA in the grey zone group except ST1 (i.e. ST15, ST188, ST7, ST97) were also reported in the previous studies (Figure 3) [18,19]. By contrast, for the MRSA isolates in the grey zone group, ST1232 and ST6954 were not reported as major circulating strain types in the previous studies (Figure 4) [21][22][23].
Figure 3. Sequence types of MSSA in previous studies [18,19] and in the grey zone group in the study. The previous studies [18,19] showed that the common sequence types of MSSA infection in Taiwan were ST188, ST15, ST7, and ST97, but around 30%-66% of MSSA isolates were still of other sequence types. Similarly, in the grey zone group of this study, MSSA strain types also showed high diversity: 31% of MSSA isolates did not belong to the top 5 common types (ST15, ST188, ST1, ST7, and ST97). A literature review of S. aureus epidemiology investigations in Taiwan is shown in supplement Table 3.
Figure 4. Sequence types of MRSA in previous studies ([21], Peng et al [22], Wang et al [23]) and in the grey zone group in the study. The previous studies [21][22][23] showed that the common sequence types of MRSA infection in Taiwan were ST59 and ST239, followed by ST45 and ST5. In the study, the MRSA isolates in the grey zone group were composed of uncommon sequence types such as ST1232, ST6954, and other types (accounting for 80%). A literature review of S. aureus epidemiology investigations in Taiwan is shown in supplement Table 3.
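The composition figures reported for the grey zone group can be reproduced with a simple tally. In the sketch below, the MRSA counts are derived from the reported percentages (40% and 20% of 10 isolates, plus one isolate each for the remaining four types), and the helper name `shares` is hypothetical.

```python
from collections import Counter

# MRSA isolates in the grey zone group (n=10), counts derived from the text.
mrsa_grey_zone = Counter(
    {"ST1232": 4, "ST59": 2, "ST6954": 1, "ST239": 1, "ST30": 1, "ST1": 1}
)

# MSSA and MRSA isolates in the grey zone group (n=56).
grey_zone_by_phenotype = Counter({"MSSA": 46, "MRSA": 10})


def shares(counts: Counter) -> dict:
    """Return each category's percentage of the total, most common first."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.most_common()}


mrsa_shares = shares(mrsa_grey_zone)
phenotype_shares = shares(grey_zone_by_phenotype)

for name, pct in mrsa_shares.items():
    print(f"{name}: {pct:.1f}%")
for name, pct in phenotype_shares.items():
    print(f"{name}: {pct:.1f}%")
```

The same tally applied to a full list of sequence types per isolate would give the per-type breakdown shown in Table 4.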

Discussion
Early detection of MRSA by combining AI and clinical MALDI-TOF mass spectra can contribute to immediate and appropriate antimicrobial treatment. Although good predictive performance has been validated, wrong predictions still occur. Moreover, the grey zone method, in which unconfident predictions are withheld, is used to improve the predictive performance of the MALDI-TOF-based AI model. The predictive performance can be improved significantly, but the clinical benefit of rapid prediction by AI is still limited for cases in the grey zone. We investigated the possible factors that contribute to uncertain predictions (i.e. grey zone) and wrong predictions. The results showed that the high diversity of MSSA types is a likely reason for uncertain predictions. By contrast, the MRSA isolates in the grey zone group were identified as uncommon strain types. In brief, we found that uncommon strain types and high diversity contribute to the suboptimal predictive performance of MALDI-TOF-based AI.
Specimen type may have an impact on the predictive performance of AI models. Certain types of specimens tend to be infected by specific strain types of microorganisms. Taking vancomycin-resistant Enterococcus faecium as an example, only a few strain types cause invasive infections in blood or sterile body fluids [24]. Consequently, a concentration of specific specimen types would be associated with only a few specific strain types. A low number of classes simplifies the classification problem for the AI model and should theoretically improve the predictive performance. We examined the specimen types for the three groups (i.e. correct prediction group, grey zone group, and wrong prediction group). The results showed that MSSA isolated from respiratory tract specimens tended to have a correct AST prediction for oxacillin (Table 1). By contrast, MSSA isolated from blood or sterile body fluid was associated with a higher chance of a grey zone prediction (Table 1). MSSA isolates in the grey zone group had more diverse sequence types than MRSA isolates (Table 4). Other studies also revealed more heterogeneous MSSA lineages and wide genotypic diversity [18,19]. If the diversity of MSSA lineages is the reason for the uncertainty in AST prediction, we hypothesize that MSSA from the respiratory tract is more homogeneous than MSSA from other specimen types. In the study, we only investigated the factors associated with grey zone prediction and wrong prediction. The heterogeneity of S. aureus in different specimen types has not been well established. It would be of interest to investigate the underlying reasons for the correct predictions from respiratory tract specimens.
We also examined the association between the quality of the mass spectra and the predictive results. The peaks in mass spectra can represent ribosomal proteins that are specific to species and can serve as biomarkers for species identification [14,15]. Rich peak content provides adequate information for highly efficient species identification [25]. Peak numbers were analyzed to evaluate whether the AST prediction is affected by the peak content of the MS spectra. Peak detection can be affected by many steps during MS spectra preparation, including sample collection, cultivation, subculture, incubation, colony selection, plate smearing, and even the condition of the Microflex LT mass spectrometer [8]. There was no significant difference in peak numbers between the grey zone group, the wrong prediction group, and the correct prediction group (Figure 2). The peak content of the mass spectra was comparable between groups. Thus, the quality of the mass spectrum does not appear to be a factor associated with uncertain or wrong AST prediction.
The level of drug resistance may be another factor that leads to uncertain or wrong AST prediction. For isolates with a very high or very low antibiotic MIC, the AI model would be expected to perform well because these are clear-cut cases. By contrast, when the antibiotic MIC is close to the cut-off that discriminates resistant from susceptible results, it is difficult for the AI model to predict well. The MRSA isolates in the wrong prediction group had a significantly larger zone diameter than the MRSA isolates in the correct prediction group (Table 2). Despite being classified as MRSA, the MRSA isolates in the wrong prediction group showed a more MSSA-like phenotype in terms of zone size in the cefoxitin disc test.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 January 2022
Resistance to a specific antimicrobial may require a complex mechanism rather than depend on the expression of a single gene. Different mechanisms may contribute to diverse phenotypes. A previous study found inconsistencies between antibiotic resistance phenotypes and genotypes [26]. Some strains carrying a drug resistance gene were susceptible to the corresponding antibiotic, and some had drug resistance genes that were not expressed. This phenomenon may prevent our predictive model from providing a correct preliminary AST. In the study, the MRSA isolates in the wrong prediction group with a larger zone diameter may carry drug resistance genes that are sub-optimally expressed. Proteins or peptides with low expression may be undetectable with the standard MALDI-TOF MS protocol. Thus, the mass spectra of these MRSA cases may mislead the machine learning (ML) model into giving the wrong preliminary AST.
The composition of strain types could be the predominant factor for the grey zone predictions based on our results. In the grey zone, the majority of isolates were MSSA (82.1%) (Table 4), indicating that a confident prediction for MSSA is more difficult. As previously mentioned, studies have shown that MSSA has more complex genotypic diversity [18][19][20]. According to epidemiological investigations in Taiwan [18,19], shown in Figure 3, the predominant clones of MSSA infection were ST188, ST15, ST7, and ST97, but about one-third of MSSA samples were composed of various other types. The molecular characteristics of MSSA in the grey zone were comparable: ST15 and ST188 were the major strain types, and the remaining one-third was composed of many other types (Figure 3). This heterogeneity of MSSA lineages would also exist in the training dataset for the machine learning model. The AI model would learn the diverse strain types suboptimally because each minor strain type is represented by only a small number of isolates. Consequently, the unconfident predictions (i.e. grey zone group) could result from this suboptimal learning. By contrast, there were only 10 MRSA isolates (17.9%) (Table 4) in the grey zone group. According to epidemiological investigations in Taiwan, the predominant clones of MRSA infection were ST59 and ST239, followed by ST45 and ST5 [20][21][22]. The emergence of MRSA ST8 (USA300) has also gained much attention. With increasing prevalence since 2010, ST8 has become one of the major clones of MRSA infection in Taiwan [27]. The composition of predominant clones for MRSA is much simpler than that of MSSA. In the study, however, the MRSA isolates in the grey zone group showed a different combination of sequence types. Uncommon types such as ST1232, ST6954, and ST1 accounted for 80% of MRSA in the grey zone group (Figure 4). ST1232 is a single-locus variant of ST398 [28]. Both ST1232 and ST398 belong to CC398 MRSA.
The ST1232 MRSA strain was related to travel in South-East Asia, and ST398 was similar to European livestock-associated MRSA (LA-MRSA) [29]. Local transmission of CC398 MRSA strains was still rare in Taiwan, and most of the cases were possibly livestock-related [30]. According to a previous study, the predominant LA-MRSA clone in Taiwan is ST9 [31], which also indicates the paucity of CC398 strains in Taiwan. These uncommon sequence types of MRSA were less well learned during model training, which made it more difficult and uncertain for the machine learning model to assign a preliminary AST.

Conclusion
Molecular characteristics are the key contributing factor in unfavorable AST prediction by the MALDI-TOF-based AI model. Uncommon sequence types of MRSA have a higher chance of receiving a wrong preliminary AST. Genotypic diversity of MSSA is the main cause of inferior prediction performance in the grey zone.
Author Contributions: Dr. Wang had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: Wang. Acquisition, analysis, or interpretation of data: Wang, Liu, Tseng, Chung. Drafting of the manuscript: Wang and Liu. Critical revision of the manuscript for important intellectual content: Wang, Tseng, Lin, Yu. Statistical analysis: Wang and Liu. Obtained funding: Wang and Lu. Administrative, technical, or material support: Wang, Liu, Tseng, Chung. Supervision: YC Huang and JJ Lu contributed equally to this work.