Implementation of MALDI-TOF Mass Spectrometry and Peak Analysis: application to the discrimination of Cryptococcus species and their interspecies hybrids

MALDI-TOF (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight) is a type of mass spectrometry (MS) that has been widely implemented for the rapid identification of microorganisms over the last decade. The accuracy and flexibility of this method have encouraged researchers to apply the analysis of protein spectra obtained by MALDI-TOF to the discrimination of closely related species and to bacterial typing. In this study, a standardized methodology based on the detection of species-specific protein peaks in the spectra obtained with MALDI-TOF is described. The methodology was applied to a collection of Cryptococcus spp. (n=70) previously characterized by Amplified Fragment Length Polymorphism (AFLP) and sequencing of the ITS1-5.8S-ITS2 region. An expanded ad-hoc database was also built for their discrimination with MALDI-TOF. This approach did not allow the discrimination of the interspecies hybrids. However, peak analysis with the PLS-DA and SVM algorithms in a two-step analysis allowed 96.95% and 96.55% correct discrimination of C. neoformans from the interspecies hybrids, respectively. In addition, PCA prior to SVM provided 98.45% correct discrimination of the 3 analyzed species in a one-step analysis. The method is cost-efficient, rapid and user-friendly. The procedure can also be automated for an optimized implementation in the laboratory routine.


Introduction
The genus Cryptococcus has classically comprised two sibling species of great clinical importance: Cryptococcus neoformans and C. gattii, the causative agents of cryptococcosis. Whilst the C. neoformans complex has been associated with meningitis in immunosuppressed patients, C. gattii has been shown to cause disease in both immunocompetent and immunocompromised populations [1,2]. Species differentiation is important in order to establish the epidemiology, virulence and susceptibility pattern to the commonly used antifungal drugs [3][4][5][6]. Traditionally, species assignment has been achieved by morphological analysis of colonies grown on specific culture media and by serological tests [7]. The availability of DNA-based methodologies such as restriction fragment length polymorphism (RFLP) analysis [8], amplified fragment length polymorphism (AFLP) analysis [9], multilocus microsatellite typing (MLMT) [10], and multilocus sequence typing (MLST) [11] has allowed the identification of Cryptococcus species and molecular types in recent years [12,13]. Genotyping methods have identified the following major molecular types: AFLP1/VNI, AFLP1A, AFLP1B/VNII for C. neoformans; AFLP2/VNIV for C. deneoformans; AFLP3/VNIII for the interspecies hybrid C. neoformans x C. deneoformans; and AFLP4/VGI, AFLP5/VGIII, AFLP6/VGII, AFLP7/VGIV and AFLP10/VGIV, VGII for the C. gattii complex [14,15]. Molecular techniques have been shown to be accurate and robust, although the whole procedure is cumbersome and time-consuming, and delays the final identification. Although genomic analysis is currently the gold standard for Cryptococcus spp. identification, its high requirements in hands-on time and expertise have led to the evaluation of alternative tools.
Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has emerged as a promising technology for the rapid and reliable identification of yeasts [16][17][18]. Isolates belonging to the Candida genus have been shown to be easily identified at the species level either from single colonies or directly from clinical samples using MALDI-TOF MS [19]. However, non-Candida yeasts still represent a challenge for this technology, especially when trying to identify genera poorly represented or even lacking in the commercial databases [20]. In this case, expanded in-house databases containing protein spectra from the underrepresented species and genera have been shown to overcome this drawback [16]. Although this approach has worked before for the discrimination between the C. neoformans and C. gattii complexes [21,22], the available information about MALDI-TOF discrimination within the C. neoformans complex is still limited [23].
In this study, MALDI-TOF has been applied for the discrimination between C. deneoformans, C. neoformans and the interspecies hybrids. For this purpose, a "classical" approach was applied: a database was built using well-characterized isolates for the identification of the Cryptococcus spp. isolates using the Biotyper system developed by Bruker Daltonics (Bremen, Germany). In addition, the protein spectra from these isolates were processed and classified using different algorithms in order to find species-specific peaks that allowed their differentiation.

Molecular identification
To ensure purity, the isolates were grown on Columbia agar plates supplemented with 5% sheep blood and incubated at 35 °C for 24 h. All isolates were previously identified by DNA sequencing analysis of the ITS1-5.8S-ITS2 region [24] and AFLP analysis [25]. Molecular identifications were considered as the reference in our study.

Database construction
Twenty-six Cryptococcus isolates (C. neoformans, n=12; interspecies hybrids, n=10; C. deneoformans, n=4) were processed according to the manufacturer's instructions and added to the in-house database (HGM library) as individual Main Spectra (MSPs).
The procedure for adding new entries to an in-house library has already been described [26]. Briefly, the instrument was calibrated before spectra acquisition using freshly prepared Bacterial Test Standard (BTS). Cryptococcus isolates were processed as explained below and then spotted onto eight positions on the MALDI target plate, and each position was read three times. Twenty-four protein spectra were thus obtained, 20 of which had to be identical in order to be accepted by the software (Biotyper, Bruker Daltonics, Bremen, Germany) as an MSP and added to the extended library.

MALDI-TOF identification
Forty-four Cryptococcus spp. isolates were blindly analysed using a Microflex LT benchtop MALDI-TOF mass spectrometer (Bruker Daltonics) for spectra acquisition, using default settings. For the identification of the protein spectra, the updated BDAL database containing 8223 MSPs (Bruker Daltonics) was applied. This database contains 12 reference MSPs from C. neoformans and 7 from C. deneoformans. In addition, the expanded in-house HGM library developed in this study was used in combination with the commercial database.
The sample processing method consisted of a mechanical disruption step followed by a standard protein extraction. Briefly, a few colonies were picked, re-suspended in 300 µl of HPLC-grade (High-Pressure Liquid Chromatography) water and 900 µl of ethanol, and vortexed for 5 min. After a brief spin, the supernatant was discarded and the pellet was allowed to dry completely at room temperature. Standard protein extraction with formic acid and acetonitrile was performed and 1 µl of the supernatant was spotted onto the MALDI target plate in duplicate. Once the spots were dry, they were covered with 1 µl of HCCA matrix (Bruker Daltonics), prepared following the manufacturer's instructions (Figure 1). The identifications provided by MALDI-TOF MS were compared at the species level with those provided by AFLP analysis regardless of their score value (Table 1). In addition, score values ≥2.0 were considered as "high-confidence" scores and those ≥1.7 and below 2.0 as "low-confidence" ones. Score values below 1.6 were only considered when consistent over the four top identifications; otherwise they were considered as "not reliable".
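The score interpretation rules above can be sketched as a small function. This is an illustrative sketch only: the function name, the label strings and the input format are hypothetical and not part of the Biotyper software, and the handling of the 1.6 to 1.7 band (not defined separately in the text) is an assumption stated in the code.

```python
def classify_score(score, top_hits):
    """Label one MALDI-TOF identification by its score value.

    score    -- score value of the best database match
    top_hits -- species names of the best matches, best match first
    """
    if score >= 2.0:
        return "high-confidence"
    if score >= 1.7:
        return "low-confidence"
    # Assumption: the consistency rule stated for scores below 1.6 is applied
    # to every score below 1.7, because the 1.6-1.7 band is not defined
    # separately in the text.
    top_four = top_hits[:4]
    if len(top_four) == 4 and len(set(top_four)) == 1:
        return "consistent low score"
    return "not reliable"


print(classify_score(2.13, ["C. neoformans"] * 4))
print(classify_score(1.55, ["C. neoformans", "C. deneoformans",
                            "C. neoformans", "C. neoformans"]))
```

Keeping the rule in one function makes it easy to change the thresholds if a laboratory validates different cut-offs.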

Peak Analysis
For the classification of the three Cryptococcus species, their protein spectra were processed using Clover MS Data Analysis software (Clover Biosoft, Granada, Spain) with the parameters shown in Table S1 in order to obtain a peak matrix with a representative mass list in the range 2400 m/z to 12000 m/z. Spectra alignment was then performed in two stages. First, the replicates from the same isolate were aligned in order to get an average spectrum. Then, all average spectra were aligned together.
The rate of presence for the biomarker peaks was calculated for each species and then compared among species. Receiver Operating Characteristic (ROC) curves with an Area Under the Curve (AUC) higher than 0.99 were used as quality indicators to measure the sensitivity and specificity of a selected biomarker.
Once the putative biomarkers were selected and analyzed, a peak matrix was built containing all the aligned spectra from all Cryptococcus isolates, processed as described in Table S2. This peak matrix was constructed with ten species-specific biomarkers and was used as input for a dendrogram obtained by measuring the Euclidean distance between Principal Component Analysis (PCA) scores.
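As a rough illustration of the two quantities described above, the sketch below computes the rate of presence of one peak and its ROC AUC using the rank-based (Mann-Whitney) formulation. The intensity values are invented for the example, the function names are hypothetical, and the Clover software may compute these quantities differently.

```python
def presence_rate(intensities, threshold=0.0):
    """Fraction of spectra in which the peak exceeds the threshold."""
    return sum(1 for value in intensities if value > threshold) / len(intensities)


def roc_auc(target, others):
    """AUC as the probability that a target spectrum shows a higher peak
    intensity than a non-target spectrum (ties count as one half)."""
    wins = 0.0
    for t in target:
        for o in others:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target) * len(others))


# Invented intensities of one candidate peak (arbitrary units).
target_species = [5.1, 4.8, 6.0, 5.5]
other_species = [0.0, 0.2, 0.0, 0.1, 0.0]

auc = roc_auc(target_species, other_species)
print(presence_rate(target_species), presence_rate(other_species), auc)
print("candidate biomarker" if auc > 0.99 else "rejected")
```

An AUC of 0.5 corresponds to a peak with no discriminative value, which is why a cut-off close to 1 selects only peaks that separate the species almost perfectly.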
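The dendrogram construction described above can be approximated as follows: PCA scores are computed from the peak matrix and hierarchical clustering is run on the Euclidean distances between them. This is a minimal sketch on synthetic data; the average linkage, the number of components and the helper name `pca_scores` are assumptions, since the exact settings used by the software are not given here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca_scores(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return (U * S)[:, :n_components]


# Synthetic peak matrix: rows are spectra, columns are biomarker intensities.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[10.0, 0.0, 5.0], scale=0.3, size=(4, 3))
group_b = rng.normal(loc=[0.0, 10.0, 5.0], scale=0.3, size=(4, 3))
peak_matrix = np.vstack([group_a, group_b])

scores = pca_scores(peak_matrix)
# Euclidean distances between PCA scores drive the tree; the "average"
# linkage is an assumption made for this sketch.
tree = linkage(scores, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

The `tree` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the corresponding figure.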
Two approaches were applied to the peak matrix in order to discriminate the three Cryptococcus species. The first one was a two-step method in which C. deneoformans was discriminated from the other two species as a first step; this discrimination was replicated by means of two supervised machine learning algorithms applied to the same peak matrix: Partial Least Squares Discriminant Analysis (PLS-DA) and Support Vector Machine (SVM). Results were validated using the k-fold cross-validation method.
In the second step, a new peak matrix was built in order to achieve a better discrimination of C. neoformans from the interspecies hybrids. A second dendrogram was constructed using the above-mentioned parameters. Again, PLS-DA and SVM were applied to this second peak matrix to replicate the classification. The k-fold cross-validation method was also applied. The two-step method was further improved by excluding from the peak matrix those peaks that did not provide enough discrimination.
Finally, in order to simplify the workflow, a one-step method was assayed to test the capacity of the algorithms to discriminate the three Cryptococcus species at the same time. In this case, only one peak matrix with spectra from the three species was built and 5 species-specific biomarkers were included. The alignment and processing parameters were the same as in the two-step approach. The one-step method was evaluated using the generated peak matrix as input data for PLS-DA and SVM analysis. In both cases, validation was performed using k-fold cross-validation and the resulting confusion matrix.
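The cascade formed by the two steps can be sketched as a small wrapper around two SVM models: the first separates C. deneoformans from everything else, and the second separates the remaining classes using its own subset of peaks. The class name, the linear kernel, the column indices and the synthetic intensities are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC


class TwoStepClassifier:
    """Step 1: first_label vs rest. Step 2: remaining classes, own peaks."""

    def __init__(self, first_label, step2_columns):
        self.first_label = first_label
        self.step2_columns = list(step2_columns)
        self.step1 = SVC(kernel="linear")
        self.step2 = SVC(kernel="linear")

    def fit(self, X, y):
        y = np.asarray(y)
        is_first = y == self.first_label
        self.step1.fit(X, is_first)
        rest = ~is_first
        self.step2.fit(X[rest][:, self.step2_columns], y[rest])
        return self

    def predict(self, X):
        is_first = self.step1.predict(X).astype(bool)
        labels = np.empty(len(X), dtype=object)
        labels[is_first] = self.first_label
        if (~is_first).any():
            subset = X[~is_first][:, self.step2_columns]
            labels[~is_first] = self.step2.predict(subset)
        return labels


def synthetic_spectra(rng, n_per_class):
    """Invented peak intensities for three classes and four peaks."""
    centers = {
        "C. deneoformans": [9.0, 0.0, 0.0, 5.0],
        "C. neoformans": [0.0, 9.0, 0.0, 5.0],
        "hybrid": [0.0, 9.0, 9.0, 5.0],
    }
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n_per_class, 4))
                   for c in centers.values()])
    y = np.repeat(list(centers), n_per_class)
    return X, y


rng = np.random.default_rng(3)
X_train, y_train = synthetic_spectra(rng, 15)
X_new, y_new = synthetic_spectra(rng, 5)

model = TwoStepClassifier("C. deneoformans", step2_columns=[1, 2])
predicted = model.fit(X_train, y_train).predict(X_new)
accuracy = float(np.mean(predicted.astype(str) == y_new))
print(f"accuracy on new synthetic spectra: {accuracy:.2%}")
```

Giving the second step its own column subset mirrors the use of a new peak matrix for the discrimination of C. neoformans from the hybrids.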
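The one-step evaluation can be sketched as a single pipeline in which PCA precedes an SVM, validated by k-fold cross-validation and summarized in a confusion matrix. The synthetic intensities, the scaling step, the number of principal components and the linear kernel are assumptions for illustration and are not the settings of the software used in this study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented intensities of 5 biomarker peaks; the hybrid shows the peaks of
# both parental species.
centers = {
    "C. neoformans": [9.0, 9.0, 0.0, 0.0, 5.0],
    "C. deneoformans": [0.0, 0.0, 9.0, 9.0, 5.0],
    "hybrid": [9.0, 9.0, 9.0, 9.0, 5.0],
}
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 5))
               for c in centers.values()])
y = np.repeat(list(centers), 20)

# PCA is fitted inside each training fold, so no information leaks
# from the held-out spectra.
model = make_pipeline(StandardScaler(), PCA(n_components=3),
                      SVC(kernel="linear"))
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
predicted = cross_val_predict(model, X, y, cv=folds)

matrix = confusion_matrix(y, predicted, labels=list(centers))
accuracy = np.trace(matrix) / matrix.sum()
print(matrix)
print(f"k-fold (k=10) accuracy: {accuracy:.2%}")
```

Rows of the confusion matrix are the reference classes and columns the predicted ones, so off-diagonal counts show which species are confused with each other.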

Ethics Statement
The hospital Ethics Committee approved this study and gave consent for its performance (Code: MICRO.HGUGM.2017-003). Since only microbiological samples were analyzed, not human products, all the conditions to waive the informed consent have been met.

Results
Only two isolates (8.0%) were correctly identified at the species level with high-confidence score values (≥2.0), whilst 23 samples (52.3%) were identified with low-confidence scores (>1.7) (Table 1). Another 4 isolates were reliably identified to the species level, although with score values between 1.6 and 1.7 and, finally, 8 isolates obtained scores below 1.6. The latter can be considered unreliable identifications.
Footnotes to Table 1: 1 Identified as C. neoformans complex (n=2); 2 Identified as C. neoformans complex (n=7); 3 Identified as C. neoformans complex (n=1), C. deneoformans (n=4) and C. neoformans (n=3); 4 Identified as C. neoformans complex (n=1) and C. deneoformans (n=3); 5 Identified as C. neoformans (n=7).
Using the in-house library, all C. neoformans and C. deneoformans isolates were correctly identified by MALDI-TOF MS at the species level (Table 1). Moreover, 21/25 of these isolates (84.0%) were identified with score values ≥2.0, which indicates a high-confidence level. The reliability of the identification was further demonstrated by the fact that the top 4-5 identifications were identical in all cases. In all but two cases these top reference isolates belonged to the HGM in-house library.
However, the implementation of the expanded HGM library only allowed the correct identification of 12/19 interspecies hybrids, 7 of them with score values above 2.0. The close relatedness of the interspecies hybrids to the other two Cryptococcus species made it difficult for MALDI-TOF MS to discriminate among them, and 7 interspecies hybrids were misidentified as C. neoformans (Table 1).

Peak Analysis
To improve the identification of the interspecies hybrids and their discrimination from C. deneoformans and C. neoformans, peak analysis was performed. The search for species-specific biomarker peaks yielded a list of 10 peaks that allowed the differentiation of the Cryptococcus species analysed, with 5 of them showing higher discriminative power (Table 2). The two-step method allowed correct differentiation of the interspecies hybrids, which clustered distinctly in the dendrograms built using two different hierarchical clustering variations (Figure 2 and Figure S2). These dendrograms showed three different clusters where C. deneoformans isolates were clearly separated from C. neoformans and the interspecies hybrids. Accurate differentiation among the 3 Cryptococcus species was achieved using the peak matrix built upon the 5 most discriminative peaks, with only one spectrum from an interspecies hybrid misallocated in the C. neoformans cluster (Figure 2B). C. neoformans and the interspecies hybrids showed close relatedness between them based on their protein spectra.
The validation of the method yielded a k-fold (k=10) score of 96.92% for PLS-DA performed over the peak matrix with 10 biomarkers and 98.46% for the analysis with 5 biomarkers. However, the SVM algorithm achieved 100% discrimination in both cases when PCA was applied (Table S3).
A second dendrogram was built using hierarchical clustering analysis. It showed two well-defined clusters for C. neoformans and the interspecies hybrids (Figure S2). In this step only the 3 biomarkers to differentiate C. deneoformans from interspecies hybrids were used (5453.91, 5552.90 and 7103.00 m/z). Furthermore, this second dendrogram was validated by the PLS-DA and SVM algorithms. K-fold (k=10) cross-validation was applied, achieving 95.55% efficacy in both analyses.
In the single-step method, the peak matrix built with 5 biomarkers was used as input for PLS-DA and SVM analysis in order to discriminate the 3 Cryptococcus species simultaneously. PLS-DA could not correctly classify the three species at the same time, as shown by the low k-fold (k=10) values obtained. However, PCA prior to SVM allowed 98.46% correct classification of the three Cryptococcus species (Figure 3). The efficacy of the method, tested by k-fold (k=10) cross-validation, was above 95.0% (Figure 3, Table S3).
Table 2. List of the 10 representative mass peaks of Cryptococcus spp. identified as potential biomarkers. These peaks were used for the construction of dendrograms and PLS-DA and SVM models. The 5 peaks marked with asterisks (*) were selected for the simplified models. CV = Coefficient of Variation.
The visual detection of these biomarker peaks (Table 3) could provide a rapid and accurate identification of the Cryptococcus species prior to a more in-depth peak analysis using ad-hoc software.

Discussion
Accurate identification of Cryptococcus species within the C. neoformans complex provides valuable information about their epidemiology, susceptibility to commonly used antifungal drugs and virulence. Our results show that discrimination among the three Cryptococcus species analyzed (C. deneoformans, C. neoformans and interspecies hybrids) can be performed successfully using MALDI-TOF MS and peak analysis.
The implementation of the in-house database built in our laboratory allowed 100% correct species-level identification of the 25 C. deneoformans and C. neoformans isolates used to challenge it. Apart from the reliable identification of the analyzed Cryptococcus species, the in-house library also provided high-confidence identifications in 63.6% of the cases. Furthermore, these results showed consistency along the 10 top identifications provided by the mass spectrometry instrument, even for the hybrids. This fact is of great importance in the routine of the microbiology laboratory in order to transfer reliable information to the clinicians. The results obtained are in agreement with those reported by other authors [21][22][23]. However, the in-house library did not provide enough discrimination between the above-mentioned species and the interspecies hybrids. This goal was only fulfilled completely when peak analysis was performed and the three Cryptococcus species analyzed in this study clustered distinctly. Other authors have provided species-level discrimination in 98.1-100% of the cases [21,23,27]. Although some of these studies were performed on a higher number of isolates, our results also reflect the improvements made to the commercial database in recent years.
The available commercial database has been shown to provide high species-level resolution for C. deneoformans and C. neoformans (76.0%), although score values <1.7 were obtained in 21.0% of the cases and species-level identification was not provided for 2 C. deneoformans isolates. These data supported the need to build expanded databases. However, even improvements in the reference databases proved not to be enough to differentiate the interspecies hybrids. This may be due to the algorithms used by the mass spectrometry instrument for species assignment and to the fact that the hybrids show peaks present in both parental species. Therefore, peak analysis using ad-hoc software was performed. A list of 10 biomarker peaks was obtained as the input for species classification. The implementation of PLS-DA analysis in a two-step approach allowed the discrimination of C. deneoformans isolates in the first place and, subsequently, the correct classification of C. neoformans isolates and the interspecies hybrids in 96.92% of the cases. Furthermore, the accuracy of this method increased when the number of biomarker peaks used was reduced to the five most discriminative ones (98.46%).
To simplify the analysis, a one-step method was applied to classify the three species simultaneously. In this case, PLS-DA provided correct classification in less than 75.0% of the cases, but the application of SVM after PCA analysis allowed 96.92% correct discrimination of the analyzed isolates. This analysis provided a set of species-specific peaks for the Cryptococcus species within the C. neoformans complex that may be detected by visual inspection, representing a rapid and inexpensive approach for their discrimination.

Conclusions
Our results demonstrate the usefulness of MALDI-TOF MS and peak analysis when applied in the microbiology laboratory for rapid and reliable identification of non-Candida yeasts. Although the updated commercial library provided correct species-level identification for a high number of C. deneoformans and C. neoformans isolates, the identification of these species was missing or not reliable in 20.5% and 18.2% of the cases, respectively. Moreover, the detection of the interspecies hybrids is not possible with the Biotyper database. However, the expanded in-house library allowed correct species-level identification for all C. deneoformans and C. neoformans isolates, either by conventional identification with MALDI-TOF MS or by peak analysis. The interspecies hybrids required hierarchical clustering for their correct identification, since their close relatedness to the other two species made it difficult for MALDI-TOF to differentiate them in a routine manner. This approach and the detection of species-specific peaks are recommended for the reliable discrimination of the three analyzed species.
Supplementary materials:
Table S1. Pre-treatment of raw data spectra for the biomarkers search.
Table S2. Parameters applied for the construction of the peak matrix.
Table S3. Score and validation of (A) the SVM analysis applied to the two-step model with 5 biomarkers and (B) PLS-DA applied to the one-step model with 5 biomarkers (k=10).
Figure S1. ROC Curve of MS peaks of Cryptococcus spp. according to the criteria for biomarkers search.
Figure S2. Discrimination between C. neoformans and the interspecies hybrids using the two-step classification method.