Multimodal Metabolomics Combining UPLC-qToF-MS and GC-MS Data in Plasma and Brain Tissue

Metabolomic analysis of biological fluids and tissues has become an increasingly routine tool in the biological toolbox. However, challenges remain, including developing strategies to maximise coverage of the metabolome without requiring large sample volumes. Here we describe a multimodal strategy that combines LC-MS and GC-MS data acquired from a single vial containing a sample of plasma (20 μl) or brain tissue (3 mg). Using a split-phase extraction, the non-aqueous phase was analyzed by reversed phase (RP) LC-MS, whilst the aqueous phase was analyzed using hydrophilic interaction liquid chromatography (HILIC) LC-MS, with both phases also analysed using GC-MS after derivatization of the extract. Analytical performance was assessed in 7 rat cerebellum samples and a pilot study of 40 plasma samples (20 Alzheimer's disease (AD) patients vs. 20 healthy controls). The method, which uses four hours of instrument time, measured 20,707 metabolite features in brain samples and 17,266 in plasma samples; of these, 44.1% displayed coefficients of variation (CVs) below 15% and 75.2% below 30%. The method has the potential to resolve subtle biological differences and to correlate metabolite composition directly with clinical outcomes including MMSE, age and ADCS-ADL. This method can acquire in the order of 20,000 metabolite features when only low sample volumes are available.


Introduction
Metabolomics is the unbiased analysis of the composition of small molecule metabolites in a given biological tissue or bio-fluid under a specific set of environmental conditions [1,2]. In recent years the development of new methods has seen metabolomics progress from a novel analytical technique towards a mainstay of the biological toolbox. However, before this transformation can be completed a number of technical challenges remain to be overcome, foremost among which is maximizing coverage of the metabolome [3,4]. A number of previous studies have done this by utilizing multiple complementary techniques, including liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance spectroscopy (NMR) and gas chromatography-mass spectrometry (GC-MS) [5-7]. However, these strategies require individual sample preparation techniques, which significantly increases the volume of sample required and potentially also the variation and analysis time.
Dementia represents a major cause of morbidity and mortality worldwide, with Alzheimer's disease (AD) accounting for 60-80% of all dementia cases [8]. AD is characterized by a progressive cognitive decline, starting with problems with short term memory and progressing to disorientation, mood swings and language problems [9,10]. In 2006 it was estimated that worldwide 46 million people suffered from AD [11], with the estimated financial cost expected to reach $1 trillion by 2018 [12]. With an increasing elderly population the prevalence of AD is expected to increase fourfold to 131.5 million sufferers by 2050, significantly increasing both the social and financial burden represented by AD. Alzheimer's disease is known to have a long prodromal phase in which pathology accumulates without the presence of symptoms, making the discovery of a panel of biomarkers that can identify individuals in this phase of the disease a priority. Efforts to do this have so far been unsuccessful, but it is hoped that extending our coverage of the metabolome to include metabolites previously unstudied in AD will help to overcome this.
To this end we developed a strategy that enables 6 analytical assays (4 LC-MS and 2 GC-MS) to be applied to a single in-vial dual extraction utilising as little as 3 mg of brain tissue [13] or 20 µl of plasma [14]. The method was designed with the future analysis of samples such as human brain in mind, prioritising the information acquired over the length of the analysis. The robustness and utility of this strategy were assessed by combining all the data and applying the method to 7 rat cerebellum samples and to a pilot experiment on plasma samples from 20 Alzheimer's patients versus 20 healthy age-matched controls, with the controls split into stable and declining individuals, to allow us to assess the ability of the platform to detect both large and subtle metabolic differences.

Chemicals and Reagents
All solvents and reagents, including acetonitrile, ammonium formate, formic acid, methanol, methyl tertiary butyl ether (MTBE), toluene and water, were LC-MS grade and purchased from Sigma-Aldrich, with the exception of acetonitrile, which was purchased from VWR International. Three internal standards were added for LC-MS analysis: L-serine 13C3 15N (95%) and L-valine 13C5 15N (95%) for hydrophilic interaction liquid chromatography (HILIC), and tripentadecanoylglycerol for reversed phase (RP).

Biological Material
Brain tissue samples were obtained from rat cerebellums, which were harvested as per Schedule 1 of the Animals (Scientific Procedures) Act 1986 (King's College London). Plasma samples from 40 individuals (20 Alzheimer's patients and 20 healthy controls), collected in Kuopio (Finland) for the AddNeuroMed cohort, were selected for analysis. Patient groups were balanced for age and gender, with mini-mental state examination (MMSE) score balanced between genders within each group (Table 1). MMSE was used as the primary cognitive measure, with the Alzheimer's Disease Cooperative Study scale for Activities of Daily Living (ADCS-ADL) used as a secondary cognitive measure specifically for AD patients.

Sample Preparation
The in-vial dual extractions (IVDE) were performed as previously described for plasma [14] and brain tissue [13]. After all of the LC-MS analysis had been performed, the remaining aqueous and non-aqueous phases were split into separate vials and dried down under a stream of nitrogen at 37 °C. Samples were then re-suspended in a 1:1 solution of acetonitrile and the derivatising agent (BSTFA with 1% TMCS) and incubated at 37 °C for 1 hour. After incubation, samples were again dried down under nitrogen and subsequently re-suspended in 25 µl of toluene for analysis. A graphical description of the analytical workflow used in this study is shown in Figure 1.

Data Acquisition
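The HILIC and reversed phase LC-MS analysis was performed on a Waters ultra-performance liquid chromatography (UPLC) system coupled to a quadrupole time-of-flight (Q-ToF) mass spectrometer (Waters, Milford, MA, USA) as described previously [13,14]. GC-MS analysis was carried out on a Shimadzu QP-2010 with an AOC-20S auto sampler and AOC-20i auto injector (Shimadzu, Kyoto, Japan).
The aqueous phase was analyzed in splitless mode with 4 µl of sample injected on a BP5MS column (length 30 m, thickness 0.25 mm, diameter 0.25 mm). The carrier gas (helium) pressure was 79.5 kPa, with a total flow of 125 ml/min, a column flow of 1.18 ml/min, a linear velocity of 40 cm/sec and a purge flow of 6 ml/min. The temperature gradient started at 80 °C and was held for 5 minutes, followed by a linear increase of 10 °C per minute to 200 °C, where the rate of increase was slowed to 2 °C per minute to a final temperature of 225 °C, where it was held for 4 minutes.
Analysis of 4 µl of the non-aqueous phase was performed in splitless mode on the same column. The carrier gas (helium) pressure was set to 86.2 kPa, with a total flow of 122.8 ml/min, a column flow of 1.16 ml/min, a linear velocity of 40 cm/sec and a purge flow of 6 ml/min. The temperature gradient started at 100 °C and was held for 5 minutes, followed by a linear increase of 15 °C per minute to 250 °C, where the rate of increase was slowed to 2 °C per minute to a final temperature of 310 °C, where it was held for 4 minutes.
Mass spectral analysis of both phases was performed using electron impact ionisation between 50 and 600 m/z with an ion source temperature of 200 °C, an interface temperature of 280 °C, a scan speed of 833 and an event time of 0.7 seconds. Samples were analyzed in a randomized order, with pooled quality control (QC) samples analyzed after every 6 injections.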
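The injection scheme described above (randomized sample order with a pooled QC after every 6 injections) can be illustrated with a minimal Python sketch. The function name, the sample labels and the fixed random seed are hypothetical, and two details are assumptions rather than stated facts: that a QC is injected first to condition the column, and that a QC closes the run.

```python
import random


def build_run_order(sample_ids, qc_every=6, seed=0):
    """Randomize samples and insert a pooled QC after every `qc_every` samples.

    Assumptions (not stated in the text): a QC is injected first, to
    condition the column, and a QC is injected last, to close the run.
    """
    rng = random.Random(seed)        # fixed seed so the order is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)

    order = ["QC"]                   # assumed leading QC
    for i, sample in enumerate(shuffled, start=1):
        order.append(sample)
        if i % qc_every == 0:        # QC after every block of samples
            order.append("QC")
    if order[-1] != "QC":            # assumed closing QC
        order.append("QC")
    return order


if __name__ == "__main__":
    samples = [f"S{n:02d}" for n in range(1, 41)]   # hypothetical labels
    print(build_run_order(samples))
```

Under these assumptions a run of 40 samples contains 8 QC injections, the same number of pooled QCs reported in Table 3, although the exact placement used in the study is not stated.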

Data Processing
Initially all raw data files were converted into mzXML format. LC-MS files were converted using msConvert (ProteoWizard), whilst GC-MS files were converted using GCMS Solutions ® (Shimadzu).
Converted data files were analyzed using XCMS in the open source software package R. Peak picking was performed using the "massifquant" method for the GC-MS data, which allows isotope trace feature detection, and the "centWave" method for the LC-MS data, which allows the deconvolution of closely eluting or slightly overlapping peaks. Metabolite features were defined as any peak with an average intensity 5 times higher in the analytical samples than that measured for the same peak in the extraction blanks.
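The blank-based definition of a metabolite feature can be sketched as follows. This is an illustration in Python rather than the XCMS/R code used in the study; the function names, the dictionary layout and the example intensities are hypothetical, and treating a peak absent from the blanks as having zero blank signal is an assumption.

```python
from statistics import mean


def passes_blank_filter(sample_intensities, blank_intensities, fold=5.0):
    """True if the mean sample intensity exceeds `fold` x the mean blank intensity."""
    sample_mean = mean(sample_intensities)
    # Assumption: a peak never seen in the blanks has zero blank signal.
    blank_mean = mean(blank_intensities) if blank_intensities else 0.0
    return sample_mean > fold * blank_mean


def filter_features(samples, blanks, fold=5.0):
    """Return the ids of peaks that qualify as metabolite features.

    `samples` and `blanks` map a peak id to a list of intensities.
    """
    return [
        peak_id
        for peak_id, values in samples.items()
        if passes_blank_filter(values, blanks.get(peak_id, []), fold)
    ]


if __name__ == "__main__":
    # Hypothetical intensities for two peaks.
    samples = {"peak_A": [1000, 1200, 1100], "peak_B": [300, 280, 310]}
    blanks = {"peak_A": [50, 60], "peak_B": [100, 90]}
    print(filter_features(samples, blanks))
```

In this toy input only peak_A is retained, because its mean sample intensity (1100) exceeds five times its mean blank intensity (55), whereas peak_B does not clear five times its blank mean (95).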
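Measured metabolite features from all assays were combined into a single multimodal dataset that was analyzed using a range of multivariate tests including principal component analysis (PCA), orthogonal projections to latent structures-discriminant analysis (OPLS-DA) and hierarchical clustering analysis (HCA), performed in SIMCA 14.0 (Umetrics, Umeå, Sweden). The data in all models was logarithmically transformed (base 10) and pareto scaled. The performance of the generated models was assessed based on the cumulative correlation coefficients (R2X[cum]) and predictive performance based on seven-fold cross validation (Q2[cum]), with the significance of the model assessed based on the ANOVA of the cross-validated residuals (CV-ANOVA). Feature selection to create curated models was performed by iteratively removing variables using the variable influence on projection (VIP) plot to achieve the fitted model with the optimal R2 and Q2 values [15]. The predictive ability of the generated OPLS models was validated using a permutation test [16]. The results showed that none of the permuted Q2 values was higher than that of the original model, which confirms the reliability of the produced models.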
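The base-10 log transformation and pareto scaling applied to the data in all models can be written out for a single feature. The sketch below is not SIMCA's implementation; the function name is hypothetical, and the `offset` added before taking logarithms is an assumption introduced here so that zero intensities can be transformed.

```python
import math
from statistics import mean, stdev


def log10_pareto(column, offset=1.0):
    """Log10-transform one feature column, then pareto scale it.

    Pareto scaling mean-centres the values and divides them by the square
    root of their standard deviation. `offset` is an assumption, not a
    documented SIMCA setting.
    """
    logged = [math.log10(v + offset) for v in column]
    centre = mean(logged)
    sd = stdev(logged)                  # sample standard deviation
    if sd == 0:
        return [0.0 for _ in logged]    # a constant feature carries no signal
    scale = math.sqrt(sd)
    return [(v - centre) / scale for v in logged]


if __name__ == "__main__":
    # Hypothetical intensities of one feature in three samples.
    print(log10_pareto([9, 99, 999]))
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation reduces the dominance of intense features without giving low-intensity, noisy features equal weight.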

Reproducibility test and data quality of the method
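The total time of this method was four hours, which is unusually long in metabolomics, where methods need to be high-throughput. The present method was designed to acquire as much information as possible from one sample, and the instrument time and cost of analysis were increased to achieve this end. The quality of the generated data was assessed by looking at the compositional similarity of QC samples based on all metabolite features using PCA (Figure 2). With the exception of QC1, which was the first injection of the run (highlighted in a box), the QCs clustered together once the columns were equilibrated, with a spread smaller than the biological variance, suggesting across-run reproducibility and stability. In a PCA analysis of just the QC samples it could be seen that the analytical drift across the run accounted for 17.2% of the overall variance observed in the QCs.
The selectivity and reproducibility of the method were assessed in both brain tissue (Table 2) and plasma (Table 3) by determining the number of peaks measured and their relative variance. In the raw brain data, the 4 LC-MS methods detected a total of 9459 peaks (signal/noise > 5) and were reproducible, with 61.8% of the metabolite features showing coefficients of variation (CV) of less than 15% and 84.0% showing CVs below 30% (Table 2). The GC-MS methods measured 11248 (signal/noise > 5) metabolite features; however, the reproducibility of these data was poor, with only 1.5% of metabolite features showing CVs of less than 15% and 12.3% showing CVs below 30%. To improve reproducibility, the data were normalized to total ion count (TIC). This normalization produced a modest increase in the reproducibility of the RP data, from 33.3% and 74.0% to 50.7% and 75.2%, and of the HILIC data, from 41% and 60.5% to 50.5% and 71%, of features with CVs below 15% and 30% respectively. Normalization of the GC-MS data also improved reproducibility from 16.5% to 30.9% with CVs below 15% and 30%. The non-aqueous phase data showed the lowest number of features with CVs <30% compared with the LC-MS and GC-MS aqueous methods. However, during method development the non-aqueous phase achieved CVs (data not shown) in line with the brain experiments shown in Table 2. The aqueous phase showed 34.9% of features with CVs below 15% and 63.3% below 30%, also in line with the brain experiment.
Having shown that the method was reproducible, the next stage in assessing its performance was to determine its ability to detect differences between biological classes in a small pilot study using multimodal multivariate data analysis.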
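The reproducibility figures quoted in this section (the share of features with CVs below 15% and 30%, before and after TIC normalization) rest on a simple calculation, sketched here. The function names and the toy intensities are hypothetical, and the TIC of a run is approximated as the sum of its feature intensities, which is an assumption about how the normalization was computed.

```python
from statistics import mean, stdev


def tic_normalize(runs):
    """Divide every intensity in a run by that run's total ion count (TIC).

    `runs` is a list of injections, each a list of feature intensities in
    the same feature order. The TIC is approximated by the sum of the
    feature intensities in the run (an assumption).
    """
    normalized = []
    for run in runs:
        tic = sum(run)
        normalized.append([value / tic for value in run])
    return normalized


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean x 100."""
    return 100.0 * stdev(values) / mean(values)


def share_below(runs, threshold):
    """Percentage of features whose CV across runs is below `threshold`."""
    features = list(zip(*runs))      # transpose: one tuple per feature
    cvs = [cv_percent(feature) for feature in features]
    return 100.0 * sum(cv < threshold for cv in cvs) / len(cvs)


if __name__ == "__main__":
    # Hypothetical QC injections whose overall signal drifts between runs.
    qc_runs = [[100, 200, 700], [200, 400, 1400], [150, 300, 1050]]
    print(share_below(qc_runs, 30))
    print(share_below(tic_normalize(qc_runs), 15))
```

In this toy input every feature varies only through an injection-wide change in signal, so no feature has a raw CV below 30%, while all features fall below 15% after normalization. Real data also contain feature-specific variation that TIC normalization cannot remove.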
Curated models were generated using both raw and TIC normalized data from all 6 platforms combined into a single dataset, to identify the metabolite features with the greatest predictive performance for discriminating diagnostic classes. The optimal model calculated between all control and AD samples from the combination of all LC-MS and GC-MS raw data was based on 314 metabolite features and showed a significant class separation (R2X = 0.489, R2Y = 0.903, Q2 = 0.794, CV-ANOVA = 1.418 × 10^-11). However, in the same comparison a better separation was achieved by using the TIC normalized LC-MS and GC-MS data, with the final curated model based on 426 features (Figure 3A) (R2X = 0.585, R2Y = 0.933, Q2 = 0.869, CV-ANOVA = 2.72 × 10^-14). Of the 426 metabolite features on which this model is based, 95 were measured in HILIC positive, 84 in HILIC negative, 63 in RP positive and 72 in RP negative, with GC aqueous accounting for 69 and GC non-aqueous for 43 of the features. In both the OPLS-DA (Figure 3A) and PCA (Figure 3B) analyses there is a visible separation between controls and AD, but also between the control samples themselves: the stable (control A) and declining (control B) samples. To investigate this further we performed hierarchical clustering analysis (HCA) (Figure 3C) to assess the compositional similarity of individual samples from the three sample groups. This analysis showed that the 'declining' control samples clustered in the same primary clade as the AD samples, suggesting that they were more compositionally similar to the AD samples than to the 'stable' controls. Whilst these samples were diagnosed clinically as controls and showed no difference in cognitive ability at baseline, at a 12 month follow up these individuals had exhibited a significant decline in cognitive function (0.83 MMSE points, p = 0.019). When metabolite features were considered individually, it could be seen that the 'declining' controls had an abundance similar to that observed in the AD samples, or intermediate between the 'stable' controls and the AD samples (Figure 4). Of the metabolite features that were important for driving the separation between the two groups of controls, we annotated the excitatory neurotransmitter glutamate (Figure 4), which has previously been linked to the pathology and progression of Alzheimer's disease [17-21]. In this small pilot, metabolic shifts potentially associated with subtle differences in biological phenotypes were observed. Hence, this multimodal method was able to discriminate 'stable' controls, 'declining' controls and AD patients using PCA. Promising as this result is, the number of samples does not warrant metabolite identification, which will be performed once features are validated in a larger follow-up study.
Having observed that this method could discriminate between defined biological classes, we wanted to determine whether metabolite composition could be directly correlated with a range of relevant clinical measures. The association between metabolite composition and clinical outcomes was determined using OPLS analysis with MMSE, age and ADCS-ADL set as Y variables. Using SIMCA's inner relations plot it can be seen that metabolite composition correlates with MMSE (Figure 5A) (R2 = 0.893), age (Figure 5B) (R2 = 0.636) and ADCS-ADL (Figure 5C) (R2 = 0.634). This direct correlation of metabolite composition with clinical outcomes further demonstrated the biological relevance of the data generated using this multimodal method.
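As an illustration of how an R2 value of this kind can be obtained, the sketch below computes the squared Pearson correlation between a model score and a clinical variable. It assumes that the reported R2 is the squared correlation between the predictive score of each sample and its clinical value, which the text does not state explicitly; the function name and the example numbers are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical predictive scores and MMSE values for five samples.
    scores = [-2.1, -0.8, 0.1, 1.2, 1.9]
    mmse = [18, 22, 26, 28, 30]
    print(round(r_squared(scores, mmse), 3))
```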

Conclusions
This study illustrates the utility of combining metabolomics data, measuring large numbers of metabolite features from small sample volumes, with multivariate statistics to detect metabolic differences and relate them to clinical outcomes. It is hoped that by improving our ability to measure the metabolome using multimodal data, by including proteomics and genomics, and by increasing sample numbers, diagnostic and prognostic markers of AD pathology can be discovered.

Figure 1 Analytical workflow from sample preparation to multiplatform analysis using LC-MS and GC-MS and onto multimodal and multivariate analysis.

Figure 2 PCA score plot of plasma samples generated from the combined dataset from all 6 analytical methods. The plot shows control, Alzheimer's and quality control samples; the first QC is enclosed in a square.

Figure 4 Boxplots showing examples of features discriminating between all three diagnostic groups. The feature name is the mode of chromatography followed by the ionization mode, then the feature mass and retention time in minutes. AD; Alzheimer's disease, HILIC; hydrophilic interaction liquid chromatography, QC; quality control.

Table 1 Characteristics of study cohort. a male/female, b number of APOE4 (genetic risk factor for AD) positive patients, c Alzheimer's Disease Cooperative Study scale for Activities of Daily Living.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 14 April 2017 | doi:10.20944/preprints201704.0080.v1

Table 2 Reproducibility of raw and TIC normalized data generated by each analytical method from brain tissue samples. Variability of metabolite features measured in 7 independent IVDEs from rat cerebellum. a peak numbers for raw data, b peak numbers for TIC normalized data, c coefficient of variation of peak intensity between runs. GC; gas chromatography, HILIC; hydrophilic interaction liquid chromatography, Neg; negative, Pos; positive, RP; reversed phase.

Table 3 Reproducibility of raw and TIC normalized data generated by each analytical method from plasma samples. Variability of metabolite features measured in 8 pooled QC samples. a peak numbers for raw data, b peak numbers for TIC normalized data, c coefficient of variation of peak intensity between runs. GC; gas chromatography, HILIC; hydrophilic interaction liquid chromatography, Neg; negative, Pos; positive, RP; reversed phase.