Results
Sample Demographics and Clinical Characteristics
The study enrolled 78 participants, comprising 49 patients with IBS and 29 healthy controls (HCs). Demographic analysis revealed comparable age distributions between groups (median age: IBS = 34 years, HC = 33 years). Female participants predominated in both cohorts, representing 77.6% (38/49) of the IBS group and 69.0% (20/29) of the control group, reflecting the typical gender distribution observed in IBS populations.
Symptom severity, quantified using the IBS Symptom Severity Scale (IBS-SSS), demonstrated clear differentiation between groups. The IBS cohort exhibited predominantly moderate to severe symptomatology, while healthy controls reported minimal gastrointestinal symptoms, aligning with our inclusion criteria. Six participants (three from each group) had missing IBS-SSS data, which we addressed through multiple imputation stratified by group and gender to maintain statistical robustness. Detailed demographic and clinical characteristics are presented in Table 2.
Replication Analysis of Skrobisz (2022) using the Bergen Cohort (with FS 6.0.1)
In our Bergen cohort, we sought to replicate the morphometric findings reported by Skrobisz et al. [23] comparing IBS patients with healthy controls.
Table 3 presents our comparative analysis using the same methodological parameters as the original study: 35 estimated total intracranial volume (eTIV)-normalized regional brain volumes derived from FreeSurfer (version 6.0.1 in the Bergen cohort, version 6.0 in the original study), matching the analytical approach of Skrobisz et al.
The volumetric comparison of brain structures between IBS patients and healthy controls across both cohorts reveals distinct patterns. Although the Bergen cohort shows systematically larger volumes (6–8% for global measures and up to 35% for specific structures such as the nucleus accumbens), the within-cohort comparisons between IBS and HC groups show remarkable consistency in global eTIV-normalized volumes. Specifically, BrainSegVol values remain nearly identical within each cohort (Table 3). Cortical measurements are similarly stable, with total cortical volume (CortexVol) showing minimal between-group differences in both cohorts. In subcortical structures, we observed subtle variations, notably a slight trend toward lower subcortical gray matter volume (SubCortGrayVol) in IBS patients, although these differences remain within one standard deviation. White matter volumes are consistent between groups within each cohort, whereas white matter hypointensities show a divergent pattern in the Bergen cohort. Corpus callosum segments exhibit relatively uniform volumes across all groups.
Several methodological factors warrant consideration: the disparate cohort sizes (the Bergen cohort comprised 29 HC and 49 IBS participants), the different FreeSurfer versions (6.0 versus 6.0.1), and differences in operating systems may all contribute to the systematic volumetric differences observed between cohorts. While eTIV normalization facilitates direct comparisons within cohorts by controlling for head-size variation, it does not fully account for between-cohort differences.
Figure 3 presents a detailed reproducibility analysis, illustrating the differences in eTIV-normalized brain region volumes between HC and IBS across both cohorts. The plot contrasts effect sizes from the Skrobisz (2022) cohort (x-axis) against the Bergen cohort (y-axis), with the diagonal line representing perfect agreement. We employed Cohen's d values for region-wise effect size calculations, as the availability of only parametric summary statistics from the Skrobisz study precluded non-parametric effect size measures. For each eTIV-normalized brain region volume and cohort, we calculated the pooled standard deviation as:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},$$
where $n_1$ and $n_2$ are the sample sizes, and $s_1$ and $s_2$ are the standard deviations of the two groups, IBS and HC, respectively. Cohen's d effect size was then computed as:
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},$$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups. The 95% confidence interval for d was calculated using:
$$d \pm 1.96\sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}},$$
where the standard error term accounts for both sampling variance and uncertainty in the effect size estimate.
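A minimal Python sketch of this effect-size computation from summary statistics follows. It assumes group 1 is IBS and group 2 is HC, and a normal-approximation 95% interval (z = 1.96) built on the common large-sample standard error sqrt((n1 + n2)/(n1·n2) + d²/(2(n1 + n2))). The function names and the example values are illustrative, not taken from the study's code or data.

```python
import math


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def cohens_d_ci(m1, s1, n1, m2, s2, n2, z=1.96):
    """Cohen's d (group 1 minus group 2) and an approximate 95% CI.

    Assumes the large-sample standard error
    SE = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    d = (m1 - m2) / pooled_sd(n1, s1, n2, s2)
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


# Hypothetical summary statistics for one eTIV-normalized region:
# group 1 = IBS (n = 49), group 2 = HC (n = 29).
d, ci = cohens_d_ci(m1=0.0052, s1=0.0006, n1=49, m2=0.0050, s2=0.0006, n2=29)
```

Only means, standard deviations, and group sizes are needed, which is why this approach works with the published summary statistics of the Skrobisz cohort.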
An overall reproducibility score (S) was developed for each brain region to quantify cross-cohort consistency through three complementary metrics: directional consistency ($D$), confidence interval overlap ($O$), and effect magnitude ($M$). The score is computed as $S = D + O + M$, where the binary indicator $D$ equals 1 if the direction of effect is consistent between cohorts and 0 otherwise, the binary indicator $O$ equals 1 if the 95% confidence intervals overlap and 0 otherwise, and $M = \min(|d_{\text{Skrobisz}}|, |d_{\text{Bergen}}|)$ represents the minimum absolute effect size observed across cohorts.
This composite metric prioritizes brain regions exhibiting robust cross-cohort replication, with $M$ providing additional weight to stronger effects. Higher scores (S) indicate greater reproducibility of morphometric findings across independent study populations and analysis pipelines, thereby establishing a quantitative framework for identifying the most reliable neuroanatomical alterations in IBS.
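The composite score can be sketched in a few lines. The sketch assumes each cohort's effect is given as Cohen's d with its 95% CI. The direction term is 1 only when both d values are non-zero and share a sign. The overlap term is 1 when the two intervals intersect. The magnitude term is the smaller absolute d. The `Effect` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    """Cohen's d (IBS minus HC) with its 95% confidence interval."""
    d: float
    lo: float
    hi: float


def reproducibility_score(a: Effect, b: Effect) -> float:
    """Direction indicator + CI-overlap indicator + smaller |d| (sketch)."""
    direction = 1 if a.d * b.d > 0 else 0                # same non-zero sign in both cohorts
    overlap = 1 if a.lo <= b.hi and b.lo <= a.hi else 0  # the two intervals intersect
    magnitude = min(abs(a.d), abs(b.d))                  # weaker of the two effects
    return direction + overlap + magnitude
```

This makes the scale explicit. A region with a consistent direction and overlapping intervals scores 2 plus its smaller |d|, so top-ranked regions fall just above 2. A region with overlapping intervals but no directional consistency scores about 1 plus its smaller |d|.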
The effect size comparison between cohorts revealed a weak, non-significant correlation (r = 0.203, p = 0.243). Directional consistency analysis showed that 51.4% of brain regions (18 of 35) maintained consistent IBS versus HC differences across cohorts. Notably, all brain regions exhibited overlapping 95% confidence intervals between cohorts, suggesting that, despite differences in point estimates, the between-cohort variations lie within measurement uncertainty. Five regions demonstrated particularly strong cross-cohort consistency, achieving the highest overall reproducibility scores (S): the mid-anterior corpus callosum (CC_Mid_Anterior), Left-Pallidum, Left-Thalamus, Right-Pallidum, and Left-Amygdala. These structures showed overall scores ranging from 2.14 to 2.26, suggesting robust replication of IBS-related alterations. Conversely, several regions exhibited marked between-cohort divergence. White matter hypointensities showed particularly discordant effects, while specific corpus callosum segments (CC_Posterior and CC_Mid_Posterior) showed stronger effects in the Bergen cohort. Cerebellar regions clustered near the origin, indicating consistently modest effects in both cohorts. The overall pattern suggests limited agreement between cohorts in IBS-related brain alterations. While specific structures show robust reproducibility, the widespread dispersion around the diagonal reference line, together with the weak correlation, indicates substantial heterogeneity in morphometric findings between these independent samples. This variability may reflect genuine biological heterogeneity in IBS-related brain alterations or methodological differences between studies.
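The two cross-cohort summary statistics can be sketched as follows. The sketch assumes one array of Cohen's d values per cohort, with regions in the same order. The function name is hypothetical. With 35 regions, a directional consistency of 51.4% corresponds to 18 regions.

```python
import numpy as np
from scipy.stats import pearsonr


def cross_cohort_summary(d_a, d_b):
    """Pearson r (with p-value) between the cohorts' effect sizes, and the
    fraction of regions whose effects share the same non-zero sign."""
    d_a = np.asarray(d_a, dtype=float)
    d_b = np.asarray(d_b, dtype=float)
    r, p = pearsonr(d_a, d_b)
    same_direction = np.sign(d_a) * np.sign(d_b) > 0
    return float(r), float(p), float(same_direction.mean())
```

Correlation and directional consistency measure different things. Effects can agree in sign while their magnitudes are only weakly correlated, as observed here.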
Figure 4 ranks brain regions by their reproducibility score (S), that is, by how consistently they show similar IBS versus HC patterns in the two cohorts.
The reproducibility analysis revealed varying degrees of cross-cohort consistency in brain structural alterations associated with IBS. Several regions demonstrated robust reproducibility, with the Left-Pallidum, Left-Thalamus, and CC_Mid_Anterior achieving overall scores (S) exceeding 2.0. These high-scoring regions exhibited both directional consistency and complete confidence interval overlap, coupled with substantial effect magnitudes, suggesting reliable IBS-related volumetric alterations across independent samples. Conversely, regions including the Right-Caudate, Right-Cerebellum-Cortex, Left- and Right-Hippocampus, CC_Mid_Posterior, and Left-Cerebellum-Cortex showed lower reproducibility (scores approximately 1.1). While these regions maintained confidence interval overlap, they lacked directional consistency between cohorts, suggesting greater variability in IBS-related effects. Despite systematic between-cohort differences in eTIV-normalized volumes, certain regions demonstrated consistent relative patterns of alteration. However, our attempt to replicate the specific morphometric differences reported by Skrobisz et al. (2022) yielded limited success. This suggests that structural brain alterations in IBS may be more heterogeneous than previously recognized, potentially reflecting the complex nature of IBS pathophysiology or methodological variations across studies.
To assess the robustness of brain morphometry measurements in IBS research, we conducted a comprehensive analysis of the Bergen cohort data using multiple FreeSurfer processing pipelines. This systematic evaluation examined the stability of morphometric measurements and IBS versus healthy control (HC) group differences across different analytical approaches: FreeSurfer versions (6.0.1 versus 7.4.1) and processing streams within FreeSurfer 7.4.1 (cross-sectional versus longitudinal). Our interventional study design enabled the application of the longitudinal processing stream, providing an additional dimension for assessing measurement reliability. Unlike our previous replication analysis of the Skrobisz (2022) cohort, which relied on summary statistics, this comparison utilized complete morphometric data from all participants, allowing for more detailed assessment of measurement consistency.
Cross-Version Comparison of FreeSurfer Morphometric Measurements
We examined the consistency of volumetric measurements between FreeSurfer versions 6.0.1 and 7.4.1 (cross-sectional stream) in quantifying brain structural differences between IBS patients and healthy controls (HC).
Table A2 in the Appendix presents group-wise summary statistics (mean and standard deviation) for both IBS patients and healthy controls, derived from the aseg.stats files generated by each FreeSurfer version.
Figure 5 presents a scatter plot matrix illustrating version-wise comparisons for each brain region. Individual plots display FS 6.0.1 volumes against corresponding FS 7.4.1 measurements, with HC and IBS participants distinguished by blue and red markers, respectively. Reference identity lines facilitate direct assessment of cross-version measurement concordance.
The scatter plot matrix demonstrates varying degrees of consistency between FreeSurfer versions 6.0.1 and 7.4.1 across brain regions. Subcortical regions, particularly the thalamus, caudate, and putamen, and to a lesser extent the hippocampus, show strong cross-version agreement with minimal deviation from the identity line. However, systematic differences emerge in several structures: the amygdala and the accumbens show moderate version-dependent variability, with data points deviating systematically from perfect concordance. Corpus callosum segments display region-specific variations in cross-version agreement, with CC_Anterior and CC_Mid_Anterior showing more pronounced differences than other segments. Importantly, the distribution patterns of the IBS (red) and healthy control (blue) groups remain fairly consistent across versions, suggesting that while absolute volume estimates may differ between FreeSurfer versions, the relative group differences are largely preserved.
Notably, several regions exhibit strong correlations between versions but with systematic offsets from the identity line, indicating consistent biases between FreeSurfer versions 6.0.1 and 7.4.1. For example, the cortical measurements (lhCortexVol and rhCortexVol), the left and right cerebral white matter volumes (lhCerebralWhiteMatterVol and rhCerebralWhiteMatterVol), and TotalGrayVol show a clear parallel offset above the identity line, indicating that FreeSurfer 6.0.1 consistently produces higher volume estimates than version 7.4.1. This systematic bias appears consistent across the full range of eTIV-normalized volumes and in both subject groups. Similar parallel offsets are visible in the Left- and Right-Cerebellum-Cortex and in subcortical structures such as the Left-Pallidum and Left-Caudate. Conversely, eTIV is systematically higher in version 7.4.1 than in version 6.0.1.
Several key structures exhibit individual outliers that warrant attention. In eTIV, a single measurement shows substantial deviation, suggesting potential segmentation challenges in this particular case. The Left- and Right-Hippocampus both show isolated outliers (blue points) deviating markedly from the otherwise tight correlation pattern, indicating potential segmentation inconsistencies between versions for these specific control subjects. The Left-Thalamus displays a particularly notable outlier (blue point) that deviates substantially below the main correlation pattern, suggesting a case where version 7.4.1 produced a markedly lower volume estimate than version 6.0.1. Similar isolated discrepancies appear in both Left- and Right-Amygdala measurements, where single data points (again from the control group) deviate notably from the otherwise consistent cross-version relationship. These individual outliers likely represent cases where the segmentation algorithms in the two FreeSurfer versions interpreted anatomical boundaries differently, possibly due to image quality issues, anatomical variants, or differences in how the versions handle boundary cases. Because most of these outliers come from the control group (blue points), the discrepancies are probably not related to IBS pathology but rather to technical aspects of the segmentation process.
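The offsets and outliers above were read from the scatter plot matrix. A hypothetical complement, not the procedure used in this study, would quantify them for one region from paired per-subject volumes. The mean paired difference estimates the systematic bias. A median/MAD-based modified z-score of the paired differences, with the conventional 3.5 threshold, flags subjects whose cross-version discrepancy stands out. The helper name, threshold, and inputs are assumptions.

```python
import numpy as np


def version_agreement(vol_a, vol_b, z_thresh=3.5):
    """Paired cross-version agreement for one region (hypothetical helper).

    Returns the mean paired difference (systematic offset), Pearson r,
    and indices of subjects whose difference has |modified z| > z_thresh.
    """
    vol_a = np.asarray(vol_a, dtype=float)
    vol_b = np.asarray(vol_b, dtype=float)
    diff = vol_a - vol_b
    bias = float(diff.mean())
    r = float(np.corrcoef(vol_a, vol_b)[0, 1])
    med = np.median(diff)
    mad = np.median(np.abs(diff - med))
    if mad == 0:  # all differences identical: no outlier can be defined
        return bias, r, np.array([], dtype=int)
    modified_z = 0.6745 * (diff - med) / mad  # Iglewicz-Hoaglin modified z-score
    return bias, r, np.flatnonzero(np.abs(modified_z) > z_thresh)
```

Basing outlier detection on the paired differences, rather than on the raw volumes, separates a uniform version offset from subject-specific segmentation discrepancies.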
These observations underscore the importance of version consistency in morphometry-based classification studies and suggest that meta-analyses or multi-site studies should carefully account for FreeSurfer version effects in their analytical pipelines.
In this context, Figure 6 depicts a scatter plot matrix comparing brain region volumes between two pipelines (the cross-sectional and the longitudinal stream) using the same FreeSurfer version (7.4.1), highlighting potential discrepancies.
The comparison between FreeSurfer 7.4.1's cross-sectional and longitudinal processing streams reveals distinct patterns of agreement and systematic variation across brain regions. Global measurements (BrainSegVol, BrainSegVolNotVent) demonstrate strong cross-stream consistency, with tight clustering along the identity line. However, substantial systematic differences emerge in several key structures. Most notably, cortical volumes (lhCortexVol, rhCortexVol) exhibit a clear systematic bias, with longitudinal processing consistently producing higher volume estimates than the cross-sectional stream. This contrasts with the Left- and Right-Cerebellum-Cortex, where longitudinal processing yields systematically lower estimates. Subcortical structures display varying degrees of processing-stream sensitivity: the putamen and caudate show consistent offsets from the identity line, while pallidum and accumbens measurements show greater scatter. Corpus callosum segments (CC_Anterior, CC_Mid_Anterior, CC_Central) show processing-stream-dependent variations that differ from those observed in other structures.
The eTIV plot (top-left panel) shows remarkably high consistency between the cross-sectional and longitudinal processing streams. The data points cluster tightly along the identity line across the full range of values (approximately 1.2–1.8 × 10⁶ mm³), with minimal deviation. This agreement is particularly noteworthy because eTIV serves as the normalization factor for all other volumetric measurements. It suggests that the differences observed in other brain regions are not attributable to variations in total intracranial volume estimation between processing streams, but rather reflect genuine methodological differences in how the two streams segment specific structures.
Importantly, these systematic biases maintain consistency across both IBS and healthy control groups, as evidenced by the parallel patterns of red and blue markers. This indicates that while absolute volume estimates differ between processing streams, the relative group differences remain largely preserved. These findings underscore the critical importance of maintaining consistent processing stream selection when conducting cross-sectional comparisons or longitudinal analyses in clinical studies.
Summary statistics (mean and standard deviation) for the FreeSurfer 7.4.1 cross-sectional and longitudinal streams are shown in Appendix Table A3.
Figure 7 illustrates the differential impact of FreeSurfer processing choices on IBS versus healthy control effect sizes across brain regions. Panel (a) compares effect sizes between FreeSurfer versions 6.0.1 and 7.4.1 (cross-sectional), while panel (b) contrasts effect sizes derived from FreeSurfer 7.4.1’s cross-sectional and longitudinal processing streams, enabling assessment of both version and pipeline-specific influences on group differences.
The scatter plots reveal distinct patterns in how FreeSurfer methodological choices affect IBS versus healthy control effect sizes across brain regions. Panel (a), comparing FreeSurfer versions 6.0.1 and 7.4.1 (cross-sectional), demonstrates moderate agreement with notable version-specific variations. Key corpus callosum segments (CC_Anterior, CC_Mid_Posterior) show the strongest positive effect sizes (approximately 0.4) and remain relatively consistent across versions. In contrast, the Left-Accumbens-area exhibits the strongest negative effect (approximately -0.4), with its magnitude varying between versions. Panel (b), comparing cross-sectional and longitudinal streams within FreeSurfer 7.4.1, shows that corpus callosum segments remain the regions with the strongest positive effects, while the Left-Amygdala and Left-Accumbens-area show pronounced negative effects. Most subcortical structures cluster more tightly around the diagonal than in the version comparison in panel (a). Overall, the tighter clustering along the diagonal in panel (b) indicates that processing stream selection within FreeSurfer 7.4.1 introduces less variability in effect size estimates than a change of FreeSurfer version, although specific regions, particularly in the limbic system, remain sensitive to processing stream choice.
Figure 8 quantifies the reproducibility of IBS versus healthy control group differences across brain regions under different FreeSurfer methodological variants. Panel (a) ranks regions by their effect size consistency (S) between FreeSurfer versions 6.0.1 and 7.4.1 (cross-sectional), while panel (b) presents regional rankings based on effect size stability between cross-sectional and longitudinal processing streams within FreeSurfer 7.4.1, enabling systematic assessment of both version and pipeline-dependent variations.
The regional consistency scores reveal distinct patterns in how FreeSurfer methodological choices affect the reproducibility of IBS versus healthy control differences. Panel (a), comparing FreeSurfer versions 6.0.1 and 7.4.1 (cross-sectional), shows a gradual distribution of consistency scores ranging from 1.0 to 2.5. Corpus callosum regions (CC_Mid_Posterior, CC_Posterior) demonstrate the highest consistency, while cerebellar structures show the lowest. Subcortical regions exhibit intermediate consistency, suggesting moderate stability across FreeSurfer versions. Panel (b), comparing cross-sectional and longitudinal streams within FreeSurfer 7.4.1, reveals a more distinct clustering pattern. The CC_Anterior and CC_Mid_Posterior maintain high consistency, but notably, limbic structures like the Left-Amygdala and Right-Thalamus show improved consistency compared to their version-wise rankings. This suggests that these regions are more sensitive to FreeSurfer version changes than to processing stream selection. The overall pattern indicates stronger methodological stability when varying processing streams within FreeSurfer 7.4.1 than in cross-version analyses. Importantly, comparing these methodological variations within the same cohort yields higher consistency scores than the previous cross-cohort comparison (Figure 4), highlighting the substantial impact of cohort-specific factors on brain morphometric findings in IBS research.
Multivariate Analyses: IBS versus HC
The multivariate normality of brain structural data was assessed across three FreeSurfer processing streams using Mardia's test (examining skewness and kurtosis) and the Henze–Zirkler test. For FS 6.0.1, Mardia's test revealed significant deviations in both skewness and kurtosis for the full sample, with similar patterns in the IBS group but different skewness characteristics in the HC group. For FS 7.4.1 cross-sectional, both groups showed significant non-normality, with particularly extreme values in the IBS group (kurtosis statistic = 153.63). The FS 7.4.1 longitudinal analysis also indicated significant departures from multivariate normality across all groups. The Henze–Zirkler test showed numerical instability, evidenced by extreme values and negative test statistics, so its results should be interpreted with caution. Overall, these findings consistently indicate significant departures from multivariate normality across all FreeSurfer versions and subject groups, with particularly pronounced effects in the IBS group. This suggests that robust statistical methods should be employed for subsequent analyses of group differences in brain structure.
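For reference, Mardia's statistics can be computed directly with NumPy and SciPy. This sketch follows the textbook definitions. Skewness b1 is scaled as n·b1/6 and referred to a chi-square distribution with p(p+1)(p+2)/6 degrees of freedom. Kurtosis b2 is standardized and referred to a normal distribution. It is not necessarily the implementation behind the results above. Using the pseudo-inverse is an assumption to cope with singular covariance matrices. When a group has fewer participants than regions (e.g., 29 controls and 35 regions), the reference distributions are no longer reliable, so results in that regime should not be trusted.

```python
import numpy as np
from scipy import stats


def mardia_test(X):
    """Mardia's multivariate skewness and kurtosis tests (sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    # Maximum-likelihood covariance (divisor n), as in Mardia's definition.
    cov = centered.T @ centered / n
    # Pseudo-inverse (assumption): tolerates a singular covariance matrix.
    g = centered @ np.linalg.pinv(cov) @ centered.T  # n x n matrix of g_ij
    b1 = np.sum(g ** 3) / n ** 2            # multivariate skewness
    b2 = np.sum(np.diag(g) ** 2) / n        # multivariate kurtosis
    skew_stat = n * b1 / 6.0
    skew_df = p * (p + 1) * (p + 2) / 6.0
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return {
        "skew_stat": skew_stat,
        "skew_df": skew_df,
        "skew_p": stats.chi2.sf(skew_stat, skew_df),
        "kurt_stat": kurt_stat,
        "kurt_p": 2 * stats.norm.sf(abs(kurt_stat)),
    }
```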
In this context, we computed a robust Mahalanobis distance to quantify the multivariate separation between IBS and HC groups across the FreeSurfer processing streams while accounting for potential outliers and non-normality in the neuroimaging data. The computation winsorizes the data at the 10th and 90th percentiles to mitigate the impact of extreme values and then estimates location robustly using medians instead of means. Mahalanobis distances decreased across processing streams: FS 6.0.1 showed the largest separation, followed by FS 7.4.1 cross-sectional and FS 7.4.1 longitudinal. However, none of these distances reached statistical significance, indicating that the multivariate brain volume differences between IBS and HC groups are not statistically significant in any of the FreeSurfer processing streams. The consistently high p-values and low F-statistics indicate that, despite the numerical differences in Mahalanobis distances, there is insufficient evidence to conclude that the IBS and HC groups differ in their multivariate brain volume profiles. This analysis, incorporating 35 brain regions and accounting for their covariance structure, suggests that the volumetric differences between IBS and HC groups are not robust enough to clearly distinguish between the groups in a multivariate framework.
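A minimal sketch of this procedure, under stated assumptions:

- each group is winsorized per feature at the 10th/90th percentiles;
- locations are per-feature medians;
- the scatter matrix is the pooled covariance of the winsorized data;
- significance is approximated by converting the squared distance to a two-sample Hotelling-type F statistic with p and n_a + n_b - p - 1 degrees of freedom.

The F conversion and the pseudo-inverse are assumptions, since the text does not specify how its F-statistics were obtained. With 35 regions and 78 participants, the second degree of freedom would be 42.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize


def robust_mahalanobis(X_a, X_b, limits=(0.10, 0.10)):
    """Median-based Mahalanobis distance between two groups (sketch)."""
    X_a = np.asarray(X_a, dtype=float)
    X_b = np.asarray(X_b, dtype=float)
    n_a, p = X_a.shape
    n_b = X_b.shape[0]
    # Winsorize each feature within each group (lowest/highest 10% clipped).
    W_a = np.asarray(winsorize(X_a, limits=limits, axis=0))
    W_b = np.asarray(winsorize(X_b, limits=limits, axis=0))
    # Robust location: per-feature medians.
    diff = np.median(W_a, axis=0) - np.median(W_b, axis=0)
    # Pooled covariance of the winsorized data (atleast_2d handles p == 1).
    S = np.atleast_2d(
        ((n_a - 1) * np.cov(W_a, rowvar=False) + (n_b - 1) * np.cov(W_b, rowvar=False))
        / (n_a + n_b - 2)
    )
    d2 = max(float(diff @ np.linalg.pinv(S) @ diff), 0.0)
    # Hotelling-type F approximation (assumption).
    df2 = n_a + n_b - p - 1
    f_stat = (n_a * n_b / (n_a + n_b)) * d2 * df2 / (p * (n_a + n_b - 2))
    p_value = float(stats.f.sf(f_stat, p, df2))
    return float(np.sqrt(d2)), f_stat, p_value
```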
To further investigate potential group differences beyond the initial Mahalanobis distance analysis, we employed a machine learning framework with cross-validation to assess IBS versus healthy control discriminability and identify the most diagnostically relevant brain structures. This complementary approach enables systematic evaluation of multivariate patterns while accounting for potential interactions between brain regions.
Machine Learning-Based Classification Using Brain Morphometry
We evaluated the discriminative power of brain morphometric features for IBS versus healthy control classification using the PyCaret machine learning library. Multiple classification algorithms were trained and compared (Appendix Figure A1) using FreeSurfer 7.4.1 longitudinal stream measurements from the Bergen cohort (Table 2). We applied a binary classification framework to distinguish between healthy controls (0) and IBS patients (1) based on brain morphometric features. The dataset comprised 78 participants characterized by 37 numerical features, partitioned into training (n=54) and test (n=24) sets. We employed stratified 10-fold cross-validation to maintain consistent class proportions across folds. Feature preprocessing included mean-based imputation and standardization to zero mean and unit variance, which is particularly important for features with widely differing scales (e.g., raw eTIV values versus eTIV-normalized measures). Given the modest dataset size, analyses were performed using CPU computation. All random processes were controlled through a fixed session identifier to ensure reproducibility.
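The study used PyCaret. The same design can be sketched with scikit-learn to show the individual steps, under these assumptions:

- a stratified 70/30 split, one ratio consistent with the reported partition: for 29 controls and 49 patients it yields 54 training and 24 test participants (9 controls and 15 patients in the test set);
- mean imputation and standardization fitted inside each cross-validation fold;
- logistic regression as a placeholder estimator.

The seed value and the function name are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 123  # hypothetical fixed seed, playing the role of PyCaret's session_id


def split_and_cross_validate(X, y, seed=SEED):
    """Stratified hold-out split, then stratified 10-fold CV on the training set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Imputation and scaling sit inside the pipeline, so each CV fold
    # learns them from its own training part only (no leakage).
    model = make_pipeline(
        SimpleImputer(strategy="mean"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),  # placeholder for the compared models
    )
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="accuracy")
    return (X_tr, X_te, y_tr, y_te), scores
```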
Model performance evaluation across 15 classification algorithms identified Extreme Gradient Boosting (XGBoost) as the best-performing approach for IBS versus healthy control discrimination based on brain morphometry (details in Figure A1). XGBoost achieved the highest cross-validated performance metrics: accuracy (0.72), AUC (0.68), recall (0.72), precision (0.74), and F1 score (0.71). The model's Cohen's Kappa (0.40) and Matthews Correlation Coefficient (0.42) indicate a modest improvement over chance-level classification. K-Nearest Neighbors demonstrated the second-best performance, while Logistic Regression and Support Vector Machines showed moderate discriminative ability. Several algorithms, including AdaBoost and Linear Discriminant Analysis, performed near chance level, as benchmarked against a dummy classifier baseline. XGBoost's performance advantage may reflect its ability to capture complex, nonlinear relationships in brain morphometric features that distinguish IBS from healthy controls.
The best-performing model (XGBoost) demonstrated mixed classification performance on the hold-out test set, as shown in Figure 9a. The model correctly identified 73% of IBS patients (11/15 cases; 8 female, 3 male; IBS-SSS: 245.7 ± 60.4; age: 33.2 ± 7.6). However, specificity was low at 11%, with 8 of 9 healthy controls misclassified as IBS (3 female, 5 male; IBS-SSS: 19.2 ± 19.6; age: 25.4 ± 5.7), yielding an overall accuracy of 50% (12/24). This asymmetric performance reveals systematic patterns: correctly classified IBS patients showed higher symptom severity scores (IBS-SSS), female predominance, and higher mean age compared to misclassified controls. The strong bias toward IBS classification suggests that while brain morphometric features contain discriminative information, additional refinement is needed for reliable diagnostic application.
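The hold-out figures follow directly from the confusion-matrix counts, with IBS as the positive class. The sketch below recomputes them and adds balanced accuracy, the mean of sensitivity and specificity, here about 0.42. Balanced accuracy is not reported in the study, but it is less affected by the 15:9 class imbalance of the test set. The function name is illustrative.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Classification metrics from confusion-matrix counts (IBS = positive class)."""
    sensitivity = tp / (tp + fn)  # recall for IBS
    specificity = tn / (tn + fp)  # recall for HC
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hold-out counts reported for the morphometry-only XGBoost model:
# 11 of 15 IBS patients and 1 of 9 controls classified correctly.
holdout = confusion_metrics(tp=11, fn=4, tn=1, fp=8)
```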
Permutation importance analysis revealed the relative contribution of brain regions to IBS versus healthy control classification. The central corpus callosum (CC_Central) emerged as the most discriminative feature, followed by white matter hypointensities and the left nucleus accumbens. A second tier of discriminative regions includes the mid-posterior corpus callosum and left amygdala, while cerebellar structures (e.g., the right cerebellar cortex) showed moderate importance. Notably, several regions traditionally studied in IBS, such as the hippocampus, as well as total intracranial volume, demonstrated relatively low discriminative power. This hierarchy suggests that white matter structures, particularly corpus callosum segments, may play a more prominent role in IBS-related brain alterations than previously recognized. However, the permutation importance ranking should be interpreted cautiously given the large standard deviations and the model's modest classification performance (50% accuracy, 73% sensitivity, but only 11% specificity). While the ranking identifies the features that contribute most to the model's decisions, these contributions come from a model with a strong bias toward IBS classification and poor discriminative ability for healthy controls.
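Permutation importance measures how much a held-out score drops when one feature's values are shuffled, which breaks that feature's relationship with the label. Repeating the shuffle yields the mean importance and the standard deviation referred to above. A minimal scikit-learn sketch follows. GradientBoostingClassifier stands in for XGBoost to keep dependencies small. The toy data, the function name, and the repeat count are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_features(model, X_test, y_test, names, n_repeats=30, seed=0):
    """Rank features by mean drop in held-out accuracy when each is shuffled."""
    result = permutation_importance(
        model, X_test, y_test, scoring="accuracy", n_repeats=n_repeats, random_state=seed
    )
    order = np.argsort(result.importances_mean)[::-1]
    return [(names[i], result.importances_mean[i], result.importances_std[i]) for i in order]


# Toy data (hypothetical): only feature "f0" carries label information.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
ranking = rank_features(model, X_te, y_te, [f"f{i}" for i in range(5)])
```

On a small test set, the standard deviation across repeats can be as large as the mean for weak features. This is one reason the ranking above warrants caution.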
To gain deeper insight into how individual brain regions influence the model’s classification decisions, we employed SHAP (SHapley Additive exPlanations) analysis.
Figure 10 visualizes the contribution of each morphometric feature to individual predictions, with SHAP values indicating both the direction and magnitude of each feature’s impact. This analysis extends beyond traditional feature importance rankings by revealing how specific volumetric measurements drive classification outcomes on a case-by-case basis. High feature values (red) and low feature values (blue) can contribute differently to the model’s decisions, providing a more nuanced understanding of the relationship between brain morphometry and IBS classification than permutation importance alone. The figure reveals complex patterns in how morphometric features influence predictions. For example, high values (red) in the right caudate tend to push predictions toward IBS (positive SHAP values), while low values (blue) in this region tend to predict healthy control. This asymmetric impact of feature values suggests nonlinear relationships between brain structure volumes and IBS classification that may not be captured by simpler univariate analyses.
The SHAP analysis reveals more nuanced feature contributions than the permutation importance ranking, while also showing some notable consistencies. CC_Central ranks highest in permutation importance and shows meaningful SHAP values, but with complex patterns in which both high and low values contribute to classification. CC_Mid_Posterior likewise shows comparable importance in both analyses, with relatively consistent effects. White matter features, particularly WM-hypointensities, rank high in both analyses, suggesting robust importance, with SHAP patterns indicating that higher values tend to predict healthy controls. Among subcortical structures, the Left-Accumbens-area appears important in both analyses, with SHAP values showing that lower volumes tend to predict IBS. The Left-Amygdala shows moderate importance in both analyses, with high values generally predicting healthy controls. Notable differences emerge: the Right-Caudate shows strong SHAP value patterns but does not appear among the top permutation importance features, while the Right-Hippocampus ranks lower in permutation importance but shows distinct SHAP patterns. This comparison suggests that while some features (such as corpus callosum regions and white matter hypointensities) show consistent importance across methods, the SHAP analysis reveals more complex relationships between feature values and model predictions. This richer characterization of feature contributions might explain some of the model's classification biases, particularly given the observed asymmetric effects, where high and low values of the same feature can have different impacts on predictions. However, these feature contribution analyses must again be interpreted in the context of the model's modest classification performance (50% accuracy, 73% sensitivity, 11% specificity).
The SHAP values and permutation importance rankings identify features that drive the model’s decisions, but given the strong bias toward IBS classification, these patterns may reflect systematic misclassification rather than truly discriminative neuroanatomical markers. The complex feature interactions revealed by SHAP analysis might partially explain the model’s poor specificity, suggesting that while consistent morphometric patterns exist, they are insufficient for reliable diagnostic classification without additional clinical information.
Multimodal Classification of IBS Using Brain Structure and Cognitive Measures
To evaluate whether combining brain morphometry with cognitive performance improves diagnostic classification, we implemented machine learning models using both feature types. We systematically compared classification performance between models trained on morphometric features alone versus those incorporating both morphometric and cognitive measures.
Figure 12a presents the detailed classification outcomes, while Figure 12b shows the relative importance of the combined features in the model's decision-making. Table 5 quantifies the impact of feature combination through multiple performance metrics. Results are shown for the XGBoost model, which ranked second best after k-nearest neighbors (kNN).
The confusion matrix in Figure 12a illustrates the XGBoost model's classification performance using combined brain morphometry and cognitive features. The model shows high sensitivity but poor specificity for IBS detection. Of the 15 IBS patients, 14 were correctly identified (93.3% sensitivity); these true positives had characteristic IBS-SSS scores ( ) and were predominantly female (11F/3M). Specificity, however, was low (22.2%): only 2 of 9 healthy controls were correctly classified. The misclassifications show notable demographic and clinical features. The false positives (7 controls misclassified as IBS) were predominantly male (5M/2F) and younger ( years) than the true positives, despite normal IBS-SSS scores ( ). The single false negative had distinct characteristics: male, older ( years), and with substantial symptom severity (IBS-SSS: ). These outcomes suggest that the combined morphometric and cognitive features enable sensitive IBS detection but lack specificity. The gender-specific misclassification pattern and the age-related differences in classification accuracy point to potential demographic influences on model performance, although with only 7 false positives and 1 false negative these patterns are tentative. These findings highlight both the promise and the limitations of multimodal classification approaches in IBS diagnosis.
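The demographic breakdown of the confusion matrix amounts to cross-tabulating each prediction's outcome (TP, FP, FN, TN) against a grouping variable. The sketch below does this with `collections.Counter`, assuming IBS is coded as the positive class (1). The example records are rebuilt only from the counts reported in the text; the two true negatives are omitted because their gender is not reported, and the helper `make` is hypothetical.

```python
from collections import Counter


def outcome(y_true: int, y_pred: int) -> str:
    """Label one prediction, with IBS coded as 1 (positive) and control as 0."""
    if y_true == 1:
        return "TP" if y_pred == 1 else "FN"
    return "FP" if y_pred == 1 else "TN"


def outcomes_by_group(records: list[dict], key: str = "gender") -> Counter:
    """Count (group, outcome) pairs, e.g. ('M', 'FP')."""
    return Counter((r[key], outcome(r["y_true"], r["y_pred"])) for r in records)


def make(gender: str, y_true: int, y_pred: int, n: int) -> list[dict]:
    """Build n identical records (illustration only)."""
    return [{"gender": gender, "y_true": y_true, "y_pred": y_pred} for _ in range(n)]


# Records rebuilt from the reported counts (true negatives omitted)
records = (make("F", 1, 1, 11) + make("M", 1, 1, 3)    # true positives
           + make("F", 0, 1, 2) + make("M", 0, 1, 5)   # false positives
           + make("M", 1, 0, 1))                       # false negative
print(outcomes_by_group(records))
```

With cells this small, moving a single participant changes the apparent pattern, so such tables are best read descriptively rather than as evidence of a gender effect.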
Feature importance analysis (Figure 12b) reveals the relative contributions of brain structural and cognitive measures to IBS classification. The right hippocampus emerges as the most discriminative feature (importance ), followed by the right pallidum and left cerebellar white matter (importance , and , respectively). Notably, cognitive performance, represented by the Recall Index and Verbal skills Index, ranks among the top discriminative features, consistent with the performance gains from adding cognitive measures reported in Table 5. The ranking shows a mixed contribution of structural and cognitive features, with subcortical structures (Right-Hippocampus, Right-Pallidum, Left-Accumbens-area) showing particularly strong discriminative power. Global brain measures (CortexVol, BrainSegVol, BrainSegVolNotVent) have minimal importance, suggesting that regional rather than global alterations better distinguish IBS from healthy controls. This ranking should be interpreted in light of the model's classification metrics: despite improved sensitivity with combined features, specificity remains low. The prominence of memory-related structures and cognitive measures aligns with the observed group differences in RBANS scores and may point to a neurobiological basis for cognitive alterations in IBS.
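The comparison of regional and global measures can be made explicit by aggregating importances by feature group. The sketch below returns both the summed and the mean importance per group, since a sum alone favors groups that contain more features. The group assignments and importance values in the example are hypothetical placeholders, not the values shown in Figure 12b.

```python
def importance_by_group(
    importances: dict[str, float], groups: dict[str, str]
) -> dict[str, tuple[float, float]]:
    """Return (sum, mean) importance per group; unmapped features go to 'other'."""
    members: dict[str, list[float]] = {}
    for feature, value in importances.items():
        members.setdefault(groups.get(feature, "other"), []).append(value)
    return {g: (sum(v), sum(v) / len(v)) for g, v in members.items()}


# Hypothetical placeholder values, not the values shown in Figure 12b
importances = {"Right-Hippocampus": 0.25, "Right-Pallidum": 0.125,
               "Recall Index": 0.125, "CortexVol": 0.0625, "BrainSegVol": 0.0}
groups = {"Right-Hippocampus": "regional", "Right-Pallidum": "regional",
          "Recall Index": "cognitive", "CortexVol": "global", "BrainSegVol": "global"}
print(importance_by_group(importances, groups))
```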
Table 5 quantifies the impact of adding cognitive measures to the morphometry-based classification across several performance metrics. Combining cognitive features with brain morphometry (M ∪ C) improved model performance on multiple dimensions: sensitivity increased from 73.3% to 93.3%, accuracy from 50.0% to 66.7%, and the F1 score from to . Specificity remained modest, improving from 11.1% to 22.2%, which in the combined model corresponds to only 2 of 9 controls correctly classified. Given the small test set (15 IBS patients and 9 controls), each of these percentage changes reflects only a few participants. The Matthews Correlation Coefficient (MCC) shifted from to , indicating better overall classification performance when both feature types are combined.
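The metrics in Table 5 all follow from the four cells of a binary confusion matrix. The sketch below computes them with IBS as the positive class. The combined-model counts (14 TP, 1 FN, 7 FP, 2 TN) are taken from the confusion-matrix description of Figure 12a; the morphometry-only counts (11 TP, 4 FN, 8 FP, 1 TN) are an inference from the reported percentages, assuming both models were evaluated on the same 15 IBS patients and 9 controls. If the study used a different F1 variant (for example, macro-averaged F1), its values will differ from this binary version.

```python
import math


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Binary classification metrics with IBS as the positive class."""
    sensitivity = tp / (tp + fn)              # recall on IBS patients
    specificity = tn / (tn + fp)              # recall on healthy controls
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)          # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any confusion-matrix margin is zero; report 0.0 then
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}


combined = binary_metrics(tp=14, fn=1, fp=7, tn=2)    # M ∪ C, counts from the text
morph_only = binary_metrics(tp=11, fn=4, fp=8, tn=1)  # M only, inferred counts (assumption)
for name in combined:
    print(f"{name:12s} M: {morph_only[name]:6.3f}   M∪C: {combined[name]:6.3f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, so it penalizes a classifier that achieves high sensitivity mainly by labeling most controls as IBS.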
SHAP analysis reveals complex relationships between brain structure, cognitive performance, and IBS classification. The right hippocampus has the strongest feature impact, with higher volumes (red) generally predicting healthy control status and lower volumes (blue) predicting IBS. The Verbal skills Index is the second most influential feature, with lower scores tending to predict IBS classification. Among subcortical structures, the right caudate and putamen show notable but contrasting patterns: the right caudate exhibits a clustered distribution with clear value-dependent effects, whereas the right putamen's impact is more dispersed across participants. Left cerebellar white matter has a moderate influence, with its effect direction varying with volume. Overall, the discriminative features form a graded ranking to which both structural and cognitive measures contribute. Lower-ranked features, including global measures (TotalGrayVol, lhCortexVol) and white matter hypointensities, have minimal impact on model predictions, again suggesting that regional rather than global alterations better characterize IBS-related brain differences.
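The sign conventions behind this SHAP description can be illustrated with an exact Shapley computation for a tiny model. The sketch below enumerates all feature coalitions and, for features outside a coalition, substitutes a single baseline value. This is one simple choice of value function, not necessarily what the study's SHAP explainer did (SHAP's `TreeExplainer` uses tree-specific algorithms and background data). The toy score, its weights, and the feature interpretations are hypothetical, and the enumeration is exponential in the number of features, so it is only practical for a handful of them.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x; absent features take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Evaluate f with features in the coalition at x, the rest at baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # size of the coalition that feature i joins: 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical toy score: positive output pushes toward IBS.
# z[0] ~ standardized hippocampal volume, z[1] ~ standardized cognitive index.
def toy_score(z: Sequence[float]) -> float:
    return -2.0 * z[0] + 0.5 * z[1]


phis = shapley_values(toy_score, x=[1.0, -2.0], baseline=[0.0, 0.0])
print(phis)  # → [-2.0, -1.0]
```

For a linear score the result reduces to each weight times the feature's deviation from baseline, and the values sum to `f(x) - f(baseline)`. The negative value for the above-baseline hippocampal term corresponds to the "higher volume predicts healthy control" reading, under the assumption that positive outputs mean IBS.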