Multi-ROI Multimodal 3D Vision Transformer for Alzheimer’s Disease Classification with Attention-Based Interpretability

Juan A. Castro-Silva; María N. Moreno-García; Diego H. Peluffo-Ordóñez

doi:10.20944/preprints202605.0910.v1

Submitted: 12 May 2026

Posted: 13 May 2026


Abstract

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder for which early and accurate diagnosis remains a critical challenge. In this work, we propose a Multi-ROI Multimodal 3D Vision Transformer for AD classification that integrates structural MRI data with clinical and volumetric biomarkers within a unified attention-based framework. The proposed approach leverages anatomically guided multi-region-of-interest (ROI) decomposition to focus on disease-relevant brain structures, including the hippocampus, entorhinal cortex, fornix, and major cortical lobes. Each ROI is encoded using 3D tubelet embeddings, while clinical and volumetric features are transformed into feature-wise tokens, enabling seamless multimodal fusion through self-attention mechanisms. A hemisphere-aware selection strategy is introduced to identify the most discriminative ROI representations, enhancing both performance and interpretability. The model is evaluated on a merged multi-cohort dataset combining ADNI, AIBL, and OASIS, using a 7-fold cross-validation protocol. Experimental results demonstrate that the proposed method achieves high classification performance, reaching an accuracy of 97.62% and an AUC of 0.9940, outperforming single-modality and whole-brain baselines. Furthermore, attention-based analysis provides interpretable insights into the relative importance of clinical and neuroanatomical features, revealing consistency with established AD biomarkers. These findings highlight the effectiveness of multimodal integration and ROI-based representation for robust and explainable AD classification.
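The token construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tubelet size, embedding dimension, random weights, single attention head, and the three standardized clinical values are all assumptions chosen for illustration, and the function names are hypothetical. The sketch splits one ROI volume into non-overlapping 3D tubelets, projects each tubelet to a token, builds one token per scalar feature, and lets self-attention mix the two sets.

```python
import numpy as np

rng = np.random.default_rng(0)

def tubelet_tokens(volume, tubelet, weight):
    """Split a 3D volume into non-overlapping tubelets and project each one linearly."""
    d, h, w = volume.shape
    td, th, tw = tubelet
    assert d % td == 0 and h % th == 0 and w % tw == 0
    # Reshape to (D/td, td, H/th, th, W/tw, tw), then group the block axes first.
    blocks = volume.reshape(d // td, td, h // th, th, w // tw, tw)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5).reshape(-1, td * th * tw)
    return blocks @ weight  # shape: (n_tubelets, dim)

def feature_tokens(features, scale, bias):
    """Build one token per scalar feature: token_i = x_i * scale_i + bias_i."""
    return features[:, None] * scale + bias  # shape: (n_features, dim)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn

dim = 16                 # assumed embedding dimension
tubelet = (4, 4, 4)      # assumed tubelet size
roi = rng.standard_normal((8, 8, 8))    # random stand-in for one ROI crop
clinical = np.array([0.5, -1.2, 1.0])   # hypothetical standardized features

w_embed = rng.standard_normal((4 * 4 * 4, dim)) * 0.02
scale = rng.standard_normal((clinical.size, dim))
bias = rng.standard_normal((clinical.size, dim))
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

roi_tokens = tubelet_tokens(roi, tubelet, w_embed)   # 8 image tokens
clin_tokens = feature_tokens(clinical, scale, bias)  # 3 feature tokens
tokens = np.concatenate([roi_tokens, clin_tokens], axis=0)
fused, attn = self_attention(tokens, wq, wk, wv)

print(tokens.shape, fused.shape, attn.shape)
```

Because image and feature tokens share one sequence, each row of `attn` shows how strongly a token attends to every ROI tubelet and every clinical feature, which is the kind of weight an attention-based interpretability analysis would inspect.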

Keywords: Alzheimer’s disease; 3D vision transformer; multimodal learning; multi-ROI decomposition; magnetic resonance imaging (MRI); attention mechanisms; explainable artificial intelligence (XAI); clinical data integration; volumetric biomarkers; deep learning

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
