Introduction
Alzheimer's disease (AD) is a neurodegenerative disorder characterized by two primary neuropathological hallmarks: the aggregation of amyloid-beta (Aβ) plaques and hyperphosphorylated tau in neurofibrillary tangles. AD is known to be affected by multiple pathologies. Among various theories of the causes of AD, the amyloid-tau cascade is regarded as the most potent pathological pathway and potentially the most promising therapeutic target [
1]. The amyloid cascade hypothesis suggests that the AD pathology begins with intracellular Aβ accumulation, which subsequently triggers abnormal tau deposition and spread in the brain. As tau spreads in the brain, it triggers neurodegeneration, over-stimulated neuroinflammation with neural toxicity, structural brain atrophy, neuronal loss, and ultimately cognitive decline that leads to dementia [
2]. Unsurprisingly, pharmaceutical agents targeting amyloid and tau are regarded as the promising approaches for novel therapeutic development. The U.S. Food and Drug Administration (FDA) has currently fully approved two anti-amyloid drugs Lecanemab (Leqembi) and Donanemab (Kisunla) [
3,
4], and promising anti-tau and combination therapies are in the pipeline, with some currently in clinical trials [
5]. As drug development advances, assessing these pathological changes becomes increasingly critical for both diagnostic and therapeutic strategies in clinical practice. Although amyloid appears earlier in the brain, tau deposition is increasingly recognized as a key indicator and the main driving force of neurodegeneration and cognitive decline [
6,
7,
8]. Unlike the spatially diffuse deposition of amyloid, tau pathology follows a well-defined Braak staging pattern, and the stage of tau spread marks an individual’s progression of tau pathology, which is highly correlated with subsequent cognitive loss [
9,
10].
In the current appropriate usage criteria of both Lecanemab and Donanemab, neither drug takes the tau staging information into consideration for determining the eligibility, which is mainly based on the cognitive symptoms, amyloid burden and other risk factors such as the risk of developing vascular side effects. This is perhaps attributed to two reasons. First, Phase III data of Donanemab submitted to FDA showed that, even though the efficacy in the high-tau patients is not as impressive as in the low-tau subjects, there was still a significant slowing of cognitive decline compared to the placebo group [
11]. Second, at this moment, there is not yet a drug or treatment option for high-tau patients that is nearing FDA approval or advancing far enough in their Phase III trials. As a result, both anti-amyloid drugs currently are being considered regardless of the tau status for an individual patient. This may be a sub-optimal therapeutic path because it is often hypothesized that, once there is advanced tau pathology in the brain, removal of amyloid has limited impact on the tau deposition and spread as well as its impact on neurodegeneration. Considering that anti-amyloid therapies are commonly associated with adverse effects, such as amyloid-related imaging abnormalities (ARIA), including brain edema and microhemorrhages which occurred in over 20% of patients in clinical trials [
12], patients with more advanced tau pathology may not benefit sufficiently from such therapies if the efficacy for cognitive preservation does not outweigh the risks of vascular side effects. In addition to safety concerns, the financial burden of such therapies is substantial, with total costs of approximately $35,000 to $50,000 to complete the therapy [
13,
14,
15]. Treatment regimens are also demanding, requiring biweekly or monthly infusions for extended periods—Lecanemab is designed for long-term use, while Donanemab recommends an 18-month course. As a result, there is an increasing advocacy for a more specific evaluation of the tau pathology in patients being considered for disease-modifying therapies of AD where tailored treatment regimens may maximize the treatment benefits while avoiding unnecessary risks.
Currently, there are two categories of in vivo staging tools for tau pathology: PET-based and fluid-based biomarkers. Tau tracers have been FDA-approved for PET imaging to allow non-invasive measurement of tau burden and its spatial distribution. However, tau PET is not currently reimbursed and therefore has limited clinical accessibility. Fluid biomarkers, on the other hand, cannot directly measure or delineate the spatial distribution of tau in the brain. Emerging research has explored blood-based biomarkers as a cost-effective and minimally invasive option, but these biomarkers are still in development and cannot yet replace cerebrospinal fluid (CSF) biomarkers or tau PET imaging [
16]. To address those challenges, it is desirable to develop classification models that provide tau staging by taking advantage of information and data that are routinely acquired during the evaluation process of anti-amyloid therapies to establish a surrogate biomarker for tau pathology.
There have been previous attempts to establish surrogate biomarkers for tau staging without directly using tau PET or CSF biomarkers. Among them, models built upon MR-based features are potentially the most competitive and appealing approach. First, almost all patients being considered for AD disease-modifying therapies undergo an MRI scan prior to treatment, mainly to rule out existing major hemorrhages or microhemorrhages and major brain lesions such as trauma or tumors. If tau staging can be predicted or classified from MR-derived features, no additional exams are required, and the individual’s existing MRI data can be used directly to obtain the surrogate tau biomarkers. Second, several studies have identified significant correlations between structural MR-detected brain atrophy and tau deposition, particularly in amyloid-positive individuals [
17,
18,
19,
20]. Regions commonly associated with tau accumulation include the parietal, temporal, and medial temporal lobe (MTL), where cortical thinning and hippocampal atrophy are frequently observed. For example, Das et al. observed strong local correlations between MTL 18F-Flortaucipir uptake and longitudinal atrophy specifically in amyloid-positive individuals [
18]. Wang et al. reported a strong inverse association between hippocampal tau PET signal and MR-measured volume in amyloid-positive subjects, and found that tau uptake across AD cortical signature regions was linked to cortical thinning mirroring neurofibrillary tangle spread seen at autopsy, suggesting spatially ordered tau-related neurodegeneration [
19]. Timmers et al. showed that higher tau binding in Braak stages III/IV and V/VI was associated with reduced gray matter density in both local and more distant cortical areas [
20]. The relationship between tau deposition and brain structural changes extends beyond cortical regions, with subtle alterations also observed in hippocampal subfields and adjacent subregions. Das et al. identified significant correlations between MTL tau uptake and reduced volumes in the cornu ammonis 1 (CA1) and subiculum hippocampal subfields [
21]. Similarly, Berron et al. demonstrated that increased tau load is associated with local atrophy in the entorhinal cortex, Brodmann area 35, and the anterior hippocampus [
22]. These findings collectively suggest that tau pathology is either concurrent with atrophy in early brain regions or closely followed by structural brain changes, and the presence of amyloid pathology appears to strengthen this association. Moreover, studies have linked higher Aβ levels with more extensive and accelerated tau accumulation [
23,
24], supporting the notion that amyloid may amplify tau-driven structural degeneration in AD.
Given these associations, structural MRI metrics and amyloid burden emerge as strong candidate predictors for tau staging. While numerous studies have employed machine learning techniques to leverage multimodal data to predict amyloid status [
25,
26,
27], fewer have focused on predicting tau status. Kim et al. used a small cohort (n=64) to develop machine learning algorithms incorporating clinical data, neuropsychological tests, cortical thickness, and hippocampal volume to predict abnormal tau accumulation in Braak III/IV region in prodromal Alzheimer's disease, achieving an AUC of 0.86 [
28]. With a larger cohort (n=557), Lew et al. employed a deep learning approach directly using T1-weighted MRI, neuropsychological tests, and hippocampal volume to predict tau PET positivity in the meta temporal region, reporting an AUC of 0.73 with high sensitivity (96%) but lower specificity (31%) [
29]. The imbalanced performance may stem from the dataset’s skewed distribution (15% tau-positive) due to a high proportion of cognitively normal individuals without strict inclusion criteria for amyloid status or cognitive diagnosis. Karlsson et al. developed a machine learning model to predict tau positivity in Braak I-IV regions using MRI-derived features, achieving AUCs around 0.87-0.89 [
30]. Two potential challenges exist with the current models. First, these models were trained with patients that could be either amyloid-positive or negative. In the current amyloid-targeting therapies’ workflow, the tau status is more relevant for the patients who are amyloid-positive as the patients with both amyloid- and tau-positivity may benefit less from the amyloid-targeting therapies. A model specifically trained with amyloid-positive subjects’ data may be more useful for clinical applications. Second, considering the patients being considered for anti-amyloid therapies usually would have amyloid PET data, including the amyloid PET information for the amyloid burden in the models may further improve the model performances, as the amyloid burden is known to be associated with tau staging [
23,
31]. These findings showed opportunities to further improve classification models as current models require improved predictive power and/or evaluation over the amyloid-positive cohort. More clinically relevant, diverse cohorts and rigorous validation strategies are essential to enhance model robustness and clinical applicability.
In this study, we investigate whether surrogate biomarkers based on MR, amyloid PET, cognitive tests and patient demographic information can be developed with machine learning methods for classification of the tau status in amyloid-positive subjects. Data from ADNI and OASIS were used for model training and validation. Model performances under different data availability scenarios were tested to evaluate what prediction performances can be expected under these scenarios.
Materials and Methods
Datasets
This study utilized two publicly available datasets. The Alzheimer's Disease Neuroimaging Initiative (ADNI-3, adni.loni.usc.edu) data were used for model training and cross-validation [
32], and the Open Access Series of Imaging Studies (OASIS-3) dataset served as an external validation dataset [
33]. The inclusion criteria included: (1) All subjects must have standard T1-weighted MRI and PET imaging data, including amyloid (18F-Florbetaben, 18F-Florbetapir) and tau (18F-Flortaucipir) scans, collected within 6 months of the T1-weighted MRI. (2) All subjects must be amyloid-positive based on the amyloid PET results. Amyloid positivity was determined using a threshold of 20 Centiloid (CL) [
34]. Centiloid levels were made available by the UC Berkeley study group for ADNI. The OASIS-3 database also provided the analyzed and processed Centiloid values on its website. (3) Hippocampal-sparing subjects were excluded from this study because their tau spread patterns deviate significantly from the typical tau pathology. We applied Risacher et al.’s method to identify hippocampal-sparing subtypes in our cohort [
35]. Specifically, bilateral cortical total volume (CTV) was calculated by summing gray matter volumes from the lateral frontal, superior temporal, and lateral parietal cortices in both hemispheres; bilateral hippocampal volume (HV) was calculated similarly. Both CTV and HV were preadjusted for intracranial volume (ICV), scanner strength, age, and sex using regression-derived β coefficients from amyloid-negative, cognitively stable controls in ADNI. The residual values were then used to compute the HV:CTV ratio. Finally, hippocampal-sparing atrophy was defined as an HV:CTV ratio greater than its 75th percentile in the amyloid-negative, cognitively stable controls, with HV residual > 0 and CTV residual < 0. The cohort exclusion process is illustrated in Figure 1. (4) In addition to imaging data, all subjects must have demographic information including age, gender, years of education, and APOE4 genotype. (5) All subjects must have cognitive test results for the Clinical Dementia Rating (CDR) and Mini-Mental State Examination (MMSE) within one year of the T1-weighted MRI study. A diagnosis of cognitive state was provided by ADNI but not by OASIS, so we did not use the diagnosis as part of the inclusion criteria. Included subjects could be cognitively normal or have mild cognitive impairment or clinical dementia.
Features Used in the Machine Learning Model
Four categories of features were used in the development and validation of our machine learning models: (1) features extracted from standard T1-weighted structural MR, (2) amyloid PET Centiloid values, (3) cognitive tests and (4) subject demographic information.
To extract the structural features from T1-weighted MRI, MRIs were processed using FreeSurfer version 7.3.2 for brain parcellations using the Desikan–Killiany atlas with FreeSurfer’s ‘recon-all’ function, resulting in a feature set that included cortical volumes and thickness for 34 predefined cortical regions, 16 subcortical and 6 ventricular regions [
36]. Hippocampal subfield segmentation was performed with the standard T1-weighted MR using FreeSurfer’s ‘segmentHA_T1.sh’ function that generates 24 hippocampal subfield volumes. All volumetric features were normalized by individual total intracranial volume (ICV). PET Centiloid values were taken from ADNI and OASIS as provided on their websites. The cognitive tests included neuropsychological assessments of the CDR-Sum of Boxes (CDR-SOB) and the MMSE that both are available from ADNI and OASIS. Patient demographic information included age, gender, years of education, and APOE ε4 genotype.
Determination of the Tau Status
The reference tau staging based on tau PET was performed under the standardized uptake value ratio (SUVR) computed with the reported pipeline in ADNI [
32]. Tau SUVR in ADNI was computed and made available by the UC Berkeley group and we used the provided values [
37]. We then replicated the ADNI processing pipeline to process the OASIS-3 tau PET scans. Brain parcellation and cerebellar segmentation were performed using FreeSurfer and the SUIT (Spatially Unbiased Atlas Template of the cerebellum and brainstem) MATLAB toolbox [
38], and the resulting segmentations were applied to the tau PET images coregistered to MR. Partial volume correction was performed using the geometric transfer matrix approach provided by the PETPVC toolbox [
39]. After extracting the partial volume-corrected SUVs, the SUVRs of the FreeSurfer-segmented regions were calculated using the inferior cerebellar gray matter as the reference region.
Our study defined the high tau burden based on tau-PET SUVRs measured in Braak III/IV composite regions, calculated as volume-weighted averages across predefined regions, including the parahippocampal, fusiform, lingual, amygdala, middle temporal, insula, caudal anterior cingulate, rostral anterior cingulate, posterior cingulate, isthmus cingulate, inferior temporal, and temporal pole. The cutoff (1.516) was chosen as the mean SUVR (1.185) plus two standard deviations (0.165) from cognitively normal participants from ADNI-3 dataset as the reference group [
40,
41].
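The composite and cutoff arithmetic described above can be sketched as follows. This is a minimal illustration only: the function names, the three example regions, and their SUVR and volume values are hypothetical, and the use of a strict "greater than" comparison against the cutoff is an assumption, since the text does not state how ties are handled.

```python
def composite_suvr(region_suvr, region_volume):
    """Volume-weighted mean SUVR over the regions of a composite."""
    total_volume = sum(region_volume[r] for r in region_suvr)
    weighted = sum(region_suvr[r] * region_volume[r] for r in region_suvr)
    return weighted / total_volume


def tau_cutoff(reference_mean, reference_sd, n_sd=2.0):
    """Cutoff defined as the reference-group mean plus n_sd standard deviations."""
    return reference_mean + n_sd * reference_sd


# Hypothetical values for three Braak III/IV regions (illustration only).
suvr = {"parahippocampal": 1.30, "fusiform": 1.45, "amygdala": 1.60}
volume_mm3 = {"parahippocampal": 2000.0, "fusiform": 9000.0, "amygdala": 1500.0}

subject_suvr = composite_suvr(suvr, volume_mm3)

# Rounded reference statistics quoted in the text (mean 1.185, SD 0.165).
cutoff_from_rounded = tau_cutoff(1.185, 0.165)

# Assumption: a subject is called high-tau when the composite exceeds the
# reported cutoff of 1.516.
is_high_tau = subject_suvr > 1.516
```

With the rounded mean and SD quoted above, the formula gives about 1.515 rather than the reported 1.516; the small gap is presumably due to rounding of the published mean and standard deviation.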
Feature Selection
Feature selection was performed to identify the most informative brain regions from four sets of MRI-derived features (number of features in each set): cortical volume (34), cortical thickness (34), hippocampal subfield volume (24), and subcortical/ventricle volume (22). Only data from ADNI were included in the feature selection process. For each feature group, models were first trained to predict tau positivity using combinations of five regions within the respective group. All possible combinations of five regions within a group were tested with each of three classifiers: logistic regression, support vector machine (SVM), and random forest. Machine learning was performed with the scikit-learn package in Python [
42]. All the resultant models were then ranked based on the area under the receiver operating characteristic (ROC) curve (AUC). Within each feature group, the 30 models with the highest AUCs were selected. The frequency of each region’s occurrence in these top 30 models was then calculated, resulting in a ranked list of region importance within the group. This process was repeated for each of the four feature groups (cortical volume, cortical thickness, hippocampal subfield volume, and subcortical/ventricle volume). The final output of this feature selection process was four ranked lists of regions, which were used to select the most useful regions in the subsequent training that incorporated additional clinical and demographic variables.
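The exhaustive-subset ranking described above can be sketched with scikit-learn as follows. This is a simplified illustration, not the study's code: it uses synthetic data, a single classifier (logistic regression) instead of all three, subsets of three regions and a top-10 list to keep the run short, and it assumes the AUC is estimated by 5-fold cross-validation, which the text does not specify. The function name `rank_regions` is hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rank_regions(X, y, region_names, subset_size=5, top_k=30):
    """Score every region subset by AUC, then count how often each region
    appears among the top_k highest-scoring subsets."""
    scored = []
    for subset in combinations(range(len(region_names)), subset_size):
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        # Assumption: AUC is estimated with 5-fold cross-validation.
        auc = cross_val_score(
            model, X[:, list(subset)], y, cv=5, scoring="roc_auc"
        ).mean()
        scored.append((auc, subset))
    scored.sort(key=lambda item: item[0], reverse=True)
    counts = Counter(idx for _, subset in scored[:top_k] for idx in subset)
    return [(region_names[idx], n) for idx, n in counts.most_common()]


# Synthetic stand-in data: 120 subjects, 7 regions, of which only
# region_0 and region_1 carry signal about the label.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 60)
X = rng.normal(size=(120, 7))
X[:, 0] += 1.5 * y
X[:, 1] -= 1.5 * y
names = [f"region_{i}" for i in range(7)]

ranking = rank_regions(X, y, names, subset_size=3, top_k=10)
```

In this toy setting the informative regions should dominate the top of `ranking`. With 34 regions and subsets of five, the same loop would evaluate 278,256 subsets per classifier, so the full procedure is considerably more expensive than this sketch suggests.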
Model Training
We trained our prediction models using the ADNI dataset, employing machine learning classifiers including logistic regression, support vector machine (SVM), and random forest. To identify the most suitable feature combination for this specific task, we employed a data-driven exploratory approach. We performed model training iterations with all combinations of clinical features (age, gender, APOE4), cognitive tests (CDR-SOB, CDR-O, and MMSE), Centiloid, and MRI features (cortical volume, subcortical/ventricle volume, cortical thickness, and hippocampal subfield volume).
Model Evaluation Through Cross-Validation and External Validation
Internal cross-validation was performed with the ADNI data using leave-one-out cross-validation. Model performance metrics within ADNI, including AUC, accuracy, sensitivity, specificity, and F1-score, were calculated under the cross-validation. We examined the best-performing model (or more precisely, the best feature combination) under the following data availability scenarios: (1) all features are available, (2) all features except CDR-SOB are available, and (3) all features except the Centiloid level are available. In each scenario, the model with the best AUC was chosen. If more than one model shared the leading AUC, the model with the highest F1-score was chosen. For the chosen feature combination, a model was then trained on all ADNI subjects without the leave-one-out holdout. This model was applied to the OASIS dataset to predict tau-positive/negative status as an external validation, and the same performance metrics (AUC, accuracy, sensitivity, specificity, and F1-score) were generated. The OASIS-3 dataset was completely unseen during the feature selection and model training phases to ensure that there was no data leakage. We report the best-performing model within each data availability scenario.
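The two-stage evaluation described above (leave-one-out cross-validation on the training cohort, then a single refit applied to an unseen cohort) can be sketched with scikit-learn as follows. The data are synthetic stand-ins generated with `make_classification`, the linear-kernel SVM and the decision threshold of 0 are assumptions made for illustration, and the helper name `summarize` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def summarize(y_true, scores, threshold=0.0):
    """AUC plus threshold-based metrics from continuous decision scores."""
    y_pred = (scores > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_true, scores),
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": f1_score(y_true, y_pred),
    }


# Synthetic stand-ins for the training (internal) and external cohorts.
X, y = make_classification(
    n_samples=160, n_features=6, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, random_state=0,
)
X_train, y_train = X[:100], y[:100]
X_ext, y_ext = X[100:], y[100:]

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# Internal validation: each subject is scored by a model fitted on all others.
loo_scores = cross_val_predict(
    model, X_train, y_train, cv=LeaveOneOut(), method="decision_function"
)
internal = summarize(y_train, loo_scores)

# External validation: refit on the full training cohort, then score unseen data.
model.fit(X_train, y_train)
external = summarize(y_ext, model.decision_function(X_ext))
```

Placing the scaler inside the pipeline means it is refitted within every leave-one-out fold, so no information from the held-out subject or the external cohort leaks into training.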
Model Performance Across Centiloid Ranges
We also examined the models’ performance in the sub-cohorts with medium and high amyloid burden to evaluate whether amyloid burden affects how difficult it is for the machine learning models to classify tau status. The literature shows that a higher amyloid burden is associated with a higher likelihood of tau pathology [
23,
31]. However, whether tau prediction based on MR features is easier or more difficult in the medium versus the high amyloid group remains unclear. We used the same model training and validation approach described above to evaluate model performance separately in these two sub-cohorts. A Centiloid of 20-70 was regarded as the medium amyloid burden group, while a Centiloid greater than 70 formed the high amyloid burden group [
31,
43].
Statistical Analysis
For the comparison of demographic and clinical data, a two-sample t-test was used for continuous variables, and a chi-square test (χ2) was used for categorical variables including sex and APOE genotypes.
Results
Demographics and Clinical Characteristics of Participants
The demographic information of the study cohort is summarized in
Table 1. In ADNI, the tau-positive group had more years of education (p = 0.003) and a higher frequency of APOE4 carriers (p < 0.001) than the tau-negative group. In OASIS, the tau-positive group was older than the tau-negative group (p < 0.001), but there were no differences in education or frequency of APOE4 carriers between groups. In both datasets, the tau-positive group had significantly worse scores in all cognitive assessments (MMSE, CDR-SOB and CDR-O) and higher Centiloid values.
Feature Selection on FreeSurfer MRI Variables
We have identified the key brain regions from four MRI-derived feature groups of hippocampal subfield volume, cortical thickness, cortical volume, and subcortical/ventricle volume through the feature selection procedure. The top three regions within each group, based on their frequency in the highest-performing models, were:
o Hippocampal Subfield Volume: Fissure, body, and tail
o Cortical Thickness: Entorhinal, inferior parietal, and postcentral cortex
o Cortical Volume: Middle temporal, inferior temporal, and medial orbitofrontal cortex
o Subcortical/Ventricle Volume: Inferior lateral ventricle, hippocampus, and amygdala
T-tests comparing the tau-positive and tau-negative groups are summarized in Table 2 and showed statistically significant differences (p < 0.001) in almost all of these top-ranked regions, supporting their relevance for tau staging prediction. The exceptions without a significant difference were the left and right fissure and the right medial orbitofrontal cortex.
Model Performance in Predicting Tau Braak III/IV Status
To address different real-world clinical scenarios requiring tau staging estimation, we developed multiple predictive models tailored for distinct clinical applications. We found that the best classification performance was achieved when all data were available with MR, amyloid Centiloid, and CDR-SOB. Model 1, which includes all available variables, demonstrated the best overall performance with an AUC of 0.89 and 84.2% accuracy in the ADNI dataset and an AUC of 0.90 with 90.8% accuracy in the external validation dataset with OASIS. Model 2, which excludes CDR-SOB but keeps the Centiloid, achieved comparable AUC values (0.89 in ADNI and 0.90 in OASIS) but had slightly lower accuracy (81.8% in ADNI and 83.3% in OASIS). Model 3, which excludes Centiloid but keeps CDR-SOB, showed a further decline in sensitivity (71.3% in ADNI and 74.3% in OASIS) and a slightly lower AUC (0.88 in ADNI and 0.87 in OASIS). Overall, Model 1 exhibited the highest specificity (87.9% in ADNI, 95.3% in OASIS) and the best F1 score (74.4% in ADNI, 83.6% in OASIS), making it the most reliable choice when all input features are available. The high F1 score indicates that Model 1 achieves a good balance between sensitivity and precision for an effective detection of positive cases with a lower false positive rate. The drop in performance in Model 3 highlights the importance of Centiloid in tau staging estimation, suggesting that amyloid burden remains a crucial predictive factor.
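As a consistency check on the figures quoted above, precision, F1 and accuracy can be derived from sensitivity, specificity and the class counts alone. The sketch below uses the values reported for Model 1 in ADNI; the function name `derived_metrics` is hypothetical, and the true- and false-positive counts it computes are expected values implied by the rounded percentages, not counts taken from the study.

```python
def derived_metrics(sensitivity, specificity, n_pos, n_neg):
    """Derive precision, F1 and accuracy from sensitivity, specificity
    and the number of positive and negative subjects."""
    tp = sensitivity * n_pos   # expected true positives
    tn = specificity * n_neg   # expected true negatives
    fp = n_neg - tn            # expected false positives
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (n_pos + n_neg)
    return precision, f1, accuracy


# Model 1 in ADNI as reported: 75.7% sensitivity, 87.9% specificity,
# 115 tau-positive and 265 tau-negative subjects.
precision, f1, accuracy = derived_metrics(0.757, 0.879, n_pos=115, n_neg=265)
```

The derived values come out at roughly 73.1% precision, an F1 of 0.744 and 84.2% accuracy, in line with the values reported for Model 1 in ADNI.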
Table 3. Model Performance for Tau Staging Prediction in Internal (ADNI) and External (OASIS) Datasets.
ADNI: n=380, 115 pos / 265 neg. OASIS: n=120, 35 pos / 85 neg.
| Model | ADNI SENS | ADNI SPEC | ADNI PREC | ADNI F1 | ADNI ACC | ADNI AUC | OASIS SENS | OASIS SPEC | OASIS PREC | OASIS F1 | OASIS ACC | OASIS AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 75.7 | 87.9 | 73.1 | 0.744 | 84.2 | 0.89 | 80.0 | 95.3 | 87.5 | 0.836 | 90.8 | 0.90 |
| Model 2 | 78.3 | 83.4 | 67.2 | 0.723 | 81.8 | 0.89 | 80.0 | 84.7 | 68.3 | 0.737 | 83.3 | 0.90 |
| Model 3 | 71.3 | 83.4 | 65.1 | 0.681 | 79.7 | 0.88 | 74.3 | 87.1 | 70.3 | 0.722 | 83.3 | 0.87 |

| Model | Model type | Description | Input features |
| --- | --- | --- | --- |
| Model 1 | SVM | Our best model with highest AUC/F1 score | Age, CDR-O, Centiloid, Top4 subcortical/ventricular volume, Top3 hippocampal subfield volume, Top2 cortical thickness, Top1 cortical volume |
| Model 2 | Logistic regression | Best model without CDR-SOB | Age, sex, Centiloid, Top3 hippocampal subfield volume, Top2 cortical thickness, Top1 cortical volume |
| Model 3 | SVM | Best model without Centiloid | Age, sex, MMSE, Top3 hippocampal subfield volume, Top2 cortical thickness, Top1 cortical volume |

*SENS, sensitivity (%); SPEC, specificity (%); PREC, precision (%); ACC, accuracy (%); AUC, Area under the receiver operating characteristic curve.
Comparison of Model Performance Across Centiloid Ranges
To further assess the impact of Centiloid on tau staging prediction, we compared the performance of Model 1 (with the full set of features including Centiloid) and Model 3 (without Centiloid) between medium and high amyloid cohorts in both ADNI and OASIS cohorts. In the medium Centiloid group (20–70 Centiloid), Model 1 exhibited imbalanced performance that highly favors the specificity with reduced sensitivity in both datasets. In ADNI (23pos/166neg), it achieved 95.8% specificity but only 47.8% sensitivity, while in OASIS (8pos/61neg), specificity reached 100%, but sensitivity dropped further to 37.5%. This suggests that Model 1 tends to classify cases with lower Centiloid as tau-negative, possibly over-relying on Centiloid values in decision-making. In contrast, Model 3, which does not include Centiloid, demonstrated better sensitivity (65.2% in ADNI and 50% in OASIS), suggesting that it may depend more on MRI-derived structural features and cognitive measures, leading to a more balanced sensitivity-specificity trade-off in this group. Balanced accuracy was slightly higher for Model 3 (76.9% in ADNI and 70.1% in OASIS) compared to Model 1 (71.8% in ADNI and 68.8% in OASIS) in both datasets, reinforcing its ability to better detect tau pathology in this range.
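The balanced accuracy figures quoted above follow directly from the reported sensitivity and specificity in the medium Centiloid group, since balanced accuracy is their mean. A short check, with the hypothetical helper name `balanced_accuracy`:

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (both given in percent)."""
    return (sensitivity + specificity) / 2.0


# (sensitivity %, specificity %) in the medium Centiloid group, as reported.
medium_cl = {
    ("Model 1", "ADNI"): (47.8, 95.8),
    ("Model 1", "OASIS"): (37.5, 100.0),
    ("Model 3", "ADNI"): (65.2, 88.6),
    ("Model 3", "OASIS"): (50.0, 90.2),
}

balanced = {key: balanced_accuracy(*pair) for key, pair in medium_cl.items()}
for (model, cohort), value in balanced.items():
    print(f"{model} {cohort}: {value:.2f}%")
```

This gives 71.80% and 68.75% for Model 1 and 76.90% and 70.10% for Model 3 in ADNI and OASIS respectively, matching the rounded values in the text.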
In the high Centiloid group (> 70 Centiloid), Model 1 consistently outperformed Model 3 in both datasets. In ADNI (92pos/99neg), Model 1 achieved 82.6% sensitivity and an F1 score of 78.8%, compared to Model 3’s 72.8% for both sensitivity and F1 score. The pattern was even more pronounced in OASIS (27pos/24neg), where Model 1 reached 92.6% sensitivity and 83.3% specificity, compared to 81.5% and 79.2% for Model 3. This indicates that when Centiloid levels are high, Model 1 takes advantage of the Centiloid values to strengthen its predictive power, allowing it to better identify tau-positive cases with high sensitivity while maintaining competitive specificity and overall classification performance.
Overall, these findings suggest a trade-off between the two models depending on Centiloid levels. Model 3's greater dependence on MRI variables allows it to better capture nuanced structural changes associated with tau pathology, leading to higher sensitivity without over-relying on Centiloid in the medium CL range. In contrast, Model 1 outperforms Model 3 in the high Centiloid range, achieving the highest sensitivity and F1 scores. This suggests that Centiloid is not only a binary marker for amyloid positivity but also carries additional, valuable information for predicting tau pathology through its quantitative measure of amyloid burden.
Table 4. Comparison of Model 1 and Model 3 Performance Across Centiloid Ranges in ADNI and OASIS Cohorts.
| CL group | Model | ADNI SENS | ADNI SPEC | ADNI F1 | ADNI ACC | ADNI AUC | OASIS SENS | OASIS SPEC | OASIS F1 | OASIS ACC | OASIS AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Medium CL | Model 1 | 47.8 | 95.8 | 0.536 | 90 | 0.82 | 37.5 | 100 | 0.546 | 92.8 | 0.71 |
| Medium CL | Model 3 | 65.2 | 88.6 | 0.526 | 85.7 | 0.88 | 50 | 90.2 | 0.444 | 85.5 | 0.73 |
| High CL | Model 1 | 82.6 | 74.8 | 0.788 | 78.5 | 0.85 | 92.6 | 83.3 | 0.893 | 88.2 | 0.90 |
| High CL | Model 3 | 72.8 | 74.8 | 0.728 | 73.8 | 0.84 | 81.5 | 79.2 | 0.815 | 80.4 | 0.88 |

*CL, Centiloid; SENS, sensitivity (%); SPEC, specificity (%); PREC, precision (%); ACC, accuracy (%); AUC, Area under the receiver operating characteristic curve.
Discussion
The emergence of amyloid-targeting therapies, such as Lecanemab and Donanemab, has reshaped the therapeutic landscape for AD. However, their efficacy may vary due to the disease progression, especially the advancement of tau pathology. An accurate stratification of patients for their tau status may be helpful and desirable in clinical pre-treatment evaluation of AD patients. Notably, results from the TRAILBLAZER-ALZ 2 clinical trial demonstrated that Donanemab provided greater clinical benefit when administered to patients with lower tau burden, as determined by tau-PET imaging [
11]. Given the high cost and limited availability of tau-PET, MRI-based biomarkers could serve as a cost-effective stratification tool to identify patients most likely to benefit from anti-amyloid therapy. In addition, there are several promising candidates of anti-tau therapies in phase III clinical trials such as E2814 and LMTM [
44,
45]. Combination therapies that target both amyloid and tau are also being explored to address the multifaceted nature of AD. For example, the Alzheimer’s Tau Platform (ATP) is evaluating the combined effects of amyloid- and tau-targeting interventions, with preliminary findings suggesting that a multi-modal approach may enhance disease modification compared to monotherapies [
46]. Although tau-PET remains the gold standard for identifying patients most suitable for these therapies, its limited accessibility and high costs pose a significant barrier for being widely used as a routine exam. A surrogate tau marker could serve as a valuable pre-screening tool, facilitate better clinical trial design and direct these therapies towards the right patient populations, ultimately paving the way for a more personalized treatment paradigm for AD.
Our study aimed to investigate the effectiveness of machine learning models based on amyloid-PET and MRI-derived imaging features, both routinely available during the pre-treatment evaluation of anti-amyloid therapies, to predict tau-PET staging status in amyloid-positive patients as a surrogate marker for tau pathology without requiring tau-PET imaging or lumbar puncture for CSF tau tests. This focus aligns with the current anti-amyloid pre-treatment evaluation procedures and is specifically designed to target the amyloid-positive patients who are potentially eligible for anti-amyloid therapies. We leveraged two independent datasets, ADNI (n=380) and OASIS (n=120), which provided a sufficiently large sample size for model training and validation. Our best-performing model, an SVM classifier utilizing features of age, CDR-SOB, Centiloid, and selected subcortical and cortical measurements, achieved strong performance with 75.7% sensitivity, 87.9% specificity, and an AUC of 0.89 in ADNI cross-validation, and 80% sensitivity, 95.3% specificity, and an AUC of 0.90 in external validation with OASIS, as detailed in Table 3. If CDR-SOB is unavailable, our best-performing model still reached the same AUC, with accuracy lower than that of Model 1 by 2.4 percentage points in ADNI and 7.5 percentage points in OASIS. However, if amyloid Centiloid is unavailable, a larger performance drop was observed, with the AUC decreasing by 0.01 to 0.03 and accuracy decreasing by 4.5 percentage points in ADNI and 7.5 percentage points in OASIS. Overall, our results suggest that it is feasible to achieve satisfactory classification performance in predicting tau status with structural MRI, amyloid PET and, when available, the CDR cognitive test. No additional information or scans are required by our models. It is also noteworthy that our method requires only standard T1-weighted structural images, without needing high-resolution T2 images for hippocampal subfield segmentation. Although the current models may not yet be clinically ready to stage tau spread, such models may serve as screening tools for personalized treatment planning. For example, patients classified as having a high likelihood of high tau may be considered for tau PET scans to verify the tau pathology, while patients predicted to most likely have low or moderate tau pathology may proceed with the anti-amyloid treatment. The proposed model and feature sets may also serve as a good backbone for future model development for similar classification tasks.
Compared to prior studies by Lew et al. [
29] and Karlsson et al. [
30], our work differs mainly in study design: we specifically targeted only amyloid-positive subjects. The key rationale is to mimic the scenarios encountered in pre-treatment evaluations for anti-amyloid therapies. Only 15% of Lew et al.’s cohort were tau-positive, limiting the model’s ability to generalize to the more clinically relevant context of amyloid-positive dementia patients, in whom tau positivity rates can be higher. Kim et al. targeted amyloid-positive patients but relied on a small sample (n=64) [
28]. In contrast, our model specifically targets amyloid-positive patients with a 30% tau-positive rate, resembling the patient distribution in the Donanemab TRAILBLAZER-ALZ 2 Phase 3 study (where 30.3% were tau-positive). Methodologically, while Lew et al. employed resource-intensive deep learning approaches, we used lightweight machine learning with feature selection to improve model efficiency and reduce the risk of overfitting, potentially making our model easier to apply clinically. Another advantage is that we generated the MR features from T1-weighted images with FreeSurfer, one of the most widely used open-source research tools for processing brain MRI, which should make future deployment of the model at other institutions easier and more efficient. In terms of performance, our model outperforms Kim et al.’s (AUC 0.86) and Lew et al.’s (AUC 0.73, specificity 31%), with higher AUCs and a more balanced sensitivity-specificity profile across datasets. A recent study by Karlsson et al. demonstrated that MR-based tau classification is feasible with a large sample size (n>1300) [
30]. Model performance was similar between their work and ours, with the best-performing model reaching an AUC of 0.89. However, like Lew et al.’s work, their study did not specifically target an amyloid-positive cohort. In terms of the features used for machine learning, Karlsson et al. used FreeSurfer-derived volumes, surface areas, and thicknesses of cortical regions, volumes of subcortical brain regions, and total white matter hyperintensities, whereas we additionally included amyloid burden and hippocampal subfields. Expanding our datasets with additional data sources would allow more comprehensive training and testing of the proposed models in the future.
Our predictive modeling analysis revealed important trade-offs between models with and without amyloid burden quantified by the Centiloid scale. In the complete cohort, Model 1 (with Centiloid) outperformed the other models in overall accuracy, specificity, and F1 score, making it the most reliable choice when all input features are available. However, in the medium Centiloid sub-cohort (20–70 Centiloid), Model 1 exhibited lower sensitivity, likely due to an over-reliance on Centiloid values, which may cause tau-positive cases to be misclassified as negative. Model 3 (without Centiloid) demonstrated better sensitivity and balanced accuracy in this sub-cohort, suggesting that relying more on MRI-derived structural features and cognitive measures better detects high-tau patients in the lower Centiloid range. For the high Centiloid sub-cohort (≥70 Centiloid), by contrast, Model 1 performed best, achieving the highest overall accuracy across datasets. This suggests that the Centiloid level is particularly valuable for distinguishing tau positivity when amyloid burden is high. Inclusion of Centiloid significantly impacted model performance, indicating that an adaptive approach that adjusts Centiloid weighting based on tau burden may further improve classification accuracy. It also suggests that high-amyloid and medium-amyloid subjects may follow different trajectories in developing tau pathology. Future work should explore integrating dynamic weighting mechanisms for Centiloid and MRI-derived features to optimize sensitivity and specificity across different tau burdens.
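One very simple way to illustrate the adaptive idea discussed above is a hard switch that routes each patient to a model according to their Centiloid range. The Python sketch below is only an assumption-laden illustration, not the method used in this study: the function name, the stand-in models, and the decision to reject values below 20 are all hypothetical, and a hard switch is a coarser substitute for the dynamic weighting proposed as future work.

```python
def route_by_centiloid(centiloid, features, model_with_cl, model_without_cl,
                       medium_range=(20.0, 70.0)):
    """Pick a model by Centiloid sub-cohort and return its prediction.

    medium_range follows the 20-70 Centiloid sub-cohort described in the text;
    values at or above the upper bound are treated as the high sub-cohort.
    """
    low, high = medium_range
    if centiloid < low:
        # Assumption: values below the medium range are outside the
        # sub-cohorts discussed, so this sketch refuses to predict.
        raise ValueError("Centiloid below the range covered by this sketch")
    if centiloid < high:
        # Medium sub-cohort: the model without Centiloid was more sensitive.
        return model_without_cl(features)
    # High sub-cohort: the model with Centiloid was more accurate.
    return model_with_cl(features)


# Stand-in callables that only report which model was chosen (hypothetical).
def toy_model_with_cl(features):
    return "model_1"


def toy_model_without_cl(features):
    return "model_3"


choice_medium = route_by_centiloid(45.0, {}, toy_model_with_cl, toy_model_without_cl)
choice_high = route_by_centiloid(85.0, {}, toy_model_with_cl, toy_model_without_cl)
```

In practice the two callables would wrap trained classifiers, and any such routing rule would need to be validated on held-out data before its thresholds could be trusted.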
This study has two limitations. First, there is no standardized definition or consensus for determining tau status with tau-PET. We defined tau burden levels using tau-PET SUVR in Braak III/IV regions, because Braak I/II is primarily age-related and not specific to AD, while Braak V/VI may represent a stage too advanced for effective anti-amyloid intervention [
47,
48]. There is likewise no established consensus on tau-PET SUVR cutoffs across Braak stages. We applied a cutoff set at the mean SUVR plus two standard deviations, a commonly used criterion in the literature [
40,
41], using cognitively normal participants from the ADNI-3 dataset as the reference group. The observed tau positivity rate (~30%) in our study is similar to findings from previous Donanemab trials. The second limitation is the sample size. Although we achieved a moderate sample size of 500 cases, with a training/validation ratio of about 2:1, a larger sample would benefit model generalizability and reliability. Future expansion of the datasets would benefit the proposed model, especially if cohorts with greater ethnic and racial diversity can be included.
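The mean-plus-two-standard-deviations cutoff described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the text does not say whether the sample or population standard deviation was used (the sketch assumes the sample version), the function names are hypothetical, and the SUVR values are invented for demonstration rather than drawn from ADNI-3.

```python
from statistics import mean, stdev


def tau_cutoff(reference_suvr, n_sd=2.0):
    """Return mean + n_sd * sample SD of the reference group's SUVR values."""
    if len(reference_suvr) < 2:
        raise ValueError("need at least two reference values")
    return mean(reference_suvr) + n_sd * stdev(reference_suvr)


def is_tau_positive(suvr, cutoff):
    """Label a scan tau-positive when its SUVR exceeds the cutoff."""
    return suvr > cutoff


# Invented Braak III/IV SUVR values for a cognitively normal reference group.
reference = [1.10, 1.15, 1.20, 1.25, 1.30]
cutoff = tau_cutoff(reference)
```

Because the cutoff depends on the reference group's spread, a different reference cohort or tracer would yield a different threshold, which is one reason positivity rates are hard to compare across studies.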