Functional-vs-Cognitive Decline Heterogeneity in Alzheimer’s Disease: LLM-Curation of Decades of Clinical Notes and GLP1RA Trial Design

Karthik Murugadoss; A.J. Venkatakrishnan; Gowtham Varma; Sunil Kumar Ravi; Gourab Saha; Venky Soundararajan

doi:10.20944/preprints202606.0711.v1

Submitted: 08 June 2026

Posted: 09 June 2026


Abstract

The EVOKE and EVOKE+ trials showed no population-level slowing of early symptomatic Alzheimer’s disease (AD) with semaglutide, motivating an assessment of whether clinical-note phenotyping over decades of routine care could identify distinct AD trajectories. Here we analyzed de-identified EHRs from a 29 million-patient federated network using LLMs to extract standardized cognitive, functional, global severity, and neuropsychiatric note-derived scores with high physician-adjudicated accuracy. Among 131,824 AD patients, 42,242 had ≥1 extracted assessment and 341 initiated semaglutide after diagnosis. Composite scores declined from 0.99 at 25.1 years before diagnosis to 0.64 at 4.9 years after diagnosis. Compared with matched non-AD controls, AD patients had lower ADCS-MCI-ADL scores at year 4 (28.1 versus 47.0 points; P<0.001) and higher FAQ impairment across follow-up. In paired score-change analyses, semaglutide-treated patients had more favorable MMSE change than matched controls (+0.4 versus −2.3 points; P=0.022) and more favorable MoCA change (−0.3 versus −2.2 points; P=0.049). Among 2,169 patients with both decline domains, cognitive decline preceded functional decline (“cognitive decline-first”) in 1,123 patients (51.8%), whereas functional decline preceded cognitive decline (“functional decline-first”) in 741 patients (34.2%). Cognitive decline-first trajectories were more prevalent, with 631 patients (29.1%) showing cognitive decline at least 12 months before functional decline, compared with 327 patients (15.1%) showing functional decline at least 12 months before cognitive decline. Cognitive decline-first patients also had higher medication burden and greater note-derived symptom burden. Functional decline-first patients were enriched for APOE ε3/ε3 (OR 2.82; P=0.023), motivating the need for trajectory-aware and genetically stratified approaches to Alzheimer’s disease trial design.

Keywords: semaglutide; Alzheimer's disease

Subject: Medicine and Pharmacology - Endocrinology and Metabolism

Introduction

Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder in which memory, cognition, behavior and functional independence decline over time [1]. Although amyloid- and tau-directed biology have dominated therapeutic development, AD does not unfold in isolation from systemic physiology [2]. Diabetes, obesity, insulin resistance, vascular disease, neuroinflammation and frailty are increasingly recognized as clinically relevant contexts in which neurodegenerative vulnerability and resilience may differ across patients [3,4,5]. This creates an opportunity to identify heterogeneity among AD patient trajectories, including patients who may demonstrate slowed progression, stabilization or apparent improvement following their diagnosis.

In real-world clinical practice and clinical trials, the progression of AD is typically monitored using standardized instruments such as the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), and Clinical Dementia Rating (CDR) [6,7]. Prior prospective observational studies in AD patients have demonstrated that cognitive and functional decline are often correlated and that cognitive decline tends to predict subsequent functional decline. However, large-scale analyses of the temporal patterns of cognitive versus functional decline at the individual patient level are lacking. Historically, these patterns have been difficult to ascertain retrospectively from electronic health records (EHRs), at least in part because standardized instrument scores are documented in unstructured clinical notes rather than in easily accessible structured data tables. Large language models (LLMs) and related natural language processing approaches address this barrier by providing scalable means by which these metrics and other AD-relevant content can be extracted from clinical notes [8,9].

In addition to their well-established effects on glycemic control and weight management, GLP-1 receptor agonists have emerged as an intriguing therapeutic class for brain health. Indeed, multiple real-world studies in patients with type 2 diabetes mellitus (T2DM) have demonstrated associations between GLP-1 receptor agonist exposure and lower risk of dementia or AD diagnosis [10,11]. First, in a large target-trial emulation using nationwide U.S. electronic health records, semaglutide was associated with a substantially lower risk of first-time AD diagnosis compared with multiple other antidiabetic medications [10]. Second, another target-trial emulation reported lower risk of AD and other dementias among T2DM patients taking GLP-1 receptor agonists or SGLT2 inhibitors compared with those taking other glucose-lowering drugs [11]. The mechanistic basis for potential roles of GLP-1 receptor signaling in modulating neurocognitive function has not been delineated, but its connections to diverse pathways including insulin sensitivity, vascular function, inflammation, mitochondrial homeostasis, cellular stress responses and neuroimmune modulation provide plausible hypotheses [12,13].

The recent EVOKE and EVOKE+ phase 3 trials provide an essential counterpoint. In participants with early symptomatic, amyloid-confirmed AD, oral semaglutide 14 mg did not significantly slow clinical progression compared with placebo [14]. These findings underscore the need to better understand how semaglutide is used and what outcomes are observed in routine clinical practice, where patient populations, treatment patterns, and disease trajectories differ substantially from those represented in randomized trials. In this context, it is important to understand whether a subgroup of semaglutide-exposed AD patients displays clinical stabilization or apparent improvement that can be reproducibly identified and then interrogated with deeper causal and biological analyses.

Here we used de-identified health record data from a federated network of more than 29 million patients to characterize the natural history of AD and to investigate individuals who initiated semaglutide after AD diagnosis. We first compare the trajectories of standardized assessment scores over multiple decades in patients with AD versus a matched non-AD control population. We then compare repeated measures of standardized assessments in propensity matched AD cohorts with versus without semaglutide exposure. Our results highlight heterogeneity in the order of onset for cognitive and functional decline in AD and support a possible role for GLP-1 receptor agonism in supporting neurocognitive function.

Results

Study Cohort Demographics and Extracted Clinical Outcome Assessments

From a federated electronic health record network of approximately 29 million patients, 131,824 patients with a structured AD diagnosis were identified, of whom 42,242 had at least one documented clinical disease assessment (Figure 1A, Table 1). Clinical assessments were extracted using LLMs from unstructured clinical notes (see Methods; Table 2). Cognitive assessments included the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA); functional assessments included the Functional Activities Questionnaire (FAQ), Alzheimer’s Disease Cooperative Study Activities of Daily Living (ADCS-ADL), and ADCS Mild Cognitive Impairment Activities of Daily Living (ADCS-MCI-ADL); global disease severity assessments included Clinical Dementia Rating Global Score (CDR Global) and Clinical Dementia Rating Sum of Boxes (CDR-SB); and neuropsychiatric assessments included Neuropsychiatric Inventory Total (NPI Total), Neuropsychiatric Inventory Questionnaire Severity (NPI-Q Severity), and Neuropsychiatric Inventory Questionnaire Distress (NPI-Q Distress). Physician adjudication of LLM-extracted assessment scores showed high reliability in the main reportable instrument set, with 140 of 154 extractions correct (90.9% accuracy; Table 3, Table 4, Table 5, Table 6 and Table 7; see Supplementary Methods).

The AD cohort had a mean event age of 78.3 years (SD 7.4) (Table 1). Among patients with recorded sex, the cohort was predominantly female (23,808 [56.4%]). The cohort was largely White (36,002 [85.2%]), with Black or African American patients accounting for 4,165 (9.9%); ethnicity was recorded as not Hispanic or Latino for 35,040 patients (83.0%), Hispanic or Latino for 763 patients (1.8%), and unknown for 6,439 patients (15.2%). Comorbidities recorded within the five years prior to the index AD diagnosis were assessed. Essential hypertension was the most prevalent, present in 20,782 patients (49.2%), followed by amnesia (16,801 [39.8%]) and hyperlipidemia (16,088 [38.1%]). Metabolic and cardiovascular conditions were common, with type 2 diabetes mellitus affecting 7,978 patients (18.9%), mixed hyperlipidemia affecting 7,623 patients (18.0%), and coronary artery disease affecting 6,430 patients (15.2%).

Biomarker characterization was available for subsets of the AD cohort (Table 1). Among the 42,242 AD patients with score-bearing documentation, 12,635 (29.9%) had clinical documentation referencing amyloid testing or amyloid status, of whom 4,461 (35.3%) were confirmed amyloid positive. APOE genotype data were available for 2,477 patients. The most common genotypes were APOE ε3/ε4 (951 patients, 38.4%), APOE ε3/ε3 (570 patients, 23.0%), and APOE ε4/ε4 (419 patients, 16.9%). Overall, 1,470 of 2,477 genotyped patients (59.3%) carried at least one APOE ε4 allele, consistent with the known enrichment of APOE ε4 among patients with AD. These biomarker data provided additional molecular characterization of the study cohort.

Longitudinal AD Trajectories Reveal Multidomain Decline That Is Incompletely Represented in Clinical-Trial Endpoints

To characterize the natural history of AD as captured in routine clinical documentation, we analyzed longitudinal cognitive, functional, and neuropsychiatric assessments from 42,242 individuals with AD (Figure 1B, Figure S1). Scores for each instrument were normalized to a common 0-1 severity-oriented scale, where 0 indicated greatest impairment and 1 indicated least impairment, enabling comparison across instruments with different native scoring ranges (see Methods). Across all available measures, the aggregate trajectory demonstrated progressive decline beginning more than two decades before the first recorded AD diagnosis and continuing throughout follow-up. This is consistent with prior longitudinal studies showing that cognitive impairment can be detected up to 18 years before clinical AD dementia diagnosis [15]. The composite normalized score decreased from 0.99 at 25.1 years before diagnosis to 0.64 at 4.9 years after diagnosis, with the steepest diagnosis-proximal 1-year decline observed from 12–9 months before diagnosis to 0–3 months after diagnosis (0.720 to 0.677; change, −0.043 normalized units per year). These findings indicate a prolonged EHR-visible prediagnostic period followed by accelerated deterioration around clinical recognition of disease.
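The 0-1 severity-oriented scale described above can be illustrated with a short sketch. It assumes a simple min-max rescaling with a direction flip for instruments on which higher raw scores indicate worse status; the actual normalization is specified in Methods and may differ. The native ranges listed are the standard ranges for these instruments, and the names `INSTRUMENTS` and `normalize` are hypothetical.

```python
# Minimal sketch of a 0-1 severity-oriented normalization
# (0 = greatest impairment, 1 = least impairment).
# Assumption: plain min-max rescaling; the paper's Methods may use another mapping.

# name: (native minimum, native maximum, higher_is_worse)
INSTRUMENTS = {
    "MMSE": (0, 30, False),
    "MoCA": (0, 30, False),
    "FAQ": (0, 30, True),
    "CDR-SB": (0, 18, True),
}


def normalize(instrument: str, raw: float) -> float:
    """Map a raw score onto [0, 1], where 1 means least impairment."""
    lo, hi, higher_is_worse = INSTRUMENTS[instrument]
    if not lo <= raw <= hi:
        raise ValueError(f"{instrument} score {raw} outside {lo}-{hi}")
    fraction = (raw - lo) / (hi - lo)
    # Flip instruments where a higher raw score means more impairment.
    return 1.0 - fraction if higher_is_worse else fraction


if __name__ == "__main__":
    for name, raw in [("MMSE", 24), ("FAQ", 9), ("CDR-SB", 4.5)]:
        print(name, raw, round(normalize(name, raw), 3))
```

Orienting every instrument in the same direction is what allows scores such as FAQ (higher is worse) and MMSE (lower is worse) to be averaged into a single composite trajectory.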

A particularly notable pattern in this longitudinal comparison of assessment trajectories was the apparent temporal separation between functional or activity-linked decline and neurocognitive decline. The FAQ trajectory, which captures impairment in instrumental activities of daily living, was measurable 7.9 years before diagnosis at 0.795, remained similar at 5.1 years before diagnosis at 0.808, and then declined more sharply to 0.625 in the 3 months before diagnosis, 0.479 at 9-12 months after diagnosis, and 0.274 at 4.9 years after diagnosis. By contrast, MMSE declined from 0.894 at 10.1 years before diagnosis to 0.850 at 5.1 years before diagnosis, then to 0.801 at 2.1 years before diagnosis and 0.722 near diagnosis; MoCA showed a similar diagnosis-proximal decline, from 0.676 at 5.1 years before diagnosis to 0.612 at 2.1 years before diagnosis and 0.557 near diagnosis. The aggregate normalized trajectory declined from 0.868 at 10.1 years before diagnosis to 0.808 at 5.1 years before diagnosis, 0.749 at 2.1 years before diagnosis, 0.686 near diagnosis, and 0.639 at 4.9 years after diagnosis. Although these estimates are descriptive cohort-level trajectories, they suggest that activity-associated deterioration is detectable years before diagnosis, with the steepest aggregate deterioration concentrated from approximately 2 years before diagnosis through the first 2 years after diagnosis.

Functional assessments showed the greatest deterioration: FAQ declined from 0.625 near diagnosis to 0.274 at 4.9 years and 0.236 at 6.6 years after diagnosis, ADCS-ADL declined from 0.492 near diagnosis to 0.339 at 4.9 years and 0.278 at 7.1 years, and ADCS-MCI-ADL declined from 0.672 near diagnosis to 0.501 at 4.9 years and 0.362 at 7.6 years. Measures of global disease severity also worsened, with CDR Global declining from 0.739 near diagnosis to 0.494 at 4.9 years and CDR-SB declining from 0.795 near diagnosis to 0.532-0.537 at 3.9-4.9 years after diagnosis. Cognitive measures declined more gradually after diagnosis: MMSE decreased from 0.722 near diagnosis to a post-diagnostic minimum of 0.663 at 2.9 years, whereas MoCA was lower near diagnosis at 0.557 and reached a post-diagnostic minimum of 0.520 at 5.6 years. By contrast, neuropsychiatric outcomes remained comparatively stable, with NPI Total remaining above 0.929 through 8 years after diagnosis, NPI-Q Severity ranging from 0.789 to 0.827, and NPI-Q Distress ranging from 0.855 to 0.891 over the same post-diagnostic interval. Together, these trajectories show that AD progression in routine care is not captured by cognition alone: functional measures showed the greatest post-diagnostic deterioration, cognitive measures showed more diagnosis-proximal decline, and neuropsychiatric measures followed more stable or heterogeneous trajectories.

We also evaluated longitudinal outcome trajectories among 341 patients with AD who initiated semaglutide after diagnosis (Figure 1C, Figure S2). The median time between AD diagnosis and semaglutide initiation was 1.8 years (IQR 0.8, 3.8). Before treatment initiation, the composite normalized score declined over eight years, consistent with progressive disease. Following semaglutide initiation, the composite trajectory remained comparatively stable, fluctuating between approximately 0.68 and 0.81 from initiation through follow-up (median 2 years, IQR 1-5 years). Although modest decline was observed during the first two years after initiation, trajectories subsequently plateaued without evidence of sustained acceleration. Cognitive outcomes appeared relatively preserved, with post-initiation MMSE values ranging from 0.77 to 0.93 and MoCA values ranging from 0.59 to 0.70 through 8 years after treatment initiation. MMSE values remained largely within the range of 0.80-0.92 throughout follow-up, while MoCA trajectories stabilized after earlier decline and remained predominantly between 0.60 and 0.70. Functional measures demonstrated greater variability, with FAQ scores worsening during the first 1–2 years after initiation before partial recovery and subsequent stabilization. Neuropsychiatric outcomes (NPI Total, NPI-Q Severity and Distress) remained high without sustained post-initiation deterioration in the available trajectory data: NPI Total ranged from 0.94 to 0.99, NPI-Q Severity from 0.82 to 0.84, and NPI-Q Distress from 0.83 to 0.88 following semaglutide initiation. Overall, post-initiation trajectories were characterized by relative stability across multiple outcome domains, particularly cognitive and neuropsychiatric measures.

We next asked whether the multidomain decline captured in routine-care trajectories was reflected in AD clinical trial outcome selection (Figure 1D). Among 864 AD trials with matched outcome instruments, cognitive measures were most frequently represented, with MMSE used in 469 trials and MoCA in 212 trials. Global staging, functional, and neuropsychiatric instruments were less consistently represented, including CDR-SB in 225 trials, ADCS-ADL in 181 trials, and NPI Total in 261 trials. These findings suggest that AD trials remain weighted toward cognitive endpoints, despite routine-care trajectories showing substantial decline across multiple non-cognitive domains.

Functional and Cognitive Decline Timing Differs Across Patients

Decline was defined from repeated raw score measurements as the first confirmed worsening episode within a domain: for each instrument, serial scores were ordered over time, worsening was assigned according to the instrument direction (Table 2), and the confirmation date was the first later score date at which a worse value was actually observed. Functional decline was measured using the Functional Activities Questionnaire (FAQ; higher is worse), Alzheimer’s Disease Cooperative Study-Activities of Daily Living (ADCS-ADL; lower is worse), and ADCS Mild Cognitive Impairment Activities of Daily Living (ADCS-MCI-ADL; lower is worse). Cognitive decline was measured using the Mini-Mental State Examination (MMSE; lower is worse) and Montreal Cognitive Assessment (MoCA; lower is worse). Among 3,467 patients with 2+ repeated functional and/or cognitive score measurements, 2,883 (83.2%) had confirmed functional decline and 2,535 (73.1%) had confirmed cognitive decline, with both functional and cognitive decline observed in 2,169 (62.6%) patients. Within this group, cognitive decline preceded functional decline in 1,123 patients (51.8%), functional decline preceded cognitive decline in 741 patients (34.2%), and both declines were confirmed on the same date in 305 patients (14.1%). The median gap, defined as the functional decline confirmation date minus the cognitive decline confirmation date, was 28 days, indicating a slight overall shift toward earlier cognitive decline.
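One way to read the decline definition above is sketched below. The sketch assumes that "worsening" means a score worse than the immediately preceding measurement for the same instrument; the exact comparison rule (for example, against a baseline or best prior score) is not stated here, so this is an illustrative interpretation only. The function name `first_confirmed_worsening` is hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def first_confirmed_worsening(
    scores: Sequence[Tuple[date, float]], higher_is_worse: bool
) -> Optional[date]:
    """Return the first date on which a worse score than the previous one is observed.

    Assumption: each score is compared with the immediately preceding measurement.
    Returns None when no worsening is ever observed.
    """
    ordered = sorted(scores, key=lambda pair: pair[0])  # order serial scores by date
    for (_, previous), (when, current) in zip(ordered, ordered[1:]):
        worse = current > previous if higher_is_worse else current < previous
        if worse:
            return when
    return None


if __name__ == "__main__":
    # Illustrative values only, not study data.
    mmse = [(date(2020, 1, 10), 27), (date(2021, 2, 1), 27), (date(2022, 3, 5), 24)]
    faq = [(date(2020, 6, 1), 4), (date(2021, 6, 1), 9)]
    print(first_confirmed_worsening(mmse, higher_is_worse=False))
    print(first_confirmed_worsening(faq, higher_is_worse=True))
```

Under this reading, a domain-level confirmation date would be the earliest such date across the instruments in that domain (MMSE and MoCA for cognition; FAQ, ADCS-ADL, and ADCS-MCI-ADL for function).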

However, the distribution was heterogeneous: 327 patients (15.1%) had functional decline at least 12 months before cognitive decline (“early functional decline”, Figure 1E), 1,211 patients (55.8%) had both declines documented within 12 months of each other (Figure 1F), and 631 patients (29.1%) had cognitive decline at least 12 months before functional decline (“early cognitive decline”, Figure 1G). In the group with functional decline documented at least 12 months before cognitive decline, the median gap was -1.66 years, whereas the median gap was 2.05 years in the group with cognitive decline documented at least 12 months before functional decline. The two groups were demographically similar: mean age was 74.5 versus 74.2 years in the early functional decline and early cognitive decline groups, respectively; the female proportion was 58.0% versus 58.8%, and race distributions were broadly comparable, with White patients comprising 72.8% versus 78.9%. This indicates that both temporal patterns occur in routine-care data, but cognitive decline preceded functional decline by at least 12 months more often than the reverse.
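The three-way grouping above follows directly from the gap definition (functional decline confirmation date minus cognitive decline confirmation date). The sketch below illustrates it, assuming that 12 months is approximated as 365 days; the label strings and the function name `classify_decline_order` are hypothetical. The group counts are taken from the text and used only to recompute the reported percentages.

```python
from datetime import date

EARLY_GAP_DAYS = 365  # assumption: "12 months" approximated as 365 days


def classify_decline_order(functional_date: date, cognitive_date: date) -> str:
    """Classify a patient by gap = functional date minus cognitive date."""
    gap_days = (functional_date - cognitive_date).days
    if gap_days >= EARLY_GAP_DAYS:
        return "early cognitive decline"   # cognition declined first
    if gap_days <= -EARLY_GAP_DAYS:
        return "early functional decline"  # function declined first
    return "within 12 months"


# Group sizes as reported in the text.
REPORTED_COUNTS = {
    "early functional decline": 327,
    "within 12 months": 1211,
    "early cognitive decline": 631,
}


def percentages(counts: dict) -> dict:
    """Share of each group, in percent rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    print(classify_decline_order(date(2022, 6, 1), date(2020, 1, 1)))
    print(percentages(REPORTED_COUNTS))
```

Because the gap is signed, a positive median gap (2.05 years) corresponds to the early cognitive decline group and a negative one (-1.66 years) to the early functional decline group.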

Medication burden was higher when cognitive decline preceded functional decline, including in the two years before structured AD diagnosis (median 10 versus 8 prescriptions; P = 0.0029) and during the interval between functional and cognitive decline (median 12 versus 9 prescriptions; P < 0.001) (Figure 2A; See Supplementary Methods). During this interval, the early cognitive decline group also showed higher exposure to several medication classes, including opioids, corticosteroid/immunomodulating therapies, osteoporosis medications, antimicrobials, anticoagulant/antiplatelet agents, GI medications, non-opioid analgesics, and antipsychotics (Figure 2B-C, Table S1). Note-derived phenotypes showed a similar pattern: the early cognitive decline group had a higher median number of unique target phenotypes per patient during the interval between cognitive and functional decline (5 [IQR 2-8] versus 3 [IQR 1-5]; P < 0.001) (Figure 2D). Seven phenotypes remained significant after false-discovery-rate correction, all enriched in the early cognitive decline group, including headache, Parkinsonism, vomiting, diarrhea, rigidity, nausea, and tremor.

APOE Genotype Patterns Suggest Functional Decline-First AD Is Not an APOE ε4-Enriched Trajectory

Among 2,477 patients with available APOE testing, APOE ε3/ε3 was more frequent in the functional decline-first group than in the cognitive decline-first group (15/40 [37.5%] vs 13/74 [17.6%]; OR 2.82, 95% CI 1.17–6.76; Fisher’s exact P=0.023). APOE ε3/ε4 frequency was similar between groups (15/40 [37.5%] vs 30/74 [40.5%]; OR 0.88, 95% CI 0.40–1.94; P=0.842). Other APOE genotype strata were suppressed because of small cell counts. These exploratory analyses suggest that functional decline-first AD is not enriched for APOE ε4 heterozygosity and may represent a decline-order phenotype less dominated by canonical APOE ε4-associated Alzheimer genetic risk.
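The odds ratios above can be recomputed from the reported 2×2 counts. The sketch below assumes a Wald (Woolf) confidence interval on the log odds ratio; the interval method used in the study is not stated in this section, so that choice is an assumption. The function name `odds_ratio_wald_ci` is hypothetical, and the Fisher's exact P value is not recomputed here.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: genotype present / absent in the first group
    c, d: genotype present / absent in the second group
    Assumption: Woolf's log-scale standard error, sqrt(1/a + 1/b + 1/c + 1/d).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # APOE e3/e3: 15 of 40 functional decline-first vs 13 of 74 cognitive decline-first.
    e3e3 = odds_ratio_wald_ci(15, 40 - 15, 13, 74 - 13)
    # APOE e3/e4: 15 of 40 vs 30 of 74.
    e3e4 = odds_ratio_wald_ci(15, 40 - 15, 30, 74 - 30)
    for label, (or_, lo, hi) in [("e3/e3", e3e3), ("e3/e4", e3e4)]:
        print(f"{label}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With the ε3/ε3 counts, this sketch gives an odds ratio of about 2.82 and an interval close to the reported 1.17–6.76, which suggests the reported interval is compatible with a Wald-type calculation.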

Functional and Cognitive Impairment Diverges from Matched Controls Before AD Diagnosis

To contextualize disease progression against background aging and routine-care measurement patterns, we compared raw longitudinal outcome trajectories between individuals with AD and matched non-AD controls aligned to the index date (Figure 3). Functional outcomes exhibited the largest and most sustained post-index separation from matched controls. ADCS-MCI-ADL declined among AD patients from 43.5 points at 4.9 years before the index date to 25.1 points at 4.9 years after the index date, whereas matched controls remained higher over the same window; at the year-4 statistical landmark, scores were 28.1 versus 47.0 points for AD and controls, respectively (P < 0.001). ADCS-ADL showed a similar post-index separation, declining from 42.0 to 23.7 points in AD patients between 4.9 years before and 4.9 years after index, with significant differences from 1 year after index onward (P = 0.004 at year 1; P < 0.001 at years 2-4). FAQ scores, where higher values indicate worse function, increased from 5.3 to 20.9 points in AD patients across the same window, compared with 2.3 to 9.6 points in controls, with significant separation from the index window onward (all P < 0.001 through year 4).

Cognitive measures also diverged before and after index, but with smaller absolute post-index separation than the functional measures. MMSE declined from 25.7 to 20.7 points in AD patients while remaining comparatively preserved in controls at 26.3 to 24.4 points, with significant separation beginning 3 years before index and persisting through year 4 (all P < 0.001). MoCA showed earlier separation, with AD patients declining from 20.1 to 16.6 points while controls remained near 20.6-20.9 points; differences were already significant 4 years before index (P = 0.035) and remained significant thereafter (P < 0.001 at all subsequent timepoints). Neuropsychiatric measures showed smaller and less consistent divergence, with NPI-Q Severity significant at index, year 1, and year 4 (P = 0.026, P = 0.003, and P < 0.001, respectively) and NPI-Q Distress significant at years 1 and 4 (both P < 0.001). CDR-SB increased in AD patients from 1.3 points at 3.9 years before index to 4.1 points near index and 7.5 points at 4.1 years after index, consistent with progressive global disease severity. Collectively, these analyses indicate that measurable functional and cognitive impairment diverges from matched controls before formal AD diagnosis in routine clinical practice, whereas neuropsychiatric separation is more modest and heterogeneous.

In the semaglutide-treated AD cohort, matched comparisons with non-semaglutide AD controls showed peri-index cognitive differences in MMSE and MoCA, as well as lower FAQ impairment around the index year (P = 0.017), whereas other functional and neuropsychiatric comparisons were less consistent and limited by smaller sample sizes (Figure S3; See Supplementary Methods).

Longitudinal Anthropometric, Metabolic, and Laboratory Trajectories Following AD Diagnosis and Semaglutide Initiation

To investigate systemic physiological changes accompanying AD progression, we analyzed longitudinal anthropometric, metabolic, nutritional, endocrine, and laboratory trajectories relative to the first recorded AD diagnosis and compared them with non-AD controls (Figure 4). For the non-AD controls, an index date was defined to anchor these temporal analyses based on the age of the matched AD patient at their time of diagnosis. The most pronounced changes were observed in anthropometric measures. Individuals with AD experienced progressive reductions in body weight and BMI beginning years before diagnosis, with weight declining by approximately 7–8% during the five years preceding diagnosis and continuing to decrease thereafter (p<0.001 at all timepoints). Similar patterns were observed for BMI. Albumin and serum hemoglobin concentrations declined gradually in both groups during the five years leading up to the index date. However, in the AD group these levels continued to decline over the five years after diagnosis, whereas they remained stable after the index date in the control group (p<0.001 at all timepoints). Lipid profiles were altered around the index date, most notably characterized by reductions in total cholesterol that were similar between groups (p>0.05 at all post-index timepoints) and a reduction in HDL cholesterol that was unique to the AD cohort (p<0.001). Vitamin D concentrations generally increased over time in both cohorts, and hemoglobin A1c trajectories showed only modest changes that were similar between groups. Liver-associated laboratory values displayed divergent patterns, with AST increasing transiently around diagnosis and ALT remaining relatively stable. Together, these findings indicate that AD progression is accompanied by multiple metabolic and physiologic changes that are not explained by normal aging alone.

In semaglutide-treated AD patients, matched comparisons with AD controls without semaglutide exposure showed an average additional 4.8 percentage-point reduction in body weight and 3.7 percentage-point reduction in BMI across the first three post-index years, relative to controls (yearly comparisons: weight, P ≤ 0.002; BMI, P ≤ 0.037), whereas non-anthropometric laboratory biomarkers showed limited separation apart from a nominal hemoglobin difference at year 2 (−0.66 versus −0.05 g/dL; P = 0.029) (Figure S4; See Supplementary Methods).

Paired Cognitive, Functional, and Neuropsychiatric Outcome Changes Following Semaglutide Initiation

To assess within-patient changes following treatment initiation, we compared repeated measures (pre-index and post-index) of outcome scores among semaglutide-treated AD patients and matched non-semaglutide AD controls (Figure 5 and Figure S5). Paired pre-index and post-index score-change analyses included up to 21 semaglutide-treated AD patients and up to 99 matched non-semaglutide AD controls, with sample size varying by instrument. The requirement for paired pre-index and post-index measurements was applied after cohort matching, so only matched patients with repeated measurements for a given instrument contributed to that instrument-specific analysis. Across instruments, the median timing of contributing score measurements was similar between groups, with pre-index scores centered at 10.5 months before index and post-index scores centered at 10.5 months after index in both semaglutide-treated patients and matched controls. Semaglutide-treated patients demonstrated modest improvement in MMSE scores between the pre-index and post-index periods (mean change approximately +0.4 points), whereas matched controls declined by approximately 2.3 points (SD 4.7), resulting in a significant between-group difference (p=0.022). Similarly, MoCA scores were not significantly changed among semaglutide-treated patients (mean change of −0.3 points, SD 2.7 points) but declined by approximately 2.2 points among controls (p=0.049). In contrast, changes in measures of global disease severity, functional impairment, and neuropsychiatric burden did not differ significantly between the semaglutide-treated and control groups, although these analyses were limited by small sample sizes. Overall, semaglutide-treated patients had higher observed cognitive scores than matched controls at several peri-index timepoints, while functional and neuropsychiatric comparisons were limited by smaller sample sizes and did not show consistent between-group separation.
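The paired score-change comparison above reduces to a per-patient difference (post-index minus pre-index) followed by a between-group contrast of mean change. The sketch below shows only that arithmetic; the input values are toy numbers, not study data, and the statistical test behind the reported P values is not reproduced because it is not specified in this section. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

ScorePair = Tuple[float, float]  # (pre-index score, post-index score)


def paired_changes(pairs: Sequence[ScorePair]) -> List[float]:
    """Post-index minus pre-index score for each patient."""
    return [post - pre for pre, post in pairs]


def mean_change_difference(
    treated: Sequence[ScorePair], controls: Sequence[ScorePair]
) -> float:
    """Mean change in treated patients minus mean change in controls."""
    return mean(paired_changes(treated)) - mean(paired_changes(controls))


if __name__ == "__main__":
    # Toy MMSE pairs for illustration only (hypothetical, not study data).
    treated = [(24, 25), (22, 22), (26, 26)]
    controls = [(25, 22), (23, 21), (27, 25), (20, 19)]
    print(paired_changes(treated))
    print(paired_changes(controls))
    print(round(mean_change_difference(treated, controls), 2))
```

For the reported MMSE means (+0.4 in semaglutide-treated patients versus −2.3 in controls), the same contrast corresponds to a between-group difference of about 2.7 points.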

In exploratory analyses stratified by time from AD diagnosis to semaglutide initiation, the most favorable cognitive changes were observed among patients initiating semaglutide 2-4 years after AD diagnosis, with MMSE increasing by 1.63 points and MoCA increasing by 2.00 points between the pre-index and post-index periods, whereas patients initiating within 2 years showed declines in both measures and those initiating ≥4 years showed stable MMSE but modestly lower MoCA (Figure S6; See Supplementary Methods). Functional and neuropsychiatric patterns were more variable: FAQ worsened in the 0–2 year and 2–4 year initiation groups but improved in the ≥4 year group, while NPI measures generally improved in the 0–2 year group, with small sample sizes limiting interpretation.

Discussion

This study provides a longitudinal multifaceted characterization of real-world AD patient trajectories, capturing standardized testing across multiple domains (e.g., cognitive, functional, psychiatric) as well as changes in laboratory and other physiologic measurements. Key results include the heterogeneity in temporal patterns of different standardized tests leading up to and following the diagnosis of AD, the existence of patient subsets with evidence of functional decline significantly preceding evidence of cognitive decline (and vice versa), and the relatively stable pattern of MoCA and MMSE scores among patients who initiated semaglutide after their AD diagnosis contrasted with progressive decline in propensity-matched controls. These findings build on existing literature and motivate further research to better understand the mechanistic basis for diverse clinical phenotypes and trajectories that are present among AD patients, toward the development of therapies that can be tailored to specific patient subgroups.

In the longitudinal evaluation of standardized assessment scores surrounding the time of AD diagnosis, multiple interesting patterns emerged. First, there were clear downward trajectories for multiple cognitive and functional tests that were most evident close to the diagnosis date but began to emerge five to ten years before the first formal diagnosis. These findings are compatible with prior studies, including a recent meta-analysis reporting an average time from symptom onset to diagnosis of 3.6 years in AD and 3.5 years across all types of dementia [16]. This could represent an opportunity for improved early surveillance and diagnostic algorithms. Second, in contrast to these downward cognitive and functional trajectories, the neuropsychiatric testing scores (NPI variants) remained relatively stable throughout the study period. This finding is consistent with previously reported data from over 1,000 patients in the Amsterdam Dementia Cohort, which showed minimal change in NPI scores but uniform decline across cognitive domains after AD diagnosis [17]. This should not be taken as evidence that neuropsychiatric symptoms are minimal or unchanged during the course of AD. Indeed, it is well established that neuropsychiatric symptoms (e.g., depression, anxiety, apathy, psychosis) are highly prevalent in AD, cause significant patient distress, and can worsen as the disease progresses [18,19]. These symptoms also correlate with AD biomarkers, cognitive status, and patient prognosis [20,21]. The relative stability of NPI scores in our study population could reflect selection bias or limited sensitivity of the scores to detect meaningful change, in addition to the possibility that neuropsychiatric symptoms truly do show less variability over time in comparison to cognitive and functional status.

An important aspect of this study, which is critical for interpreting the trends described above, is the physician validation of the extracted standardized assessment scores [22].

This study identifies an interesting real-world subset of AD patients who initiated semaglutide after diagnosis and subsequently showed stable or improved clinical trajectories rather than continued deterioration. In a disease usually defined by progressive cognitive and functional decline [23], 218 of 713 post-diagnosis semaglutide initiators showed stable or improved trajectories, representing 30.6% of all treated patients and 58.0% of patients with sufficient longitudinal note evidence for trajectory classification. This group, which was identified through LLM-enabled curation of routine-care documentation at scale, could represent a distinct phenotype that may reveal biological or clinical features associated with unexpectedly slowed, stabilized, or improved trajectories.

These data do not prove that semaglutide reverses or stabilizes AD but rather identify a semaglutide-associated, real-world, LLM-derived trajectory phenotype that is compelling enough to warrant further investigation. These findings suggest that AD progression after metabolic intervention may be more heterogeneous than conventional trial-level averages can capture, which could make population-level trial results difficult to interpret. Although oral semaglutide 14 mg did not significantly slow clinical progression in early symptomatic, amyloid-confirmed AD at the population level in the EVOKE and EVOKE+ trials [14,24], this does not exclude the possible existence of a responder or resilience phenotype in broader routine-care populations. Real-world semaglutide initiators also likely differ from trial participants in baseline comorbidity status (e.g., higher diabetes, obesity, and cardiovascular burdens), treatment timing, and frailty. In this light, our study reframes the follow-up to the EVOKE trials, placing less emphasis on whether semaglutide universally modifies AD and more on whether a reproducible subgroup of semaglutide-treated patients demonstrates clinical stabilization or apparent improvement after diagnosis.

The natural-history trajectories also raise a second, timing-focused hypothesis. In Figure 1, functional or activity-associated impairment appears to begin years before the steepest decline in conventional cognitive scores, suggesting that AD may include an earlier systemic, metabolic-functional, or frailty-associated phase before the period when cognitive instruments most clearly deteriorate. This observation should be interpreted cautiously because FAQ and related functional measures may capture a mixture of early executive dysfunction, mobility limitation, sarcopenia, depression, sleep disruption, caregiver reporting, vascular disease, and general medical vulnerability, rather than metabolic dysregulation alone. Nevertheless, the temporal pattern is biologically relevant to GLP-1 receptor agonist intervention: if activity decline marks an early window of systemic vulnerability, then semaglutide exposure before or near this inflection point might plausibly have different effects than initiation after established neurocognitive decline. This may be one reason post-diagnosis treatment effects appear heterogeneous. The key follow-up question is therefore not only whether semaglutide-treated patients differ from controls after AD diagnosis, but whether patients who initiate GLP-1 receptor agonist therapy before, during, or soon after the first detectable functional decline show slower subsequent MMSE, MoCA, CDR, FAQ, or composite trajectory deterioration. Such investigation would test whether metabolic intervention is most informative as an early resilience strategy rather than as a late-stage cognitive rescue therapy.

The APOE analysis provides a useful counterweight to a purely neurodegenerative interpretation of functional-first decline. Although limited by sparse genotype availability, the enrichment of ε3/ε3 and lack of ε3/ε4 enrichment in functional-first patients suggest that earlier functional decline may not simply mark more genetically driven Alzheimer biology. Instead, this pattern is consistent with a phenotype in which functional reserve, mobility, frailty, systemic vulnerability, or multimorbidity may shape the timing of observable decline before conventional cognitive deterioration becomes dominant.

Our initial findings derived from an EHR-wide analysis of clinical features (e.g., disease diagnoses, documented symptoms, flowsheet entries) demonstrated a mixed signal, with reductions in multiple neurocognitive phenotypes but increases in several vulnerability-associated signals. Prevalence of disease coding and other EHR entries can be difficult to interpret, as these factors are influenced not only by disease trajectories but also variability in patient follow-up and clinical documentation. That said, these results could be consistent with the framework described above, wherein heterogeneous disease trajectories following semaglutide initiation result in fewer clinical encounters for neurocognitive diagnoses in some patients but higher rates of progressive AD sequelae in others. That GLP-1 receptor agonism may impact AD progression is plausible given that this axis is the intersection of metabolic, vascular, inflammatory and neuroimmune biology [12,13]. GLP-1 receptor agonists have been linked to insulin signaling, mitochondrial function, oxidative stress, autophagy, microglial activation, vascular biology and neuroinflammatory pathways that are relevant to neurodegeneration [12,13,25,26]. Large real-world target-trial emulation studies have also reported lower incident AD and related dementia risk among patients treated with semaglutide or other GLP-1 receptor agonists [4,5]. These prior studies primarily address the risk of developing dementia, whereas the present analysis focuses on patients who already carried an AD diagnosis before semaglutide initiation. The distinction is important, as prevention, delayed onset, and post-diagnosis stabilization may reflect overlapping but non-identical biology.

To extend beyond the cohort-level phenotypic changes, we introduced here a new LLM-enabled methodology to aid in longitudinal severity grading of neurocognitive dysfunction from clinical notes. Namely, we prompted an LLM to extract and summarize the overall burden of disease-relevant patterns from documentation of memory, orientation, language, executive function, activities of daily living, caregiver dependence, behavioral symptoms, and supervision needs. This method, intended to serve as a qualitative imputed proxy for standardized cognitive or functional testing scores, was particularly important given the scarcity of explicit documentation of such scores in the EHR. Our finding that the same method applied to non-AD patients yielded a significantly higher fraction of patients with minimal-severity cognitive dysfunction supports the face validity of this workflow, although further rigorous validation is needed. Beyond its utility in this study, the method presented here also advances the field of clinical AI phenotyping. Recent work has shown that LLMs can facilitate electronic health record phenotyping algorithm generation [8], and our study extends this principle to longitudinal reconstruction of disease trajectories over specified time intervals. This extension is particularly important given that most diseases tend to evolve in one or more ways (e.g., resolve, progress, wax and wane) over time.

This study has limitations. First, it is a retrospective study of real-world EHR data and thus is prone to multiple sources of confounding and bias including selection, ascertainment, and documentation bias. Second, the study describes inferred cognitive trajectories in a single population of AD patients who subsequently initiated semaglutide. Comparison of these trajectories to relevant control populations, including AD patients who initiated non-GLP-1 anti-diabetic medications or the broader population of AD patients overall, will be important to contextualize the trends reported in this study. Third, the LLM workflow that was utilized to characterize these trajectories requires additional rigorous validation. While the application of this workflow to a cohort of non-AD patients supported the directional validity of the outputs, a more extensive comparison of model outputs to standardized testing scores across a larger dementia cohort is needed. Fourth, in the cohort-level analysis of changes in structured EHR variables, the prevalence of disease diagnosis codes was compared across two time intervals of equal length (approximately 1.5 years before and after semaglutide initiation). However, the analysis did not account for differences in available follow-up time, relating to either differences in calendar time of semaglutide initiation or loss to follow-up. This could result in underestimating the rates of phenotypes in the follow-up period, although we did find that multiple vulnerability associated phenotypes actually increased after treatment initiation. Finally, the cohort definition relied on the presence of at least one semaglutide prescription, but this does not ensure dose escalation or treatment adherence. Adherence to GLP-1 receptor agonists is known to be highly variable in the real-world setting with frequent treatment pauses, discontinuation, and drug switching. 
In our study population, the percentage of patients who achieved high-dose semaglutide prescriptions was quite low, which could suggest limited adherence. That said, the cohort-level changes in cardiometabolic measurements do support the presence of at least some treatment effect in this population. Importantly, this is inferred not from weight loss alone (which could be related to AD progression rather than an intended consequence of treatment) but also from reductions in other laboratory and vital sign parameters.

In conclusion, this real-world LLM-derived clinical trajectory analysis identified a subset of AD patients with documentation of stable or improved cognitive or functional status after semaglutide initiation. Methodologically, this study demonstrates that routine clinical notes contain recoverable longitudinal AD severity information that is missed by sparse standardized score documentation. Clinically, the potential subset of AD patients identified here warrants prospective validation, physician-adjudicated review, and causal testing in matched comparator designs. If this subset is indeed reproducible, it could serve as a discovery lens to help define which patients, biological states, and treatment contexts are most compatible with metabolic protection of the aging brain.

Methods

Study Design and Cohort Definition

This retrospective study used de-identified longitudinal electronic health record data from the nSights federated platform [27,28,29,30,31,32]. Alzheimer’s disease (AD) was defined by at least one structured diagnosis code corresponding to ICD-10 G30 or F00, or ICD-9 331.0. The first qualifying AD diagnosis was assigned as the AD index date. The primary AD outcome cohort included patients with at least one extracted standardized cognitive, functional, global severity, or neuropsychiatric assessment documented in clinical notes. Semaglutide exposure was identified from longitudinal medication records. The semaglutide-treated AD cohort was restricted to patients whose first recorded semaglutide prescription occurred after the first recorded AD diagnosis. For analyses comparing semaglutide-treated patients with AD controls, patients in the control pool were required to have AD but no recorded semaglutide exposure at any time.

Baseline Clinical and Biomarker Characterization

Demographic characteristics were summarized at the AD index date. Comorbidities were identified from structured diagnosis records during the five years before the AD index date and summarized as patient-level binary indicators. Amyloid status was assessed among patients with clinical notes containing the term “amyloid”; up to three amyloid-containing note excerpts per patient were evaluated using an LLM prompt with gpt-oss-20b [33] to classify the patient as amyloid positive, amyloid negative, or indeterminate based only on explicit patient-attributed amyloid biomarker evidence (see Supplementary Methods). APOE genotype was similarly extracted from available clinical documentation using an LLM prompt and harmonized to APOE ε2, ε3, and ε4 allele combinations (see Supplementary Methods).

Clinical Outcome Score Extraction

Clinical documents were screened for explicit mentions of standardized cognitive, functional, global disease severity, and neuropsychiatric assessment instruments. The extracted instruments included the Montreal Cognitive Assessment (MoCA), Mini-Mental State Examination (MMSE), Clinical Dementia Rating Global Score (CDR Global), Clinical Dementia Rating Sum of Boxes (CDR-SB), Alzheimer’s Disease Assessment Scale-Cognitive Subscale 11-item, 13-item, and 14-item versions (ADAS-Cog 11, ADAS-Cog 13, and ADAS-Cog 14), Functional Activities Questionnaire (FAQ), Alzheimer’s Disease Cooperative Study-Activities of Daily Living (ADCS-ADL), Alzheimer’s Disease Cooperative Study-Mild Cognitive Impairment Activities of Daily Living (ADCS-MCI-ADL), Neuropsychiatric Inventory Total Score (NPI Total), Neuropsychiatric Inventory Questionnaire Severity score (NPI-Q Severity), and Neuropsychiatric Inventory Questionnaire Distress score (NPI-Q Distress). Only explicit patient-attributed numeric scores were retained. Extracted values were required to fall within prespecified instrument-specific ranges: MoCA and MMSE, 0-30; CDR Global, 0, 0.5, 1, 2, or 3; CDR-SB, 0-18; ADAS-Cog 11, 0-70; ADAS-Cog 13, 0-85; ADAS-Cog 14, 0-90; FAQ, 0-30; ADCS-ADL, 0-78; ADCS-MCI-ADL, 0-53; NPI Total, 0-144; NPI-Q Severity, 0-36; and NPI-Q Distress, 0-60. Scores outside these ranges, scores not clearly attributed to the patient, and ambiguous values without sufficient instrument context were excluded (see Supplementary Methods).
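As an illustration only, the range filter described above could be sketched as follows. The ranges are those listed in the text; the function name `is_valid_score` and the rule that unrecognized instrument names are excluded are assumptions of this sketch, not the study's implementation.

```python
# Instrument-specific valid ranges, as listed in the text above.
SCORE_RANGES = {
    "MoCA": (0, 30),
    "MMSE": (0, 30),
    "CDR-SB": (0, 18),
    "ADAS-Cog 11": (0, 70),
    "ADAS-Cog 13": (0, 85),
    "ADAS-Cog 14": (0, 90),
    "FAQ": (0, 30),
    "ADCS-ADL": (0, 78),
    "ADCS-MCI-ADL": (0, 53),
    "NPI Total": (0, 144),
    "NPI-Q Severity": (0, 36),
    "NPI-Q Distress": (0, 60),
}
# CDR Global only takes a fixed set of discrete values.
CDR_GLOBAL_VALUES = {0.0, 0.5, 1.0, 2.0, 3.0}


def is_valid_score(instrument: str, value: float) -> bool:
    """Return True when `value` is admissible for `instrument`."""
    if instrument == "CDR Global":
        return value in CDR_GLOBAL_VALUES
    bounds = SCORE_RANGES.get(instrument)
    if bounds is None:
        return False  # assumption: unknown instruments are excluded
    low, high = bounds
    return low <= value <= high
```

Checks for patient attribution and instrument context depend on the note text and are not covered by this numeric filter.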

Physician Validation of Extracted Outcome Scores

A stratified sample of extracted cognitive, functional, global severity, and neuropsychiatric assessment scores was manually adjudicated by physician reviewers. The validation interface displayed the source clinical note with the extracted score text highlighted in context. Each extraction was labelled correct when the instrument name, numeric score value, and supporting text were all judged to be accurate and sufficiently supported by the note; otherwise it was labelled incorrect. One reviewer adjudicated the full validation set, and a second reviewer independently adjudicated an overlapping subset to estimate inter-reviewer agreement using Krippendorff’s alpha for nominal labels. Final extraction accuracy was calculated against the full adjudication set from the primary reviewer as the proportion of correct extractions among correct plus incorrect labels. Validation results were stratified by instrument family to distinguish low-sample instruments, sufficiently sampled but low-performing extraction families, and the main reportable instrument set. Full validation counts, inter-reviewer agreement, and instrument-level accuracy are reported in the Supplementary Methods.
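The two validation quantities named above can be sketched in a few lines. This is a minimal illustration that assumes exactly two reviewers with no missing labels on the overlapping subset; the function names are hypothetical, and returning NaN when only one label value occurs is a choice of this sketch.

```python
from collections import Counter


def extraction_accuracy(labels):
    """Proportion of 'correct' among 'correct' plus 'incorrect' labels."""
    correct = sum(1 for x in labels if x == "correct")
    incorrect = sum(1 for x in labels if x == "incorrect")
    return correct / (correct + incorrect)


def krippendorff_alpha_nominal(rater_a, rater_b):
    """Krippendorff's alpha for nominal labels, two raters, no missing data."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same units")
    n = 2 * len(rater_a)  # total number of pairable values
    counts = Counter(rater_a) + Counter(rater_b)
    # Each disagreeing unit contributes two off-diagonal coincidences.
    disagreements = sum(1 for a, b in zip(rater_a, rater_b) if a != b)
    observed = 2 * disagreements / n
    expected = sum(
        counts[c] * counts[k] for c in counts for k in counts if c != k
    ) / (n * (n - 1))
    if expected == 0:
        return float("nan")  # alpha is undefined when only one value occurs
    return 1.0 - observed / expected
```

For example, with reviewer labels `[c, c, i, i]` and `[c, i, i, i]` the observed disagreement is 0.25, the expected disagreement is 30/56, and alpha is 8/15.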

Longitudinal AD Outcome Trajectories

For the overall AD population, extracted outcome scores were aligned relative to the first AD diagnosis date. Time was binned into 3-month intervals. For each patient, instrument, and time interval, repeated scores were collapsed to the median value. Cohort-level trajectories were calculated as the mean patient-level score within each bin, with standard errors used to summarize uncertainty. Raw score trajectories were retained separately for each instrument. For normalized displays, each instrument was transformed to a fixed 0-1 health-oriented scale, where 1 represented the least impaired end of the instrument range and 0 represented the most impaired end. Instruments where lower raw scores indicate worse status, including MMSE, MoCA, ADCS-ADL, and ADCS-MCI-ADL, were scaled in the positive direction. Instruments where higher raw scores indicate worse status, including CDR Global, CDR-SB, FAQ, NPI Total, NPI-Q Severity, and NPI-Q Distress, were reverse-scaled. A pooled all-outcome trajectory was calculated from available normalized instrument values, while individual instrument trajectories were retained for instrument-specific interpretation. Trajectories were displayed over the full available time horizon and, for the main figure, restricted to the interval through five years after the index date.
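The binning, median collapse, and health-oriented scaling described above might be sketched as below. Only a subset of instruments is shown, a 3-month bin is assumed to span 365.25/4 days, and all names (`normalize`, `time_bin`, `cohort_trajectory`) are hypothetical; the study's exact bin boundaries are not stated in the text.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev

# Instrument ranges (subset) and scoring direction, taken from the text.
RANGES = {"MMSE": (0, 30), "MoCA": (0, 30), "FAQ": (0, 30), "CDR-SB": (0, 18)}
HIGHER_IS_WORSE = {"FAQ", "CDR-SB"}  # these instruments are reverse-scaled
BIN_DAYS = 365.25 / 4  # assumed length of a 3-month interval


def normalize(instrument, raw):
    """Map a raw score to 0-1, where 1 is the least impaired end."""
    low, high = RANGES[instrument]
    fraction = (raw - low) / (high - low)
    return 1.0 - fraction if instrument in HIGHER_IS_WORSE else fraction


def time_bin(days_from_index):
    """Index of the 3-month bin; negative bins precede the index date."""
    return math.floor(days_from_index / BIN_DAYS)


def cohort_trajectory(records):
    """records: iterable of (patient_id, instrument, days_from_index, raw).

    Returns {(instrument, bin): (mean, standard_error, n_patients)}.
    """
    per_patient = defaultdict(list)
    for pid, instrument, days, raw in records:
        per_patient[(pid, instrument, time_bin(days))].append(raw)
    per_bin = defaultdict(list)
    for (_, instrument, b), values in per_patient.items():
        per_bin[(instrument, b)].append(median(values))  # patient-level median
    result = {}
    for key, values in per_bin.items():
        n = len(values)
        se = stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        result[key] = (mean(values), se, n)
    return result
```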

Clinical Trial Outcome-Instrument Analysis

To compare routine-care outcome trajectories with the endpoint architecture of AD clinical trials, interventional AD trials were identified from ClinicalTrials.gov with study start dates from 2015 through 2026. Trial outcome descriptions were screened for matched standardized instruments overlapping the trajectory analysis, including cognitive, functional, global disease severity, and neuropsychiatric measures. For each trial, matched instruments were counted once per metric, and trial-level counts were summarized for the selected outcome instruments shown in the figure. This analysis was descriptive and was intended to quantify the representation of outcome domains in contemporary AD trials rather than evaluate trial efficacy.

Functional and Cognitive Decline Temporality

Raw longitudinal score measurements were analyzed for AD patients with repeated functional or cognitive assessments. Functional instruments included FAQ, ADCS-ADL, and ADCS-MCI-ADL; cognitive instruments included MMSE and MoCA. Same-day repeated values for the same patient and instrument were collapsed using the median raw score. Measurements were ordered chronologically within each patient-instrument series and segmented into directional episodes, allowing flat intervals and small reversals when the cumulative opposite-direction movement did not exceed 20% of the dominant directional movement. Instrument-specific scoring direction was used to classify each episode as clinical worsening or improvement, and episodes from instruments within the same domain were merged into domain-level spans when they occurred within 90 days. For each patient and domain, the first confirmed decline date was defined as the earliest later score date within the first worsening span at which the raw score was worse than the span baseline. Functional-cognitive temporality was assessed among patients with confirmed decline in both domains using the difference between the functional and cognitive decline confirmation dates. Patients were further grouped as functional decline >12 months before cognitive decline, cognitive decline >12 months before functional decline, or both decline confirmations occurring within 12 months of each other or on the same date.
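Once a first confirmed decline date is available for each domain, the final grouping step reduces to a date difference. The sketch below covers only that step and assumes that 12 months corresponds to 365 days; the function name and group labels are hypothetical, and the episode segmentation that produces the dates is not reproduced here.

```python
from datetime import date


def classify_temporality(functional_decline: date,
                         cognitive_decline: date,
                         threshold_days: int = 365) -> str:
    """Group a patient by the gap between domain decline confirmations."""
    # Positive gap: functional decline was confirmed first.
    gap = (cognitive_decline - functional_decline).days
    if gap > threshold_days:
        return "functional_first_gt12m"
    if gap < -threshold_days:
        return "cognitive_first_gt12m"
    return "within_12m"  # includes confirmations on the same date
```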

Medication Phenotyping in Functional-First and Cognitive-First Patients

Medication profiles were evaluated in patients with clearly separated functional-cognitive decline timing, defined as functional decline confirmed >12 months before cognitive decline or cognitive decline confirmed >12 months before functional decline. Medication exposure was represented using standardized medication concept names and summarized at the patient level. Medication burden was defined as the number of unique medication concepts observed per patient within prespecified time windows anchored to structured AD diagnosis and to the interval between the earlier and later domain decline confirmation dates. Between-group differences in medication burden were tested using two-sided Mann-Whitney U tests. Medication class analyses were performed during the interval between functional and cognitive decline. Each patient was classified as exposed or unexposed to each medication class based on whether at least one medication from that class was observed during this interval. Exposure prevalence was compared between groups using two-sided Fisher exact tests, with absolute risk differences, odds ratios, and confidence intervals calculated for each medication class. Multiple testing was controlled within medication-class families using the Benjamini-Hochberg false discovery rate procedure.
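For readers who want to see the per-class comparison concretely, the following standard-library sketch computes a two-sided Fisher exact P value for a 2x2 exposure table, the absolute risk difference, and Benjamini-Hochberg adjusted P values. The function names are hypothetical, the two-sided P value uses the common rule of summing all tables no more probable than the observed one, and confidence intervals are omitted.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def risk_difference(a, b, c, d):
    """Exposure prevalence in group 1 minus prevalence in group 2."""
    return a / (a + b) - c / (c + d)


def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step down from the largest P value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Here rows are the two timing groups and columns are exposed versus unexposed, so `fisher_exact_two_sided(3, 1, 1, 3)` evaluates to 34/70.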

Note-Derived Phenotype Enrichment in Functional-First and Cognitive-First Patients

Using the same functional-first and cognitive-first patient groups, we evaluated note-derived phenotype profiles during the interval between functional and cognitive decline. Positive disease mentions from augmented clinical-note curation were restricted to high-confidence assertions and matched case-insensitively against a prespecified phenotype list spanning motor, frailty, autonomic, gastrointestinal, pain, anticholinergic, neuropsychiatric, and care-dependence concepts. Each patient was classified as exposed or unexposed to each phenotype based on whether at least one matching note-derived disease mention was observed during the interval between the earlier and later decline confirmation dates. Exposure prevalence was compared between groups using two-sided Fisher exact tests, with absolute risk differences, odds ratios, and confidence intervals calculated for each phenotype. Multiple testing was controlled across phenotypes using the Benjamini-Hochberg false discovery rate. Patient-level phenotype burden was defined as the number of unique target phenotypes observed during this interval and compared between groups using the Mann-Whitney U test.

Matched Non-AD Control Outcome Trajectories

A non-AD control population was constructed from patients without AD who had clinical documents containing explicit mentions of the same standardized outcome instruments. Candidate control documents were processed using the same extraction rules. Each AD patient was matched to a non-AD control patient on recorded sex/gender and age at the relevant index date. For control patients, the index date was assigned from an available score-assessment date. When multiple potential controls were available, patients with greater outcome-score availability were prioritized to improve longitudinal follow-up. Raw score trajectories were compared between AD patients and matched non-AD controls over the -5 to +5 year interval around the index date.
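One way such matching could be carried out is a greedy 1:1 pass, sketched below. The text does not state the age tolerance, whether controls were reused, or how ties were broken, so the 2-year tolerance, matching without replacement, and the tie-break by age proximity are all assumptions of this illustration, as are the dictionary keys and function name.

```python
def match_controls(cases, controls, max_age_gap=2.0):
    """Greedy 1:1 matching on sex and age, preferring score-rich controls.

    cases: list of dicts with keys 'id', 'sex', 'age'.
    controls: list of dicts with keys 'id', 'sex', 'age', 'n_scores'.
    Returns {case_id: control_id}; unmatched cases are omitted.
    """
    used = set()
    pairs = {}
    for case in cases:
        eligible = [
            c for c in controls
            if c["id"] not in used
            and c["sex"] == case["sex"]
            and abs(c["age"] - case["age"]) <= max_age_gap
        ]
        if not eligible:
            continue
        # Prioritize greater score availability, then the closest age.
        best = max(
            eligible,
            key=lambda c: (c["n_scores"], -abs(c["age"] - case["age"])),
        )
        used.add(best["id"])
        pairs[case["id"]] = best["id"]
    return pairs
```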

Measurement Trajectory Analyses

Structured measurement trajectories were evaluated for weight, body mass index, albumin, prealbumin, 25-hydroxyvitamin D, hemoglobin A1c, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglycerides, hemoglobin, thyroid-stimulating hormone, alanine aminotransferase, and aspartate aminotransferase. Weight and body mass index were processed as anthropometric measurements, whereas laboratory measurements were processed separately and did not use the weight/BMI post-processing workflow. For each patient and measurement, baseline was defined as the patient-level median value within the -6 month to +15 day window around the relevant index date. Values were standardized to canonical units and filtered using broad physiologic plausibility ranges before trajectory construction. Weight and body mass index were plotted as percent change from baseline. Laboratory measurements were plotted as absolute change from baseline in canonical units. As with outcome scores, repeated measurements within a patient and 3-month interval were collapsed to the median before calculating cohort-level means and standard errors. These trajectories were evaluated relative to AD diagnosis and compared with matched non-AD controls.
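The baseline definition and the two change metrics can be illustrated as follows. The window of -6 months is approximated here as -183 days, which is an assumption, and the function names are hypothetical.

```python
from statistics import median

BASELINE_WINDOW = (-183, 15)  # days from index; -6 months assumed as 183 days


def baseline_value(measurements, window=BASELINE_WINDOW):
    """measurements: list of (days_from_index, value) for one patient."""
    start, end = window
    values = [v for d, v in measurements if start <= d <= end]
    return median(values) if values else None  # None: no baseline available


def percent_change(value, baseline):
    """Used for weight and body mass index."""
    return 100.0 * (value - baseline) / baseline


def absolute_change(value, baseline):
    """Used for laboratory measurements, in canonical units."""
    return value - baseline
```

Unit standardization and physiologic plausibility filtering are assumed to have been applied before these functions are called.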

Semaglutide-Treated AD Outcome Trajectories

For patients with AD whose first semaglutide prescription occurred after AD diagnosis, outcome-score trajectories were realigned relative to the first semaglutide prescription date. The same score extraction, raw-score retention, 3-month binning, patient-level median aggregation, and normalized scaling were applied. These analyses described cognitive, functional, global severity, and neuropsychiatric trajectories before and after semaglutide initiation in the post-diagnosis semaglutide-treated AD cohort. For the semaglutide trajectory figure, the distribution of AD diagnosis timing before semaglutide initiation was summarized and displayed as a background interval.

Matched Semaglutide Versus Non-Semaglutide AD Comparisons

To contextualize post-semaglutide trajectories, semaglutide-treated AD patients were matched to AD controls with no recorded semaglutide exposure. The semaglutide index date was the first semaglutide prescription after AD diagnosis. For non-semaglutide AD controls, the index date was assigned from an eligible outcome-score assessment date. Matching was based on recorded sex/gender, age at index, and time from first AD diagnosis to index date. When multiple candidate controls were available, controls with greater score availability were prioritized. Raw instrument-specific trajectories were compared between semaglutide-treated AD patients and matched non-semaglutide AD controls over the -5 to +5 year interval.

Semaglutide-Associated Measurement Trajectories

For structured anthropometric and laboratory analyses, semaglutide-treated AD patients were compared with matched AD controls without semaglutide exposure using the same measurement-processing workflow described above. The semaglutide index date was the first post-diagnosis semaglutide prescription, and matched controls were assigned an analogous index date. Weight and body mass index were analyzed as percent change from baseline, whereas laboratory measures were analyzed as absolute change from baseline. Between-group comparisons were performed at prespecified yearly landmarks after index.

Paired Pre-Post Outcome Score Change Analyses

For paired score-change analyses, each patient’s pre-index score was defined as the median raw score from 18 to 6 months before index, and the post-index score was defined as the median raw score from 6 to 18 months after index. Patients were included for a given instrument only if they had at least one valid score in both windows. The paired change was calculated as post-index minus pre-index score. For direction-adjusted displays, positive change was defined to consistently indicate better cognitive, functional, or neuropsychiatric status; therefore, raw deltas were sign-reversed for instruments where higher raw scores indicate worse disease. The paired-score requirement was applied after cohort matching, so sample size varied by instrument. Semaglutide-treated AD patients were compared with matched non-semaglutide AD controls using the distribution of patient-level paired deltas. Additional timing diagnostics summarized when the pre-index and post-index observations occurred within their respective windows.
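A compact sketch of the paired delta follows, with the direction adjustment applied at the end. Months are converted to days at 30.4375 days per month, which is an assumption, and the function names are hypothetical.

```python
from statistics import median

DAYS_PER_MONTH = 30.4375  # assumed conversion from months to days


def window_median(scores, start_month, end_month):
    """Median raw score within [start_month, end_month] relative to index."""
    lo, hi = start_month * DAYS_PER_MONTH, end_month * DAYS_PER_MONTH
    values = [v for d, v in scores if lo <= d <= hi]
    return median(values) if values else None


def paired_delta(scores, higher_is_worse):
    """scores: list of (days_from_index, raw_score) for one instrument.

    Returns the direction-adjusted change (positive means better status),
    or None when either window has no valid score.
    """
    pre = window_median(scores, -18, -6)
    post = window_median(scores, 6, 18)
    if pre is None or post is None:
        return None
    raw = post - pre
    return -raw if higher_is_worse else raw
```

For an MMSE series with scores 26 and 24 before index and 23 and 21 after, the pre-index median is 25, the post-index median is 22, and the delta is -3.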

Semaglutide Timing Subgroup Analyses

To evaluate whether clinical changes after semaglutide initiation differed according to treatment timing, semaglutide-treated AD patients with paired pre-index and post-index outcome scores were stratified by the interval between first AD diagnosis and first semaglutide prescription. Timing strata were defined as 0-2 years, 2-4 years, and at least 4 years after AD diagnosis. Within each stratum and instrument, paired raw score changes were calculated using the same pre-index and post-index windows described above. These analyses were considered exploratory because sample sizes were small after stratification and varied by instrument.

Statistical Analysis

For trajectory plots, cohort-level means were calculated from patient-level binned medians, and uncertainty bands represent standard errors. Pairwise trajectory comparisons used raw instrument scores for clinical outcome measures and baseline-relative change values for structured measurements. At prespecified yearly landmarks, between-cohort differences were tested using two-sided Welch tests based on the available mean, standard deviation, and sample size for each cohort at that time point. For paired pre-post score-change analyses, between-group differences in patient-level deltas were also evaluated using two-sided Welch tests. Categorical exposure comparisons used two-sided Fisher exact tests, and patient-level burden measures were compared using Mann-Whitney U tests. Where multiple related features were tested within an analysis family, false discovery rate was controlled using the Benjamini-Hochberg procedure. Analyses were descriptive and hypothesis-generating unless otherwise specified.
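Because the landmark comparisons rely only on the mean, standard deviation, and sample size of each cohort, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed directly from those summaries. The sketch below stops at the statistic and degrees of freedom; the function name is hypothetical.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test."""
    v1 = sd1 ** 2 / n1  # squared standard error, cohort 1
    v2 = sd2 ** 2 / n2  # squared standard error, cohort 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

A two-sided P value then follows from the t distribution with `df` degrees of freedom. Where SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same test from summary statistics.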

Data Source

This study analyzed de-identified EHR data from academic medical centers in the United States via the nference Federated Analytics Platform. Prior to analysis, all data underwent expert determination de-identification satisfying HIPAA Privacy Rule requirements (45 CFR §164.514(b)(1)), employing a multi-layered transformation approach for both structured data (cryptographic hashing of identifiers, date-shifting, geographic truncation) and unstructured clinical text (ensemble deep learning and rule-based methods with >99% recall for personally identifiable information detection) [17,18]. nference established secure data environments within each participating center, housing these de-identified patient data governed by expert determination. These de-identified data environments were specifically designed to enable data access and analysis without requiring Institutional Review Board oversight, approval, or exemption confirmation. Accordingly, informed consent and IRB review were not required for this study.

Data Availability

This study involves the analysis of de-identified Electronic Health Record (EHR) data via the nference Federated Clinical Analytics Platform. Data shown and reported in this manuscript were extracted from this environment using an established protocol for data extraction, aimed at preserving patient privacy. The data has been de-identified pursuant to an expert determination in accordance with the HIPAA Privacy Rule. Any data beyond what is reported in the manuscript, including but not limited to the raw EHR data, cannot be shared or released due to the parameters of the expert determination to maintain the data de-identification.

De-Identification and HIPAA Compliance Certification

Prior to analysis, all EHR data were de-identified under an expert determination consistent with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule (45 CFR §164.514(b)(1)). The de-identification methodology employed a multi-layered transformation approach to both structured and unstructured data fields [34]. In structured data, direct identifiers including patient names and precise geographic locations were excluded entirely, while indirect identifiers underwent specific transformations: patient identifiers, medical record numbers, and accession numbers were replaced with one-way cryptographic hashes using confidential salts to preserve linkage across patient encounters; all dates were shifted backward by patient-specific random offsets (1–31 days) to preserve temporal relationships while obscuring exact event timing; the ZIP codes were truncated to two-digit state-level resolution; and continuous variables including age, height, weight, and body mass index were thresholded to prevent identification of extreme values (for example, ages ≥89 years transformed to ‘89+’ and BMI >40 transformed to ‘40+’). In clinical text, an ensemble de-identification system that combines attention-based deep learning with rule-based methods achieved an estimated >99% recall for personally identifiable information (PII) detection, with detected identifiers replaced by plausible fictional surrogates [34].
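The structured-data transformations listed above can be illustrated with a short sketch. It is not the platform's implementation: the use of SHA-256, the derivation of the per-patient date offset from a salted hash (the text states only that offsets were patient-specific and random), and all function names are assumptions.

```python
import hashlib
from datetime import date, timedelta


def hash_identifier(identifier: str, salt: str) -> str:
    """One-way salted hash; the same input always maps to the same token."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()


def patient_offset_days(patient_id: str, salt: str) -> int:
    """Patient-specific offset in 1..31 days (derivation is an assumption)."""
    digest = hashlib.sha256((salt + "|offset|" + patient_id).encode("utf-8")).digest()
    return 1 + int.from_bytes(digest[:4], "big") % 31


def shift_date(d: date, patient_id: str, salt: str) -> date:
    """Shift backward; intervals within a patient are preserved."""
    return d - timedelta(days=patient_offset_days(patient_id, salt))


def truncate_zip(zip_code: str) -> str:
    return zip_code[:2]


def top_code_age(age: int) -> str:
    return "89+" if age >= 89 else str(age)


def top_code_bmi(bmi: float) -> str:
    return "40+" if bmi > 40 else f"{bmi:.1f}"
```

Because every date for a given patient moves by the same offset, the number of days between any two of that patient's events is unchanged.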

Data Harmonization

To address heterogeneity in EHR data, we harmonized clinical variables including medications, anthropometric measurements, and diagnoses to standardized concepts. For medications, we first constructed a standardized drug concept database combining the nference knowledge graph with RxNorm (https://www.nlm.nih.gov/research/umls/rxnorm/index.html) hierarchies to capture ingredient, brand, and dose-specific information [35]. Medication records were matched using a hierarchical approach that prioritized RxNorm codes when available, followed by ingredient-level matching, and finally natural language processing and pattern matching on free-text medication orders when structured codes were absent. For anthropometric measurements (height, weight, BMI), we created a unified vocabulary from SNOMED (https://www.snomed.org/, https://athena.ohdsi.org) and LOINC (https://loinc.org/) and matched EHR measurement descriptions using standardized text matching algorithms with abbreviation expansion and synonym resolution; ambiguous mappings were resolved using OpenAI GPT-4o (https://platform.openai.com/docs/models/gpt-4o) with summary statistics as context, followed by manual verification. For diagnoses, we developed a hierarchical disease concept database from the knowledge graph and matched EHR diagnoses by identifying the most specific common child concept in the hierarchy. This approach enabled consistent identification of clinical entities while preserving granularity where available.
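The hierarchical medication matching described above can be sketched as a simple fallback chain. This is an illustrative sketch only: the lookup tables, the placeholder code value, and the function name are hypothetical stand-ins for the drug concept database, and the free-text step is reduced to a few regular expressions.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup tables standing in for the drug concept database.
# "CODE-0001" is a placeholder, not a real RxNorm identifier.
CODE_TO_CONCEPT = {"CODE-0001": "semaglutide"}
INGREDIENT_TO_CONCEPT = {"semaglutide": "semaglutide", "donepezil": "donepezil"}
FREE_TEXT_PATTERNS = [
    (re.compile(r"\b(ozempic|wegovy|rybelsus)\b", re.IGNORECASE), "semaglutide"),
    (re.compile(r"\baricept\b", re.IGNORECASE), "donepezil"),
]


def match_medication(
    code: Optional[str], ingredient: Optional[str], free_text: Optional[str]
) -> Tuple[Optional[str], str]:
    """Return (concept, match_level), trying the most reliable source first."""
    # 1. Structured code, when present and known.
    if code and code in CODE_TO_CONCEPT:
        return CODE_TO_CONCEPT[code], "code"
    # 2. Ingredient-level match.
    if ingredient and ingredient.lower() in INGREDIENT_TO_CONCEPT:
        return INGREDIENT_TO_CONCEPT[ingredient.lower()], "ingredient"
    # 3. Pattern matching on the free-text order.
    if free_text:
        for pattern, concept in FREE_TEXT_PATTERNS:
            if pattern.search(free_text):
                return concept, "free_text"
    return None, "unmatched"
```

Returning the match level alongside the concept makes it possible to audit how many records relied on the less reliable free-text step.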

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org.

Author Contributions

K.M. and V.S. conceived and designed the study. K.M. conducted data queries, LLM extraction, and statistical analyses with input from A.J.V. and V.S. G.V. and S.K.R. performed the LLM evaluation. G.S. performed the APOE analysis. All authors interpreted results and wrote the manuscript.

Funding

This research received no external funding.

Code Availability

The analysis code is not publicly available. Please contact the corresponding author for details.

Acknowledgments

We thank Patrick Lenehan for critical study review and feedback on the manuscript.

Conflicts of Interest Statement

The authors are employees of nference, inc., which conducts research collaborations with various biopharmaceutical companies whose therapeutic products are included in this study. None of these companies, nor any other nference collaborator, funded, supported, or had any role in the independent study design, data acquisition, analysis, interpretation, manuscript preparation, or the decision to submit this work for publication. All analyses were conducted by the authors using de-identified electronic health record data. The authors declare no additional competing interests.

References

Lui, F.; Tsao, J. W. Alzheimer Disease. In StatPearls [Internet]; StatPearls Publishing, 2024.
Zhang, J.; et al. Recent advances in Alzheimer’s disease: mechanisms, clinical trials and new drug development strategies. Signal Transduct. Target. Ther. 2024, 9, 211.
Kueck, P. J.; Morris, J. K.; Stanford, J. A. Current Perspectives: Obesity and Neurodegeneration - Links and Risks. Degener. Neurol. Neuromuscul. Dis. 2023, 13, 111–129.
Patel, V.; Edison, P. Cardiometabolic risk factors and neurodegeneration: a review of the mechanisms underlying diabetes, obesity and hypertension in Alzheimer’s disease. J. Neurol. Neurosurg. Psychiatry 2024, 95, 581–589.
Kinney, J. W.; et al. Inflammation as a central mechanism in Alzheimer’s disease. Alzheimers Dement (N Y) 2018, 4, 575–590.
Folstein, M. F.; Folstein, S. E.; McHugh, P. R. ‘Mini-mental state’. A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 1975, 12, 189–198.
Nasreddine, Z. S.; et al. The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J. Am. Geriatr. Soc. 2005, 53, 695–699.
Yan, C.; et al. Large language models facilitate the generation of electronic health record phenotyping algorithms. J. Am. Med. Inf. Assoc. 2024, 31, 1994–2001.
Shao, M.; Xie, Y.; Yang, C.; Lu, J. LLM-MINE: Large Language Model based Alzheimer’s Disease and Related Dementias Phenotypes Mining from Clinical Notes; 2026.
Wang, W.; et al. Associations of semaglutide with first-time diagnosis of Alzheimer’s disease in patients with type 2 diabetes: Target trial emulation using nationwide real-world data in the US. Alzheimer’s Dement. J. Alzheimer's Assoc. 2024, 20.
Tang, H.; et al. GLP-1RA and SGLT2i Medications for Type 2 Diabetes and Alzheimer Disease and Related Dementias. JAMA Neurol. 2025, 82, 439–449.
Müller, T. D.; et al. Glucagon-like peptide 1 (GLP-1). Mol. Metab. 2019, 30.
Zheng, Z.; et al. Glucagon-like peptide-1 receptor: mechanisms and advances in therapy. Signal Transduct. Target. Ther. 2024, 9, 234.
Cummings, J. L.; et al. Efficacy and safety of oral semaglutide 14 mg (flexible dose) in early-stage symptomatic Alzheimer’s disease (evoke and evoke+): two phase 3, randomised, placebo-controlled trials. Lancet 2026.
Rajan, K. B.; Wilson, R. S.; Weuve, J.; Barnes, L. L.; Evans, D. A. Cognitive impairment 18 years before clinical diagnosis of Alzheimer disease dementia. Neurology 2015, 85, 898–904.
Kusoro, O.; Roche, M.; Del-Pino-Casado, R.; Leung, P.; Orgeta, V. Time to Diagnosis in Dementia: A Systematic Review With Meta-Analysis. Int. J. Geriatr. Psychiatry 2025, 40, e70129.
Eikelboom, W. S.; et al. Neuropsychiatric and Cognitive Symptoms Across the Alzheimer Disease Clinical Spectrum: Cross-sectional and Longitudinal Associations. Neurology 2021, 97, e1276–e1287.
Pless, A.; et al. Understanding neuropsychiatric symptoms in Alzheimer’s disease: challenges and advances in diagnosis and treatment. Front Neurosci. 2023, 17, 1263771.
Lanctôt, K. L.; et al. Neuropsychiatric signs and symptoms of Alzheimer’s disease: New treatment paradigms. Alzheimers Dement (N Y) 2017, 3, 440–449.
Chatzikostopoulos, A.; Moraitou, D.; Papaliagkas, V.; Tsolaki, M. Mapping the Neuropsychiatric Symptoms in Alzheimer’s Disease Using Biomarkers, Cognitive Abilities, and Personality Traits: A Systematic Review. Diagnostics 2025, 15.
Peters, M. E.; et al. Neuropsychiatric symptoms as predictors of progression to severe Alzheimer’s dementia and death: the Cache County Dementia Progression Study. Am. J. Psychiatry 2015, 172, 460–465.
Wang, K.-Y.; et al. Extracting Cognitive Impairment Assessment Information From Unstructured Notes in Electronic Health Records Using Natural Language Processing Tools: Validation with Clinical Assessment Data. Clin. Epidemiol. 2025, 17, 353–365.
Website. Available online: https://www.nia.nih.gov/health/alzheimers-and-dementia/what-alzheimers-disease.
Schneider, L. S. Semaglutide for Alzheimer’s disease after evoke and evoke+. Lancet 2026.
Zhou, Z. D.; et al. Glucagon-like peptide-1 receptor agonists in neurodegenerative diseases: Promises and challenges. Pharmacol. Res. 2025, 216, 107770.
Gandhi, A.; Parhizgar, A. GLP-1 receptor agonists in Alzheimer’s and Parkinson's disease: endocrine pathways, clinical evidence, and future directions. Front Endocrinol. 2025, 16, 1708565.
Venkatakrishnan, A. J.; Murugadoss, K.; Soundararajan, V. Decoding the hallmarks of GLP-1RA weight-loss super-responders. Biol. Methods Protoc. 2026, 11, bpag021.
Murugadoss, K.; Varma, G.; Venkatakrishnan, A. J.; Gibson, C. M.; Soundararajan, V. Weight trajectories after last tirzepatide or semaglutide prescription across a federated health network. Biol. Methods Protoc. 2026, 11, bpag020.
Venkatakrishnan, A. J.; et al. Semaglutide is associated with stiffness improvement and broad liver benefits with distinct dose- and weight-linked patterns. medRxiv 2026.04.14.26350891 2026.
Murugadoss, K.; Venkatakrishnan, A. J.; Gregg, C.; Soundararajan, V. Semaglutide cardiovascular outcomes align more closely with attained dose than achieved weight loss. medRxiv 2026.04.02.26350077 2026.
Venkatakrishnan, A. J.; Murugadoss, K.; Soundararajan, V. Weight Loss Dynamics and Health Burden Changes with Tirzepatide versus Semaglutide. medRxiv 2025, 2025.11.30, 25341294.
Murugadoss, K.; Venkatakrishnan, A. J.; Soundararajan, V. Greater lean-body-mass decline with tirzepatide than semaglutide in routine care, revealed by body-composition digital phenotyping. medRxiv 2026.04.11.26350687 2026.
OpenAI, et al. gpt-oss-120b & gpt-oss-20b Model Card. 2025.
Murugadoss, K.; et al. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning. Patterns (N Y) 2021, 2, 100255.
Venkatakrishnan, A. J.; et al. Clinical nSights: A software platform to accelerate real world oncology analyses. J. Clin. Oncol. 2024.

Figure 1. Study cohort assembly and longitudinal score framework for semaglutide-treated Alzheimer’s disease. A, The study cohort funnel shows the progression from approximately 29 million patients in the federated EHR platform to Alzheimer’s disease patients with cognitive or functional assessment scores and the semaglutide-treated Alzheimer’s disease cohort. The longitudinal schematic illustrates alignment of cognitive and functional score measurements before and after Alzheimer’s disease diagnosis and semaglutide initiation, using functional scales including FAQ, ADCS-ADL, and ADCS-MCI-ADL, and cognitive scales including MMSE and MoCA. B, Normalized trajectories of cognitive, functional, and neuropsychiatric outcome measures from 42,242 individuals with Alzheimer’s disease, aligned to first recorded Alzheimer’s disease diagnosis at year 0 and restricted to the interval through 5 years after diagnosis. Scores were scaled from 0 to 1, with lower values indicating greater impairment. The thick black curve represents the aggregate trajectory across all available outcome metrics; colored curves represent individual instruments, including ADCS-MCI-ADL, ADCS-ADL, FAQ, CDR Global, CDR-SB, MMSE, MoCA, NPI-Q Distress, NPI-Q Severity, and NPI Total. Shaded regions denote uncertainty around smoothed trajectories. The full time horizon is shown in Figure S1. C, Normalized trajectories of cognitive, functional, and neuropsychiatric outcome measures from 341 Alzheimer’s disease patients who initiated semaglutide after diagnosis, aligned to first semaglutide prescription at year 0 and restricted to the interval through 5 years after semaglutide initiation. Scores were scaled from 0 to 1, with lower values indicating greater impairment. 
The thick black curve represents the aggregate trajectory across all available outcome metrics; colored curves represent individual instruments, including ADCS-MCI-ADL, ADCS-ADL, FAQ, CDR Global, MMSE, MoCA, NPI-Q Distress, NPI-Q Severity, and NPI Total. Shaded regions denote uncertainty around smoothed trajectories. The light gray vertical band denotes the interquartile range of first recorded Alzheimer’s disease diagnosis timing before semaglutide initiation. The full time horizon is shown in Figure S2. D, Outcome-domain representation across 864 Alzheimer’s disease clinical trials with matched outcome instruments. Bars show the number of trials using each outcome domain as either any matched outcome or a primary outcome. Instruments were grouped into cognitive, global staging/severity, functional/activities-of-daily-living, and neuropsychiatric/behavioral domains. Cognitive instruments were most frequently represented overall and as primary endpoints, whereas functional and neuropsychiatric instruments were less commonly represented despite the longitudinal patterns shown in panels B and C. E-G, Patient-level timing gap between first confirmed functional and cognitive decline in Alzheimer’s disease. Decline was defined from repeated raw score measurements within the same instrument. Functional decline was measured using FAQ, ADCS-ADL, and ADCS-MCI-ADL; cognitive decline was measured using MMSE and MoCA. For each patient, the first confirmed decline date was the earliest later score date at which the score was worse than the baseline of a directional worsening span. The x-axis shows the timing gap in years, defined as functional decline confirmation date minus cognitive decline confirmation date; negative values indicate earlier functional decline, positive values indicate earlier cognitive decline, and zero indicates same-day confirmation. 
Among 2,169 patients with both functional and cognitive decline, 327 had functional decline >12 months before cognitive decline (E), 1,211 had decline events within 12 months or on the same date (F), and 631 had cognitive decline >12 months before functional decline (G). The vertical dashed line marks no timing difference between domains.
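The timing-gap definition used in panels E-G can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and 12 months is treated as one year of 365.25 days, which the caption does not specify.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: 12 months treated as one year of 365.25 days


def timing_gap_years(functional_decline: date, cognitive_decline: date) -> float:
    """Functional decline confirmation date minus cognitive decline confirmation date, in years."""
    return (functional_decline - cognitive_decline).days / DAYS_PER_YEAR


def classify_gap(gap_years: float) -> str:
    """Assign a patient to panel E, F, or G from the signed timing gap."""
    if gap_years < -1.0:
        return "E: functional decline >12 months before cognitive decline"
    if gap_years > 1.0:
        return "G: cognitive decline >12 months before functional decline"
    return "F: within 12 months or same date"
```

For example, a patient with functional decline confirmed on 2018-01-01 and cognitive decline confirmed on 2020-01-01 has a gap of about −2.0 years and falls in panel E. The three panel counts reported in the caption (327, 1,211, and 631) sum to the 2,169 patients with both decline domains.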

Figure 2. Medication profiles and clinical notes-based phenotypic enrichments of functional decline-first versus cognitive decline-first Alzheimer's disease patient subsets. (A-C) Patients were grouped according to whether functional decline was confirmed >12 months before cognitive decline or cognitive decline was confirmed >12 months before functional decline. (A) This panel shows patient-level medication burden, measured as the number of unique medication concepts observed within clinically anchored time windows: 2 years before structured Alzheimer's disease diagnosis, 2 years after structured diagnosis, 2 years before first confirmed decline, 2 years after first confirmed decline, and the interval between the first and subsequent domain decline. P values compare medication burden between groups using two-sided Mann-Whitney U tests. (B) This panel shows differences in patient-level exposure prevalence for broad medication classes during the interval between functional and cognitive decline. (C) This panel shows the same comparison for gastrointestinal medication subclasses. Values are absolute percentage-point differences in exposure prevalence; negative values indicate higher exposure in the cognitive decline-first cohort, and positive values indicate higher exposure in the functional decline-first cohort. Significance stars denote FDR-adjusted class-level comparisons: *q < 0.05, **q < 0.01 and ***q < 0.001. (D) This panel shows note-derived phenotype enrichment during the interval between functional and cognitive decline. The forest plot compares prespecified note-derived disease phenotypes between patients whose first confirmed functional decline occurred >12 months before first confirmed cognitive decline and patients whose first confirmed cognitive decline occurred >12 months before first confirmed functional decline. 
For each patient, the analysis window was defined as the interval between the first confirmed decline in the leading domain and the first confirmed decline in the lagging domain. Phenotype exposure was counted once per patient if at least one high-confidence positive disease mention matching the prespecified phenotype list occurred during this interval. Points show the difference in exposure prevalence between the functional decline-first cohort and the cognitive decline-first cohort, expressed in percentage points; negative values indicate enrichment in the cognitive decline-first cohort, and positive values indicate enrichment in the functional decline-first cohort. Horizontal bars denote approximate 95% confidence intervals for the risk difference. Blue points indicate phenotypes more frequent in the cognitive decline-first cohort, and green points indicate phenotypes more frequent in the functional decline-first cohort. Row labels show the total number of patients with each phenotype across both cohorts; counts <11 are masked as N<11. Asterisks denote nominal significance from two-sided Fisher exact testing: *P<0.05, **P<0.01, ***P<0.001. Several gastrointestinal, pain, and parkinsonism-related phenotypes, including headache, Parkinsonism, vomiting, diarrhea, rigidity, nausea, and tremor, were significantly more frequent in the cognitive decline-first cohort after false-discovery-rate correction, whereas no phenotype was significantly enriched in the functional decline-first cohort.
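The prevalence differences and their approximate 95% confidence intervals shown in panel D can be illustrated with a short sketch. The caption does not state which interval method was used, so the Wald interval below is an assumption, as are the function names and the example counts; only the cohort sizes (327 and 631), the sign convention, and the N<11 masking rule come from the source.

```python
import math


def risk_difference_ci(x_func: int, n_func: int, x_cog: int, n_cog: int, z: float = 1.96):
    """Functional-first minus cognitive-first prevalence, in percentage points.

    Returns (difference, lower, upper) using a Wald interval (an assumed method).
    """
    p_f = x_func / n_func
    p_c = x_cog / n_cog
    diff = p_f - p_c
    se = math.sqrt(p_f * (1 - p_f) / n_func + p_c * (1 - p_c) / n_cog)
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)


def masked_count(total: int) -> str:
    """Row-label count, masked when fewer than 11 patients carry the phenotype."""
    return "N<11" if total < 11 else str(total)


# Hypothetical phenotype counts: 20 of 327 functional-first and 80 of 631 cognitive-first patients.
diff, lower, upper = risk_difference_ci(20, 327, 80, 631)
```

With these hypothetical counts the difference is negative, which under the caption's convention indicates enrichment in the cognitive decline-first cohort.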

Figure 3. Raw longitudinal outcome trajectories in Alzheimer’s disease versus non-Alzheimer’s disease controls. Raw outcome-score trajectories are shown for individuals with Alzheimer’s disease and non-Alzheimer’s disease controls across eight cognitive, functional and neuropsychiatric instruments, aligned to the index date, defined as year 0. Solid lines represent individuals with Alzheimer’s disease and dashed lines represent non-Alzheimer’s disease controls. Shaded regions denote uncertainty around smoothed trajectories. Vertical dotted lines mark yearly intervals, and the vertical dashed line marks the index date. Nominal P values comparing Alzheimer’s disease and control trajectories at each time interval are shown above each panel. Across activities of daily living, global cognitive performance and dementia-severity measures, Alzheimer’s disease cases showed progressive post-index worsening relative to controls, with marked separation after diagnosis for ADCS-MCI-ADL, ADCS-ADL, FAQ, MMSE, MoCA and CDR-SB. Neuropsychiatric measures showed more heterogeneous trajectories, with smaller or later separation for NPI-Q distress and severity. Sample sizes for each instrument and group are shown within each panel.

Figure 4. Longitudinal cardiometabolic and nutritional trajectories relative to Alzheimer’s disease diagnosis. Longitudinal percentage change from baseline is shown for weight, body mass index, total cholesterol, HDL cholesterol and albumin among individuals with Alzheimer’s disease, aligned to the date of first recorded Alzheimer’s disease diagnosis, defined as year 0 and indicated by the vertical dashed line. Each panel shows smoothed trajectories for the indicated measure, with shaded regions denoting uncertainty around the fitted trajectory. Baseline was defined relative to each individual’s available pre-diagnosis measurements, and values are plotted as percentage change from baseline to enable comparison across measures with different native units. Weight and body mass index declined progressively before and after Alzheimer’s disease diagnosis, while lipid and albumin trajectories showed distinct pre- and post-diagnosis patterns, consistent with systemic metabolic and nutritional changes accompanying the Alzheimer’s disease course. Sample sizes for each measure are shown in the panel titles and legends.
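The percentage-change transformation in Figure 4 can be sketched as below. The caption states only that baseline was defined from each individual's available pre-diagnosis measurements; taking the mean of those measurements is an assumption made here for illustration, and the function name is hypothetical.

```python
from statistics import mean


def percent_change_from_baseline(measurements):
    """Convert one patient's measurements to percentage change from baseline.

    measurements: list of (years_from_diagnosis, value) pairs.
    Baseline is the mean of pre-diagnosis values (an assumed definition).
    Returns an empty list when no pre-diagnosis value is available.
    """
    pre_values = [value for years, value in measurements if years < 0]
    if not pre_values:
        return []
    baseline = mean(pre_values)
    return [(years, 100.0 * (value - baseline) / baseline) for years, value in measurements]
```

For a patient weighing 80 kg at both pre-diagnosis visits and 76 kg one year after diagnosis, the post-diagnosis point is plotted at −5%, which is directly comparable with a percentage change in albumin or cholesterol despite their different native units.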

Figure 5. Paired cognitive and functional score change after semaglutide initiation in Alzheimer’s disease. Mean within-patient raw score change from the pre-index period to the post-index period is shown for MMSE, MoCA, and FAQ among semaglutide-treated Alzheimer’s disease patients and matched non-semaglutide Alzheimer’s disease controls. Bars represent the mean post-index minus pre-index score change, and error bars denote uncertainty around the mean. Sample sizes are shown for each cohort within each panel. For MMSE and MoCA, higher scores indicate better cognitive status, so positive change reflects improvement or relative preservation. For FAQ, higher scores indicate greater functional impairment, so negative change reflects improvement or relative preservation. Nominal P values compare score-change distributions between semaglutide-treated patients and matched controls for each outcome measure.
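The paired score-change calculation in Figure 5 can be sketched as follows. How multiple scores within a period were aggregated is not stated in the caption, so averaging within each period is an assumption; the function names are hypothetical. The direction-of-improvement rules follow the caption.

```python
from statistics import mean

# Direction of improvement per instrument, as stated in the caption.
HIGHER_IS_BETTER = {"MMSE": True, "MoCA": True, "FAQ": False}


def paired_change(pre_scores, post_scores):
    """Post-index minus pre-index score for one patient (period means are an assumption)."""
    return mean(post_scores) - mean(pre_scores)


def cohort_mean_change(patients):
    """Mean within-patient change; patients lacking either period are skipped.

    patients: list of (pre_scores, post_scores) pairs.
    """
    changes = [paired_change(pre, post) for pre, post in patients if pre and post]
    return mean(changes)


def is_favorable(instrument, change):
    """Positive change is favorable for MMSE and MoCA; negative change is favorable for FAQ."""
    return change > 0 if HIGHER_IS_BETTER[instrument] else change < 0
```

Under this convention, an MMSE change of +0.4 points is favorable, whereas a change of −2.3 points is not; for FAQ the signs are reversed.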

Table 1. Demographic and clinical characteristics of 42,242 individuals with Alzheimer's disease and cognitive/functional testing performed. Values are presented as n (%) unless otherwise indicated. Comorbidities reflect the most prevalent diagnoses recorded prior to or at the time of Alzheimer's disease diagnosis, with corresponding ICD-10 codes.

Total patients, n	42,242
Female sex, n (%)	23,808 (56.4)
Age, mean (SD), y	78.3 (7.4)
Race, n (%)
White	36,002 (85.2)
Black or African American	4,165 (9.9)
Asian	345 (0.8)
Native American or Other Pacific Islander	112 (0.3)
Hispanic	79 (0.2)
Mixed Race	52 (0.1)
South East Asian	34 (0.08)
Asian Indian	22 (0.05)
Unknown	1,425 (3.4)
Ethnicity, n (%)
Not Hispanic or Latino	35,040 (83.0)
Hispanic or Latino	763 (1.8)
Unknown	6,439 (15.2)
Comorbidities (5 years pre-AD diagnosis), n (%)
Essential Hypertension (I10), n (%)	20,782 (49.2)
Amnesia (R41.3), n (%)	16,801 (39.8)
Hyperlipidemia (E78.5), n (%)	16,088 (38.1)
Mild Cognitive Impairment (G31.84), n (%)	8,364 (19.8)
Type 2 Diabetes Mellitus (E11), n (%)	7,978 (18.9)
Gastroesophageal Reflux Disease Without Esophagitis (K21.9), n (%)	7,849 (18.6)
Mixed Hyperlipidemia (E78.2), n (%)	7,623 (18.0)
Dyspnea (R06.0), n (%)	7,142 (16.9)
Low Back Pain (M54.5), n (%)	7,056 (16.7)
Dizziness and Giddiness (R42), n (%)	7,031 (16.6)
Coronary Artery Disease (I25.1), n (%)	6,430 (15.2)
Urinary Tract Infection (N39.0), n (%)	6,204 (14.7)
Hypothyroidism (E03.9), n (%)	6,144 (14.5)
Anemia (D64.9), n (%)	6,012 (14.2)
Dementia (F03.9), n (%)	5,994 (14.2)
Vitamin D Deficiency (E55.9), n (%)	5,969 (14.1)
Bone Disease (M89.9), n (%)	5,766 (13.6)
Amyloid Status
“Amyloid” mentioned in clinical notes	12,635 of 42,242 (29.9)
Positive	4,461 of 12,635 (35.3)
Negative	1,085 of 12,635 (8.6)
Unknown / Indeterminate	7,089 of 12,635 (56.1)
APOE Genotype
Result available	2,477 of 42,242 (5.8)
APOE ε3/ε4	951 of 2,477 (38.4)
APOE ε3/ε3	570 of 2,477 (23.0)
APOE ε4/ε4	419 of 2,477 (16.9)
APOE ε2/ε3	123 of 2,477 (5.0)
APOE ε2/ε4	101 of 2,477 (4.0)
APOE ε2/ε2	48 of 2,477 (1.9)

Table 2. Accepted standardized cognitive and functional assessments and valid score ranges used for formal score extraction. Scores outside these ranges were discarded.

Assessment	Accepted note mentions	Valid score range	Direction of worse severity
MoCA	MoCA, Montreal Cognitive Assessment	0 to 30 inclusive	Lower is worse
MMSE	MMSE, Mini-Mental State Examination	0 to 30 inclusive	Lower is worse
CDR global	Global CDR, Clinical Dementia Rating global score	0, 0.5, 1, 2, or 3 only	Higher is worse
CDR sum of boxes	CDR-SB, CDR SoB, CDR Sum of Boxes	0 to 18 inclusive	Higher is worse
ADAS-Cog11	ADAS-Cog 11	0 to 70 inclusive	Higher is worse
ADAS-Cog13	ADAS-Cog 13	0 to 85 inclusive	Higher is worse
ADAS-Cog14	ADAS-Cog 14	0 to 90 inclusive	Higher is worse
FAQ	FAQ, Functional Activities Questionnaire	0 to 30 inclusive	Higher is worse
ADCS-ADL	ADCS-ADL	0 to 78 inclusive	Lower is worse
ADCS-MCI-ADL	ADCS-MCI-ADL, ADCS-ADL-MCI	0 to 53 inclusive	Lower is worse
NPI total	NPI total, Neuropsychiatric Inventory total	0 to 144 inclusive	Higher is worse
NPI-Q severity	NPI-Q severity	0 to 36 inclusive	Higher is worse
NPI-Q distress	NPI-Q distress	0 to 60 inclusive	Higher is worse
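The range filter in Table 2 can be expressed compactly in code. The sketch below also shows one way to place scores on a common 0-to-1 scale where lower values indicate greater impairment, as in the normalized trajectories; the min-max scaling with inversion for "higher is worse" instruments is an assumption, since the normalization formula is not given, and the function names are hypothetical.

```python
# (minimum, maximum, higher_is_worse) for each assessment, taken from Table 2.
RANGES = {
    "MoCA": (0, 30, False),
    "MMSE": (0, 30, False),
    "CDR global": (0, 3, True),
    "CDR sum of boxes": (0, 18, True),
    "ADAS-Cog11": (0, 70, True),
    "ADAS-Cog13": (0, 85, True),
    "ADAS-Cog14": (0, 90, True),
    "FAQ": (0, 30, True),
    "ADCS-ADL": (0, 78, False),
    "ADCS-MCI-ADL": (0, 53, False),
    "NPI total": (0, 144, True),
    "NPI-Q severity": (0, 36, True),
    "NPI-Q distress": (0, 60, True),
}
CDR_GLOBAL_ALLOWED = {0, 0.5, 1, 2, 3}


def is_valid(instrument: str, score: float) -> bool:
    """True when the score falls in the accepted range for the instrument."""
    if instrument not in RANGES:
        return False
    if instrument == "CDR global":
        return score in CDR_GLOBAL_ALLOWED
    low, high, _ = RANGES[instrument]
    return low <= score <= high


def normalize(instrument: str, score: float) -> float:
    """Min-max scale to [0, 1] so that lower always means greater impairment (assumed method)."""
    if not is_valid(instrument, score):
        raise ValueError(f"score {score} is outside the accepted range for {instrument}")
    low, high, higher_is_worse = RANGES[instrument]
    fraction = (score - low) / (high - low)
    return 1.0 - fraction if higher_is_worse else fraction
```

For example, an MMSE of 31 or a global CDR of 1.5 would be discarded, while an FAQ of 30 and an MMSE of 0 both map to 0.0, the most impaired end of the common scale.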

Table 3. Inter-abstractor agreement on overlapping adjudications.

Metric	Value
Overlap reviewed by both abstractors, n	100
Agreement, n (%)	98 (98.0)
Krippendorff alpha, nominal	0.952
Physician 1 Y rate, %	70.0
Physician 2 Y rate, %	72.0
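The agreement statistics in Table 3 can be checked with a short calculation. The 2×2 cell counts used below are not reported directly; they are inferred from the table's marginals (100 overlapping items, 98 agreements, and Y rates of 70% and 72%), which together imply 70 items rated Y by both physicians, 28 rated N by both, and 2 rated Y only by Physician 2. The function is a sketch of nominal Krippendorff's alpha for two raters with binary labels and no missing data; its name is hypothetical.

```python
def krippendorff_alpha_nominal_2x2(both_y: int, only_1_y: int, only_2_y: int, both_n: int) -> float:
    """Nominal Krippendorff's alpha for two raters, binary labels, no missing ratings."""
    n_units = both_y + only_1_y + only_2_y + both_n
    n_values = 2 * n_units  # every unit contributes two ratings
    disagreements = only_1_y + only_2_y
    # Marginal totals of each label across all ratings.
    n_y = 2 * both_y + disagreements
    n_n = 2 * both_n + disagreements
    # Observed disagreement: each disagreeing unit adds two off-diagonal coincidences.
    d_observed = 2 * disagreements / n_values
    # Expected disagreement from the label marginals.
    d_expected = 2 * n_y * n_n / (n_values * (n_values - 1))
    if d_expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - d_observed / d_expected


# Cell counts inferred from the Table 3 marginals.
alpha = krippendorff_alpha_nominal_2x2(both_y=70, only_1_y=0, only_2_y=2, both_n=28)
percent_agreement = 100 * (70 + 28) / 100
```

With these inferred counts the function returns approximately 0.952 and the raw agreement is 98.0%, matching the values reported in the table.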

Table 4. Validation accuracy by reporting stratum.

Analysis set	n	Y	N	Accuracy, %
Low sample, n < 20	34	26	8	76.5
n >= 20 and accuracy < 70%	88	34	54	38.6
Main remainder, n >= 20 and accuracy >= 70%	154	140	14	90.9
The 90.9% estimate applies to sufficiently sampled, non-problematic score families. It should not be presented as global LLM score-extraction accuracy across all instruments. Low-accuracy instrument families require targeted extraction-rule revision and revalidation before inclusion in primary downstream analyses.
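The reporting strata in Table 4 follow a simple two-threshold rule, sketched below. The function names are hypothetical; the thresholds (n of 20 and accuracy of 70%) and the example counts are taken from the validation tables.

```python
def stratum(n: int, correct: int, min_n: int = 20, min_accuracy: float = 70.0) -> str:
    """Assign an instrument to a reporting stratum from its sample size and accuracy."""
    if n < min_n:
        return "low sample"
    accuracy = 100.0 * correct / n
    if accuracy < min_accuracy:
        return "low accuracy"
    return "main remainder"


def pooled_accuracy(rows) -> float:
    """Pooled accuracy (%) over (n, correct) pairs."""
    total_n = sum(n for n, _ in rows)
    total_correct = sum(correct for _, correct in rows)
    return 100.0 * total_correct / total_n


# (n, Y) pairs for the seven instruments in the main reportable remainder (Table 5).
main_rows = [(22, 21), (22, 21), (22, 21), (22, 21), (22, 20), (22, 20), (22, 16)]
main_accuracy = pooled_accuracy(main_rows)
```

Pooling the seven main-remainder instruments gives 140 correct of 154, or 90.9%, which is why the estimate applies only to that stratum and not to extraction accuracy overall.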

Table 5. Instrument-level accuracy among the main reportable remainder.

Instrument	n	Y	N	Accuracy, %
CDR global	22	21	1	95.5
CDR sum of boxes	22	21	1	95.5
MMSE	22	21	1	95.5
other_specified	22	21	1	95.5
FAQ	22	20	2	90.9
MoCA	22	20	2	90.9
ADCS-ADL	22	16	6	72.7

Table 6. Sufficiently sampled instruments with accuracy <70%.

Instrument	n	Y	N	Accuracy, %
NPI-Q severity	22	1	21	4.5
ADCS-ADL	22	9	13	40.9
NPI-Q distress	22	10	12	45.5
NPI total	22	14	8	63.6

Table 7. Low-sample instruments summarized descriptively only.

Instrument	n	Y	N	Accuracy, %
ADCS-MCI-ADL	9	2	7	22.2
ADAS-Cog13	6	5	1	83.3
ADAS-Cog14	5	5	0	100.0
CES_D	2	2	0	100.0
MINI_Cog	2	2	0	100.0
QOL_Cog	2	2	0	100.0

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.