Discussion
This study provides a longitudinal multifaceted characterization of real-world AD patient trajectories, capturing standardized testing across multiple domains (e.g., cognitive, functional, psychiatric) as well as changes in laboratory and other physiologic measurements. Key results include the heterogeneity in temporal patterns of different standardized tests leading up to and following the diagnosis of AD, the existence of patient subsets with evidence of functional decline significantly preceding evidence of cognitive decline (and vice versa), and the relatively stable pattern of MoCA and MMSE scores among patients who initiated semaglutide after their AD diagnosis contrasted with progressive decline in propensity-matched controls. These findings build on existing literature and motivate further research to better understand the mechanistic basis for diverse clinical phenotypes and trajectories that are present among AD patients, toward the development of therapies that can be tailored to specific patient subgroups.
In the longitudinal evaluation of standardized assessment scores surrounding the time of AD diagnosis, multiple interesting patterns emerged. First, there were clear downward trajectories for multiple cognitive and functional tests that were most evident close to the diagnosis date but began to emerge five to ten years before the first formal diagnosis. These findings are compatible with prior studies, including a recent meta-analysis reporting an average time from symptom onset to diagnosis of 3.6 years in AD and 3.5 years across all types of dementia [16]. This could represent an opportunity for improved early surveillance and diagnostic algorithms. Second, in contrast to these downward cognitive and functional trajectories, the neuropsychiatric testing scores (NPI variants) remained relatively stable throughout the study period. This finding is consistent with previously reported data from over 1,000 patients in the Amsterdam Dementia Cohort, which showed minimal change in NPI scores but uniform decline across cognitive domains after AD diagnosis [17]. This should not be taken as evidence that neuropsychiatric symptoms are minimal or unchanged during the course of AD. Indeed, it is well established that neuropsychiatric symptoms (e.g., depression, anxiety, apathy, psychosis) are highly prevalent in AD, cause significant patient distress, and can worsen as the disease progresses [18, 19]. These symptoms also correlate with AD biomarkers, cognitive status, and patient prognosis [20, 21]. The relative stability of NPI scores in our study population could reflect selection bias or limited sensitivity of the scores to detect meaningful change, in addition to the possibility that neuropsychiatric symptoms truly do show less variability over time in comparison to cognitive and functional status.
An important aspect of this study, which is critical for interpreting the trends described above, is the physician validation of the extracted standardized assessment scores [22].
This study identifies an interesting real-world subset of AD patients who initiated semaglutide after diagnosis and subsequently showed stable or improved clinical trajectories rather than continued deterioration. In a disease usually defined by progressive cognitive and functional decline [23], 218 of 713 post-diagnosis semaglutide initiators showed stable or improved trajectories, representing 30.6% of all treated patients and 58.0% of patients with sufficient longitudinal note evidence for trajectory classification. This group, which was identified through LLM-enabled curation of routine-care documentation at scale, could represent a distinct phenotype that may reveal biological or clinical features associated with unexpectedly slowed, stabilized, or improved trajectories.
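The two percentages above share a numerator and differ only in their denominator. The short arithmetic sketch below makes that explicit. The number of patients with classifiable trajectories is not stated in this section, so the value computed here is back-calculated from the reported 58.0% and should be read as an implied figure, not a reported one.

```python
# Counts reported in the text.
stable_or_improved = 218
all_initiators = 713

# Share of all post-diagnosis semaglutide initiators.
pct_of_all = 100 * stable_or_improved / all_initiators

# ASSUMPTION: the classifiable-patient count is not given in this section.
# It is back-calculated from the reported 58.0% purely for illustration.
reported_pct_of_classifiable = 58.0
implied_classifiable = round(
    stable_or_improved / (reported_pct_of_classifiable / 100)
)

print(f"{pct_of_all:.1f}% of all initiators")
print(f"about {implied_classifiable} patients implied as classifiable")
```

The gap between the two percentages therefore reflects how many treated patients lacked enough longitudinal note evidence to be classified at all.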
These data do not prove that semaglutide reverses or stabilizes AD but rather identify a semaglutide-associated, real-world, LLM-derived trajectory phenotype that is compelling enough to warrant further investigation. These findings suggest that AD progression after metabolic intervention may be more heterogeneous than conventional trial-level averages can capture. Although oral semaglutide 14 mg did not significantly slow clinical progression in early symptomatic, amyloid-confirmed AD at the population level in the EVOKE and EVOKE+ trials [14, 24], this does not exclude the possible existence of a responder or resilience phenotype in broader routine-care populations. Indeed, such heterogeneity could make population-level trial averages difficult to interpret. Real-world semaglutide initiators also likely differ from trial participants in baseline comorbidity status (e.g., higher diabetes, obesity, and cardiovascular burdens), treatment timing, and frailty. In this light, our study reframes the follow-up to the EVOKE trials, placing less emphasis on whether semaglutide universally modifies AD and more on whether a reproducible subgroup of semaglutide-treated patients demonstrates clinical stabilization or apparent improvement after diagnosis.
The natural-history trajectories also raise a second, timing-focused hypothesis. In Figure 1, functional or activity-associated impairment appears to begin years before the steepest decline in conventional cognitive scores, suggesting that AD may include an earlier systemic, metabolic-functional, or frailty-associated phase before the period when cognitive instruments most clearly deteriorate. This observation should be interpreted cautiously because FAQ and related functional measures may capture a mixture of early executive dysfunction, mobility limitation, sarcopenia, depression, sleep disruption, caregiver reporting, vascular disease, and general medical vulnerability, rather than metabolic dysregulation alone. Nevertheless, the temporal pattern is biologically relevant to GLP-1 receptor agonist intervention: if activity decline marks an early window of systemic vulnerability, then semaglutide exposure before or near this inflection point might plausibly have different effects than initiation after established neurocognitive decline. This may be one reason post-diagnosis treatment effects appear heterogeneous. The key follow-up question is therefore not only whether semaglutide-treated patients differ from controls after AD diagnosis, but whether patients who initiate GLP-1 receptor agonist therapy before, during, or soon after the first detectable functional decline show slower subsequent MMSE, MoCA, CDR, FAQ, or composite trajectory deterioration. Such investigation would test whether metabolic intervention is most informative as an early resilience strategy rather than as a late-stage cognitive rescue therapy.
The APOE analysis provides a useful counterweight to a purely neurodegenerative interpretation of functional-first decline. Although limited by sparse genotype availability, the enrichment of ε3/ε3 and lack of ε3/ε4 enrichment in functional-first patients suggest that earlier functional decline may not simply mark more genetically driven Alzheimer biology. Instead, this pattern is consistent with a phenotype in which functional reserve, mobility, frailty, systemic vulnerability, or multimorbidity may shape the timing of observable decline before conventional cognitive deterioration becomes dominant.
Our initial findings derived from an EHR-wide analysis of clinical features (e.g., disease diagnoses, documented symptoms, flowsheet entries) demonstrated a mixed signal, with reductions in multiple neurocognitive phenotypes but increases in several vulnerability-associated signals. Prevalence of disease coding and other EHR entries can be difficult to interpret, as these factors are influenced not only by disease trajectories but also by variability in patient follow-up and clinical documentation. That said, these results could be consistent with the framework described above, wherein heterogeneous disease trajectories following semaglutide initiation result in fewer clinical encounters for neurocognitive diagnoses in some patients but higher rates of progressive AD sequelae in others. That GLP-1 receptor agonism may impact AD progression is plausible given that this axis sits at the intersection of metabolic, vascular, inflammatory, and neuroimmune biology [12, 13]. GLP-1 receptor agonists have been linked to insulin signaling, mitochondrial function, oxidative stress, autophagy, microglial activation, vascular biology, and neuroinflammatory pathways that are relevant to neurodegeneration [12, 13, 25, 26]. Large real-world target-trial emulation studies have also reported lower incident AD and related dementia risk among patients treated with semaglutide or other GLP-1 receptor agonists [4, 5]. These prior studies primarily address the risk of developing dementia, whereas the present analysis focuses on patients who already carried an AD diagnosis before semaglutide initiation. The distinction is important, as prevention, delayed onset, and post-diagnosis stabilization may reflect overlapping but non-identical biology.
To extend beyond the cohort-level phenotypic changes, we introduce here a new LLM-enabled methodology to aid in longitudinal severity grading of neurocognitive dysfunction from clinical notes. Namely, we prompted an LLM to extract and summarize the overall burden of disease-relevant patterns from documentation of memory, orientation, language, executive function, activities of daily living, caregiver dependence, behavioral symptoms, and supervision needs. This method, intended to serve as a qualitative imputed proxy for standardized cognitive or functional testing scores, was particularly important given the scarcity of explicit documentation of such scores in the EHR. Our finding that the same method applied to non-AD patients yielded a significantly higher fraction of patients with minimal-severity cognitive dysfunction supports the face validity of this workflow, although further rigorous validation is needed. Beyond its utility in this study, the method presented here also advances the field of clinical AI phenotyping. Recent work has shown that LLMs can facilitate electronic health record phenotyping algorithm generation [8], and our study extends this principle to longitudinal reconstruction of disease trajectories over specified time intervals. This extension is particularly important given that most diseases tend to evolve in one or more ways (e.g., resolve, progress, wax and wane) over time.
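To make the note-based grading step more concrete, the sketch below shows one way a per-interval prompt could be assembled and a model reply mapped to a severity label. It is an illustration under stated assumptions and not the prompt or pipeline used in this study: the `NoteWindow` type, the function names, the prompt wording, and every severity label other than "minimal" are hypothetical, and no model call is made.

```python
from dataclasses import dataclass
from typing import List

# Domains listed in the text as inputs to the severity summary.
DOMAINS = [
    "memory", "orientation", "language", "executive function",
    "activities of daily living", "caregiver dependence",
    "behavioral symptoms", "supervision needs",
]

# ASSUMPTION: hypothetical label set; the text names only "minimal" severity.
SEVERITY_LABELS = ["minimal", "mild", "moderate", "severe", "insufficient evidence"]


@dataclass
class NoteWindow:
    """All notes for one patient within one time interval (hypothetical type)."""
    patient_id: str
    start: str  # ISO date
    end: str    # ISO date
    notes: List[str]


def build_severity_prompt(window: NoteWindow) -> str:
    """Assemble a single prompt covering every note in the interval."""
    notes_block = "\n---\n".join(window.notes)
    return (
        f"Clinical notes for one patient from {window.start} to {window.end}:\n"
        f"{notes_block}\n\n"
        f"Summarize documented evidence about: {'; '.join(DOMAINS)}.\n"
        f"Then give one overall severity label from: {', '.join(SEVERITY_LABELS)}."
    )


def parse_severity(response: str) -> str:
    """Map a free-text reply onto one allowed label, defaulting to no grade."""
    cleaned = response.strip().lower()
    # Check longer labels first so multi-word labels are matched before short ones.
    for label in sorted(SEVERITY_LABELS, key=len, reverse=True):
        if label in cleaned:
            return label
    return "insufficient evidence"
```

Grading each interval separately, with an explicit "insufficient evidence" outcome, is what would allow a trajectory to be classified only when enough intervals carry a usable grade.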
This study has limitations. First, it is a retrospective study of real-world EHR data and thus is prone to multiple sources of confounding and bias including selection, ascertainment, and documentation bias. Second, the study describes inferred cognitive trajectories in a single population of AD patients who subsequently initiated semaglutide. Comparison of these trajectories to relevant control populations, including AD patients who initiated non-GLP-1 anti-diabetic medications or the broader population of AD patients overall, will be important to contextualize the trends reported in this study. Third, the LLM workflow that was utilized to characterize these trajectories requires additional rigorous validation. While the application of this workflow to a cohort of non-AD patients supported the directional validity of the outputs, a more extensive comparison of model outputs to standardized testing scores across a larger dementia cohort is needed. Fourth, in the cohort-level analysis of changes in structured EHR variables, the prevalence of disease diagnosis codes was compared across two time intervals of equal length (approximately 1.5 years before and after semaglutide initiation). However, the analysis did not account for differences in available follow-up time, relating to either differences in calendar time of semaglutide initiation or loss to follow-up. This could result in underestimating the rates of phenotypes in the follow-up period, although we did find that multiple vulnerability-associated phenotypes actually increased after treatment initiation. Finally, the cohort definition relied on the presence of at least one semaglutide prescription, but this does not ensure dose escalation or treatment adherence. Adherence to GLP-1 receptor agonists is known to be highly variable in the real-world setting, with frequent treatment pauses, discontinuation, and drug switching.
In our study population, the percentage of patients who reached high-dose semaglutide prescriptions was quite low, which could suggest limited adherence. That said, the cohort-level changes in cardiometabolic measurements do support the presence of at least some treatment effect in this population. Importantly, this is not inferred from weight loss alone (which could be related to AD progression rather than an intended consequence of treatment) but also from reductions in other laboratory and vital sign parameters.
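The fourth limitation above, unequal observable follow-up within nominally equal windows, can be illustrated with a small sketch. It compares pre- and post-initiation prevalence of a diagnosis code with and without restricting to patients observed for the full post-initiation window. This is not the analysis performed in this study: the record layout, the function name, the 548-day approximation of 1.5 years, and the placeholder code `CODE_A` are all assumptions made for illustration.

```python
from datetime import date, timedelta

# ASSUMPTION: 548 days approximates the 1.5-year window described in the text.
WINDOW = timedelta(days=548)


def windowed_prevalence(patients, code, require_full_followup=True):
    """Return (pre, post) prevalence of `code` around each patient's index date.

    `patients` is a list of dicts with keys `index_date`, `last_contact`,
    and `events` (a list of (date, code) tuples); this layout is hypothetical.
    Returns None when no patient is eligible.
    """
    pre_hits = post_hits = eligible = 0
    for p in patients:
        idx = p["index_date"]
        # Optionally drop patients who could not be observed for the full post window.
        if require_full_followup and p["last_contact"] < idx + WINDOW:
            continue
        eligible += 1
        pre_hits += any(c == code and idx - WINDOW <= d < idx for d, c in p["events"])
        post_hits += any(c == code and idx <= d < idx + WINDOW for d, c in p["events"])
    if eligible == 0:
        return None
    return pre_hits / eligible, post_hits / eligible


# Two synthetic patients; the second is lost to follow-up soon after initiation.
example_patients = [
    {"index_date": date(2022, 1, 1), "last_contact": date(2024, 1, 1),
     "events": [(date(2021, 6, 1), "CODE_A"), (date(2022, 6, 1), "CODE_A")]},
    {"index_date": date(2023, 6, 1), "last_contact": date(2023, 9, 1),
     "events": [(date(2023, 1, 1), "CODE_A")]},
]
print(windowed_prevalence(example_patients, "CODE_A", require_full_followup=False))
print(windowed_prevalence(example_patients, "CODE_A", require_full_followup=True))
```

In this synthetic example the unrestricted comparison shows an apparent post-initiation drop that disappears once the patient with truncated follow-up is excluded, which is the direction of bias described above. Restricting to fully observed patients trades sample size for comparability and can introduce its own selection effects.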
In conclusion, this real-world LLM-derived clinical trajectory analysis identified a subset of AD patients with documentation of stable or improved cognitive or functional status after semaglutide initiation. Methodologically, this study demonstrates that routine clinical notes contain recoverable longitudinal AD severity information that is missed by sparse standardized score documentation. Clinically, the potential subset of AD patients identified here warrants prospective validation, physician-adjudicated review, and causal testing in matched comparator designs. If this subset is indeed reproducible, it could serve as a discovery lens to help define which patients, biological states, and treatment contexts are most compatible with metabolic protection of the aging brain.