Preprint
Review

This version is not peer-reviewed.

Blood-Based RNA Biomarkers and Machine Learning Approaches for Alzheimer’s Disease

Submitted:

12 May 2026

Posted:

13 May 2026


Abstract
Alzheimer's disease (AD), a leading cause of dementia worldwide, is a neurological disorder characterized by progressive cognitive decline and imposes a significant socioeconomic burden. While definitive diagnostic tools such as positron emission tomography (PET) imaging and cerebrospinal fluid (CSF) biomarker analysis offer high sensitivity and specificity, they are constrained by high cost, invasiveness, and limited accessibility. Consequently, these gold-standard approaches are poorly suited to large-scale screening and longitudinal follow-up. Recent advances in blood-based biomarkers, in particular transcriptomic signatures derived from RNA sequencing (RNA-seq), hold promise for capturing systemic molecular changes associated with AD. Gene expression profiles in peripheral blood can reveal underlying pathological processes, including synaptic dysfunction, neuroinflammation, and metabolic dysregulation. Combined with high-dimensional datasets, artificial intelligence (AI) approaches enable the identification of robust predictive models that help estimate AD-related biomarker status. We further discuss the integration of multiple omics data, including genomics, proteomics, and metabolomics, to improve biomarker robustness. We also address key challenges related to reproducibility, repeatability, cohort heterogeneity, and clinical application, and we outline future directions toward standardized, scalable, and clinically applicable diagnostic tools.
Subject: 
Engineering  -   Other

1. Introduction

Alzheimer's disease (AD) is a progressive neurological disorder that leads to a continuous deterioration of cognitive function, impairing individuals' ability to think clearly and productively. AD is the most common cause of dementia, accounting for 60-80% of all cases. According to the World Health Organization (WHO) reports for 2025, more than 55 million people worldwide are living with dementia. This number is projected to double by 2050, placing a significant burden on healthcare systems, families, and society [1,2]. Pathologically, AD is characterized by extracellular amyloid-β (Aβ) plaques, intracellular neurofibrillary tangles of hyperphosphorylated tau, and widespread synaptic and neuronal loss [3,4]. Despite decades of research, a definitive and cost-effective cure for Alzheimer's disease has yet to be found. To date, only anti-amyloid monoclonal antibodies have demonstrated encouraging results in slowing the disease’s progression [5,6].
Modern diagnostic practice combines clinical testing with biomarker assessment. The "AT(N)" framework proposed by the NIA-AA stages Alzheimer's disease according to amyloid, tau, and neurodegeneration components [4,5]. A definitive diagnosis is made either by positron emission tomography (PET) scanning, which reveals accumulations of Aβ or tau, or by measuring the levels of Aβ42 and phosphorylated tau (pTau) in cerebrospinal fluid (CSF) [7]. These approaches generally achieve sensitivity and specificity above 90%, enabling Alzheimer’s disease to be distinguished from other types of dementia. Although PET imaging is accurate, it is expensive, requires specialized facilities, and carries risks of radiation exposure [8]. Lumbar puncture for CSF sampling is an invasive procedure with short- and long-term side effects for patients, which makes it less suitable for population-level screening or longitudinal follow-up [9]. Consequently, extensive efforts have been made to discover minimally invasive biomarkers, particularly those derived from blood. Plasma pTau181, pTau217, Aβ42/40, and neurofilament light chain (NfL) levels, which correlate with CSF and PET measurements, have been proposed as candidates [10,11]. Nevertheless, blood-based protein biomarkers raise concerns about reproducibility and generalizability, require highly sensitive assays, and may vary across populations [12,13]. Advances in high-throughput transcriptomics and computational biology have highlighted the potential of blood RNA profiles as diagnostic and prognostic signatures. Gene expression changes in blood may reflect systemic alterations or peripheral signatures of neurodegenerative processes. Numerous studies have identified differentially expressed genes (DEGs) in AD samples compared with cognitively unimpaired controls [14,15,16,17].
RNA-based biomarkers can be enhanced by combining them with machine learning (ML) and deep learning (DL) approaches [18]. These approaches improve prediction accuracy, reduce dimensionality, and uncover hidden patterns that are not apparent with conventional statistical techniques [19,20,21]. Studies applying classifiers such as Random Forest (RF), Ranger, and Gradient Boosting Machines (GBM) to RNA-seq data from CSF and blood samples have shown that gene expression signatures can accurately predict AD biomarker status, even on independent and imbalanced test sets [22,23,24]. Probabilistic models such as Gaussian mixture modeling (GMM) and hidden Markov models (HMM) have been successful in predicting continuous CSF biomarker levels from blood RNA profiles [25,26,27]. These probabilistic models also lay the groundwork for estimating risk on a continuous scale.
In this review, we first describe the limitations and costs of established imaging techniques and the difficulty of acquiring cerebrospinal fluid (CSF) biomarkers, and then discuss progress in blood-based transcriptomic biomarkers. After highlighting early microarray studies, RNA-seq approaches, and integrative omics research, we discuss the role of machine and deep learning algorithms in identifying RNA biomarkers, including non-coding RNAs. We also summarize feature selection, ensemble methods, and probabilistic modeling in biomarker discovery and classification.

2. Established Biomarkers and Their Limitations

The diagnosis of Alzheimer’s disease currently depends on neuroimaging, clinical evaluation, and cerebrospinal fluid (CSF) biomarkers. While these tools have enhanced our ability to detect disease pathology, they also have limitations that restrict their use in population-level screening and longitudinal monitoring [28,29]. Table 1 presents established Alzheimer’s disease biomarkers with their strengths and limitations.

2.1. CSF Biomarkers

CSF biomarkers are considered the most reliable way to detect Alzheimer’s disease. Key markers include amyloid-β42 (Aβ42) and phosphorylated tau (pTau), which are used in the A/T/N framework to categorize the disease. Reduced Aβ42 indicates amyloid plaque buildup in the brain, elevated pTau reflects tau tangles, and higher total tau or neurofilament light (NfL) signals neuronal damage. Studies to date have shown that CSF measurements can detect AD with high diagnostic accuracy, with sensitivity and specificity often exceeding 90% [30]. These biomarkers can predict progression from mild cognitive impairment (MCI) to AD, track disease severity, and distinguish AD from other dementias. Despite this accuracy, CSF collection requires an invasive and painful lumbar puncture. Further disadvantages include short-term headaches, risk of infection, and patient reluctance. In addition, variability between centers in sample collection, ethical protocols, and analysis methods may compromise reproducibility. These factors constrain the use of CSF biomarkers for repeated measurements in primary care settings and in long-term studies [9].

2.2. PET Imaging Biomarkers

Positron emission tomography (PET) has transformed in vivo imaging of AD pathology and is critical for establishing an AD diagnosis based on both clinical symptoms and biological evidence. PET offers high specificity and sensitivity, while also allowing researchers to monitor disease progression and evaluate treatment responses in clinical trials. However, PET requires specialized imaging facilities and costs thousands of dollars per scan. Limited tracer availability, regulatory approval processes, and radiation exposure further limit its use. Therefore, PET is largely confined to specialized memory clinics and research settings rather than routine clinical application [31].

2.3. Blood-Based Protein Biomarkers

Plasma pTau181 and pTau217 have shown high accuracy in differentiating AD from other neurodegenerative disorders and in predicting future cognitive decline [32,33]. Likewise, the plasma Aβ42/40 ratio correlates with amyloid PET positivity, and plasma NfL reflects neurodegeneration across several disorders [34,35,36]. Developments in ultrasensitive detection technologies, such as single molecule array (Simoa) assays and immunoprecipitation mass spectrometry, have enabled the quantification of these analytes at the exceptionally low concentrations found in plasma [37]. Researchers have reported that combinations of plasma pTau, Aβ42/40, and NfL achieve diagnostic accuracy comparable to that of CSF and PET [33,38,39]. However, blood protein biomarker levels may be influenced by peripheral sources, comorbidities, and pre-analytical variables such as sample handling. Plasma Aβ is also produced in platelets and peripheral tissues, complicating interpretation [40]. Repeatability and reproducibility across diverse populations remain a challenge, as differences are observed among groups of distinct ethnicity, age, and comorbidity profiles.

2.4. Limitations of Current Biomarker Strategies

CSF and PET biomarkers are highly sensitive, but their invasiveness, cost, and limited accessibility restrict their widespread use, as illustrated in Figure 1. Plasma protein biomarkers offer an alternative, but they present technical challenges in terms of standardization, generalizability, and biological specificity. Combining transcriptomic data with machine learning (ML) and deep learning (DL) techniques is another approach: together, they can derive predictive biomarkers from high-dimensional datasets while coping with noise. Blood-derived transcriptomic signatures can capture gene expression changes that reflect dynamic disease processes. They have the potential to complement protein markers, increase diagnostic accuracy, and provide mechanistic insights.

3. Transcriptomic Biomarkers and Machine Learning Approaches

The combination of cerebrospinal fluid (CSF) analysis and positron emission tomography (PET) imaging is highly accurate but invasive and expensive. Blood-based transcriptomics, by contrast, offers a minimally invasive and readily accessible option that is suitable for repeated sampling and for detecting AD early. The analysis of RNA-seq data reveals systemic molecular alterations associated with neurodegeneration, immune dysregulation, inflammation, oxidative stress, and metabolic disorders. Transcriptomic biomarkers are therefore particularly suitable for screening and clinical trials. Their integration with machine learning (ML) methodologies assists the extraction of predictive signatures from high-dimensional data [42]. Figure 2 illustrates the steps and methods that use blood transcriptomic data to predict AD status. Machine and deep learning studies in transcriptomic biomarker discovery are summarized in Table 2.

3.1. Early Microarray Studies

In Table 3, we present a comprehensive overview of notable research conducted in the field of blood transcriptomic biomarkers as a means of investigating Alzheimer’s disease (AD). The initial transcriptomic biomarker studies employed microarrays. Naughton et al. (2014) identified dysregulated immune- and apoptosis-related pathways in Alzheimer’s disease (AD). Booij et al. (2011) utilized partial least squares (PLS) regression to classify AD and controls, revealing disease-associated signatures. Lunnon et al. (2013) reported alterations in mitochondrial and ribosomal pathways in both AD and mild cognitive impairment (MCI). Roed et al. (2013) developed predictive models for the progression of MCI to AD, underscoring their prognostic utility. Fehlbaum-Beurdeley et al. (2010) introduced AclarusDX™, a diagnostic panel with over 80% classification accuracy. Although informative, microarrays were limited by probe design, reduced sensitivity, and the inability to detect novel transcripts [43].

3.2. Transition to RNA Sequencing

RNA sequencing (RNA-seq) has transformed transcriptomics by providing unbiased, high-resolution detection of both coding and non-coding RNAs. Studies have identified immune, synaptic, and lipid metabolic pathways as dysregulated in AD, with improved reproducibility compared to microarrays [49,50]. Because peripheral gene expression reflects central pathology, blood RNA-seq signatures can be used to predict CSF biomarker ratios such as pTau/Aβ42 or Aβ42/pTau.

3.3. The Role of Non-Coding RNAs

Non-coding RNAs (ncRNAs) add another layer of biomarker potential. MicroRNAs (miRNAs), such as miR-16-5p and miR-34a-5p, play a vital role in regulating amyloid and tau levels. Long non-coding RNAs (lncRNAs) influence neuroinflammatory pathways and synaptic plasticity. Circular RNAs (circRNAs), which are stable and abundant molecules, are increasingly being explored as diagnostic markers. Incorporating non-coding RNAs into biomarker panels is therefore important for improving classification sensitivity and specificity and for gaining mechanistic insights into pathological progression and AD diagnosis [51,52].

3.4. Machine Learning Integration

Transcriptomic and other omics datasets typically contain at least 20,000 features (genes, proteins, or methylation sites) but few samples. This high-dimension, low-sample-size (HDLSS) setting causes serious dimensionality and overfitting problems in machine learning. Dimensionality reduction (e.g., matrix factorization) and feature selection can mitigate this challenge and make predictive modeling feasible [53,54]. Feature selection approaches include filter methods (e.g., correlation or DEG analysis), wrapper methods (e.g., recursive feature elimination around a machine learning algorithm), and embedded methods (LASSO, Ridge, ElasticNet), which often provide the best accuracy. Studies such as Perera et al. [49] and Madar et al. [50] used compact gene panels to predict AD. In these studies, classification algorithms are used to predict categorical outcomes after the features of interest have been identified. The most commonly used models are random forest (RF), gradient boosting machines (GBM), support vector machines (SVM), and partial least squares discriminant analysis (PLS-DA).
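As an illustration of the filter family of feature selection methods mentioned above, the following minimal sketch ranks genes by their absolute Pearson correlation with a binary label and keeps the top k. The function name `top_k_by_correlation` and the synthetic data are assumptions made for this example and do not come from the cited studies. In practice, such a filter would need to be fitted inside each cross-validation fold to avoid information leakage.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Filter-style selection: rank genes by |Pearson r| with the label.

    X: (samples, genes) expression matrix; y: (samples,) labels coded 0/1.
    Returns the indices of the k top-ranked genes and all correlations.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant genes get correlation 0
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(corr))[:k], corr


# Synthetic HDLSS data (illustrative only): 40 samples, 500 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = np.repeat([0.0, 1.0], 20)
X[y == 1, :5] += 4.0  # only the first five genes carry signal

selected, corr = top_k_by_correlation(X, y, k=5)
print(sorted(selected.tolist()))
```

Wrapper and embedded methods would replace the ranking step with a model-driven search or a penalized fit, respectively.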
RF builds an ensemble of decision trees, each trained on a bootstrapped sample of the data, and then combines their predictions by averaging. Because it combines many trees trained on varied data, the ensemble is more accurate and less sensitive to overfitting and noise than a single tree [55].
$$f_{RF}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$$
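The averaging rule for RF can be sketched as follows. This is a deliberately simplified illustration: each "tree" is a one-split regression stump, whereas real random forests grow much deeper trees, and all function names and the synthetic data are assumptions made for this example.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Fit a one-split regression tree using only the candidate features."""
    best = (np.inf, 0, np.inf, y.mean(), y.mean())  # fallback: no split
    for j in feature_ids:
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            pl, pr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - pl) ** 2).sum() + ((y[~left] - pr) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, pl, pr)
    return best[1:]


def predict_stump(stump, X):
    j, t, pl, pr = stump
    return np.where(X[:, j] <= t, pl, pr)


def fit_forest(X, y, n_trees, n_candidate_features, rng):
    """Train B stumps, each on a bootstrap sample and a random feature subset."""
    n, p = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        feats = rng.choice(p, size=n_candidate_features, replace=False)
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """f_RF(x) = (1/B) * sum_b T_b(x): average the per-tree predictions."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)


# Synthetic data (illustrative only): the label depends on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = (X[:, 0] > 0).astype(float)
forest = fit_forest(X, y, n_trees=50, n_candidate_features=4, rng=rng)
pred = predict_forest(forest, X)
```

Individual stumps that never see the informative feature are close to uninformative, but their errors largely cancel in the average, which is the variance-reduction effect the formula expresses.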
GBMs sequentially train weak learners, typically decision trees, so that each new learner corrects the errors of the previous models. GBMs capture highly non-linear and complex interactions and achieve high predictive accuracy by following the gradient of the loss function [56].
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
where $h_m(x)$ is the weak learner fitted to the negative gradient of the loss, $\gamma_m$ is the learning rate controlling the contribution of $h_m(x)$, and $F_m(x)$ is the model at iteration $m$.
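The additive update can be made concrete with a minimal sketch for squared-error loss, where the negative gradient is simply the current residual. The weak learner here is a one-split regression stump and the learning rate is held constant across iterations; both are simplifying assumptions, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares regression stump (one split) fitted to targets r."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())  # fallback: no split
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            pl, pr = r[left].mean(), r[~left].mean()
            sse = ((r[left] - pl) ** 2).sum() + ((r[~left] - pr) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, pl, pr)
    return best[1:]


def predict_stump(stump, X):
    j, t, pl, pr = stump
    return np.where(X[:, j] <= t, pl, pr)


def boost(X, y, n_rounds, learning_rate):
    """F_m = F_{m-1} + learning_rate * h_m, with h_m fitted to residuals."""
    f0 = y.mean()
    F = np.full(len(y), f0)
    stumps, history = [], []
    for _ in range(n_rounds):
        residual = y - F  # negative gradient of 0.5 * (y - F)^2
        h = fit_stump(X, residual)
        F = F + learning_rate * predict_stump(h, X)
        stumps.append(h)
        history.append(float(np.mean((y - F) ** 2)))
    return f0, stumps, history


def predict(f0, stumps, X, learning_rate):
    return f0 + learning_rate * sum(predict_stump(h, X) for h in stumps)


# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)
f0, stumps, history = boost(X, y, n_rounds=50, learning_rate=0.1)
train_mse = float(np.mean((y - predict(f0, stumps, X, 0.1)) ** 2))
```

With squared-error loss and a least-squares stump, each round cannot increase the training error, which is why `history` is non-increasing; generalization still has to be checked on held-out data.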
SVMs operate effectively on small yet complex datasets by constructing hyperplanes that maximize the margin between classes, thereby minimizing structural risk and enhancing generalization [57].
$$f(x) = \operatorname{sign}(w^{\top} x + b)$$
where w defines the normal vector to the hyperplane, b is the bias term, and sign(·) determines the class label.
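The decision rule can be illustrated with a short sketch that applies a given hyperplane to a few points and reports the margin width 2/||w||. The values of `w` and `b` below are hypothetical and are not the result of training; fitting them would require solving the SVM optimization problem.

```python
import numpy as np


def svm_decision(X, w, b):
    """Return class labels sign(w.x + b) and the raw decision scores."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1), scores


# Hypothetical hyperplane parameters (not learned from data).
w = np.array([2.0, -1.0])
b = -0.5

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 1.0]])
labels, scores = svm_decision(X, w, b)
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margins
```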
Partial Least Squares (PLS), particularly in its discriminant analysis variant (PLS-DA), is well-suited for multicollinear and high-dimensional data [58]. By projecting predictors and response variables into a shared latent space, it reduces dimensionality while retaining class-discriminative information.
$$X = T P^{\top} + E, \qquad Y = U Q^{\top} + F$$
where X is the predictor matrix, Y is the response matrix, T and U are score matrices, P and Q are loading matrices, and E and F are residuals. The discriminant variant (PLS-DA) uses class membership as Y. When combined with effective feature selection, ensemble-based approaches such as RF and GBM consistently demonstrate superior predictive performance in transcriptomic analyses [59]. A recent finding similarly indicates that RF and GBM models trained on blood RNA-seq data achieved strong predictive performance for the CSF pTau/Aβ42 ratio in independent test sets [60].

3.5. Overall Advantages and Challenges of Transcriptomic Biomarkers

Blood-based transcriptomic approaches offer a less invasive and more scalable alternative to CSF and PET methods, making them more suitable for widespread use and repeated testing. They provide insights into system-wide biological changes and help uncover underlying disease mechanisms at the pathway level. Combined with machine learning, they can also yield compact gene panels that are easier to interpret and more practical for clinical use. One disadvantage is that blood gene expression is highly dynamic and can be influenced by factors such as diet, medications, and daily rhythms, which introduces variability into the results. Another is that differences in sequencing technologies and data analysis methods can hinder consistency across studies, necessitating standardization. In addition, many studies rely on relatively small sample sizes, which increases the risk of overfitting and limits how well the findings apply to other populations. A further limitation is that signals from peripheral blood may not fully reflect the complex changes occurring in the brain. These approaches become more powerful when paired with ML and DL methods, which are still advancing. Their success will depend on better integration with existing biomarkers and thorough validation across diverse and independent datasets.

4. Deep Learning and Advanced Models

Recent advances in deep learning (DL) have opened new opportunities for biomarker detection in Alzheimer’s disease (AD). Table 2 summarizes different DL models and their strengths and limitations. DL models, which are based on neural networks (NNs) with multiple layers, excel at capturing complex, non-linear interactions among features and can automatically learn higher-order representations from raw data.

4.1. Deep Neural Networks (DNNs)

Deep neural networks (DNNs) are multilayer architectures that learn nonlinear transformations from input data through a hierarchy of hidden representations [61]. Each layer performs the following operation:
$$h^{(l)} = f\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 1, 2, \dots, L$$
where $f(\cdot)$ denotes a nonlinear activation function, and $W^{(l)}$ and $b^{(l)}$ represent the learnable weight and bias parameters, respectively. At each layer, the activation function is applied to the linear combination of the previous layer’s outputs, producing a new latent feature representation that is propagated forward through the network. In the output layer, the model computes the final prediction as:
$$\hat{y} = \sigma\left(W^{(L)} h^{(L-1)} + b^{(L)}\right)$$
where $h^{(L-1)}$ denotes the output of the last hidden layer and $\sigma(\cdot)$ is the activation function commonly used for classification, such as the sigmoid or softmax function. By producing class probabilities or continuous outputs based on the task, this final layer maps learned features to the prediction space.
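The layer-wise computation can be sketched as a forward pass in a few lines. This is a minimal illustration that assumes ReLU hidden activations, a sigmoid output for binary classification, and randomly initialized (untrained) weights; the layer sizes are arbitrary.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Propagate one expression vector x through the network."""
    h = x
    # Hidden layers: h = f(W h_prev + b) with f = ReLU.
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    # Output layer: y_hat = sigma(W h_last + b) with sigma = sigmoid.
    return sigmoid(weights[-1] @ h + biases[-1])


# Untrained toy network: 20 input genes, two hidden layers, one output.
rng = np.random.default_rng(0)
layer_sizes = [20, 8, 4, 1]
weights = [rng.normal(scale=0.5, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=20)
y_hat = forward(x, weights, biases)  # predicted probability of the positive class
```

Training would adjust `weights` and `biases` by backpropagation, which is omitted here.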
DNNs, by incorporating multiple hidden layers, extend classical feedforward designs to learn hierarchical representations of gene expression data. On Alzheimer’s disease (AD) transcriptomic data, DNNs can identify subtle nonlinear dependencies and complex gene interactions that shallow learning techniques might miss. Recent studies have demonstrated that DNNs trained on selected DEGs can distinguish AD patients from healthy controls with high accuracy. Notably, small gene panels were sufficient to achieve high accuracy in DNN-based models, which often outperform traditional machine learning classifiers [62,63]. Thus, deep learning can efficiently distill large-scale transcriptomic profiles into highly predictive molecular signatures [64].

4.2. Convolutional Neural Networks (CNNs)

CNNs are DL architectures designed to process grid-structured data such as images and to extract hierarchical spatial features automatically. Their primary purpose is to detect local and spatial dependencies across the whole input. A typical CNN architecture consists of convolutional layers, nonlinear activation layers, and pooling layers. In each convolutional layer, feature maps are generated by convolving the input data X with learnable filters (weight matrices) W to identify locally correlated subregions. Because similar patterns may occur at different spatial positions, filters are applied across the entire input domain, allowing parameter sharing and reducing the number of trainable parameters, which improves training efficiency [65].
The convolution operation for a single output feature map can be expressed as:
$$Y_{i,j}^{(k)} = \sigma\left( \sum_{m=1}^{M} \sum_{p}^{P} \sum_{q}^{Q} W_{p,q,m}^{(k)} \cdot X_{i+p,\,j+q,\,m} + b^{(k)} \right)$$
where $X$ denotes the input value; $m$ is the channel index; $W$ and $b$ are the filter weights and bias, respectively; and $\sigma(\cdot)$ is the nonlinear activation function.
The pooling operation reduces the spatial dimensionality of the feature maps and promotes translational invariance. Max pooling is defined as:
$$P_{i,j}^{(k)} = \max_{(p,q) \in \Omega} Y_{i+p,\,j+q}^{(k)}$$
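The convolution and pooling operations can be sketched for a single filter as follows. This is a minimal illustration that assumes a "valid" convolution (no padding), ReLU as the activation, and non-overlapping pooling windows; the function names are hypothetical and the loop-based implementation favors readability over speed.

```python
import numpy as np


def conv2d_single_filter(X, W, b):
    """Valid convolution of X (H, Wd, M) with one filter W (P, Q, M), then ReLU."""
    H, Wd, _ = X.shape
    P, Q, _ = W.shape
    out = np.empty((H - P + 1, Wd - Q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over channels m and filter offsets (p, q), plus the bias.
            out[i, j] = np.sum(W * X[i:i + P, j:j + Q, :]) + b
    return np.maximum(out, 0.0)


def max_pool(Y, size):
    """Max over non-overlapping size x size windows (ragged edges are dropped)."""
    H2, W2 = Y.shape[0] // size, Y.shape[1] // size
    trimmed = Y[:H2 * size, :W2 * size]
    return trimmed.reshape(H2, size, W2, size).max(axis=(1, 3))


# Tiny worked example: a 4 x 4 single-channel input and a 2 x 2 filter of ones.
X = np.arange(16, dtype=float).reshape(4, 4, 1)
W = np.ones((2, 2, 1))
Y = conv2d_single_filter(X, W, b=0.0)  # 3 x 3 feature map of 2 x 2 block sums
pooled = max_pool(Y, size=2)           # 1 x 1 map after 2 x 2 max pooling
```

In this example the top-left entry of `Y` is 0 + 1 + 4 + 5 = 10, and pooling the top-left 2 x 2 window of `Y` keeps its largest entry, 30.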
Although CNNs are conventionally associated with image analysis, they have been adapted for transcriptomic data by representing gene expression vectors as structured inputs [66]. This allows CNNs to model local dependencies and interactions among groups of genes, analogous to how they identify spatial patterns in images. Recent studies have shown that CNNs trained on blood RNA-seq data can achieve performance comparable to deep neural networks (DNNs), while offering the advantage of identifying localized gene interaction features that may correspond to biological pathways [67,68].

4.3. Autoencoders and Representation Learning

Autoencoders are unsupervised learning models designed to compress input data into a low-dimensional code while minimizing reconstruction loss. They consist of two primary components: an encoder and a decoder. The encoder maps the input data $X$ into a compact latent representation $Z$ through a function $Z = f(X)$, whereas the decoder reconstructs the input from this latent code using $\hat{X} = g(Z)$.
While the encoder transforms the input into a latent feature space, the decoder attempts to reconstruct the original data from this representation, typically through one or more linear or non-linear layers. The model parameters are optimized by minimizing the reconstruction loss, defined as:
$$\mathcal{L}\left(X, g(f(X))\right) = \lVert X - \hat{X} \rVert_2^2$$
To prevent the network from learning trivial identity mappings, additional constraints are often imposed, such as limiting the latent code dimensionality or enforcing sparsity in the latent representation [69]. Autoencoders can compress high-dimensional gene expression profiles into lower-dimensional representations. These latent features can then be used for classification or clustering. In AD research, autoencoders have been applied to reduce noise, uncover hidden structure in transcriptomic data, and facilitate integration with other omics layers [70,71].
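The encoder, decoder, and reconstruction loss can be illustrated with the simplest special case, a linear autoencoder. For squared reconstruction loss, a linear autoencoder is known to recover the same subspace as principal component analysis, so the sketch below obtains the weights from a singular value decomposition instead of gradient-based training. This is a simplifying assumption for illustration; the autoencoders used in practice are typically nonlinear, and the synthetic data and function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, latent_dim):
    """Return encode/decode functions for a linear autoencoder fitted by SVD."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:latent_dim].T  # (features, latent_dim) projection matrix

    def encode(A):
        return (A - mean) @ V  # Z = f(X)

    def decode(Z):
        return Z @ V.T + mean  # X_hat = g(Z)

    return encode, decode


def reconstruction_loss(X, X_hat):
    """Squared reconstruction error ||X - X_hat||_2^2."""
    return float(np.sum((X - X_hat) ** 2))


# Synthetic expression-like matrix with exactly three latent factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 30))

losses = {}
for k in (1, 2, 3):
    encode, decode = fit_linear_autoencoder(X, k)
    losses[k] = reconstruction_loss(X, decode(encode(X)))
```

Because the synthetic data have three underlying factors, the loss shrinks as the latent dimension grows and is numerically zero at three dimensions, which shows how the bottleneck size controls what the code can retain.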

4.4. Ensemble and Hybrid Models

Ensemble and hybrid approaches that combine machine and deep learning models have gained popularity in recent years. One hybrid strategy performs feature selection using LASSO followed by DL classification, as shown in Table 2. Stacking models integrate the outputs of multiple algorithms, for example RF, ranger, GBM, and DNN, into a final decision, which improves robustness. These approaches frequently yield the most accurate predictions by leveraging the complementary strengths of each algorithm.

4.5. Bayesian Probabilistic Models

Bayesian probabilistic models explicitly represent uncertainty using Bayes' theorem. They update the probability of a parameter or hypothesis as observations arrive, so that the parameter is treated as a probability distribution rather than a fixed value. These models are valuable because biomedical data are complex and noisy: findings from previous studies or biological knowledge can be encoded as prior probabilities, and results can be reported as intervals [72]. Probabilistic models such as Gaussian mixture modeling (GMM) and hidden Markov models (HMM) provide an alternative approach by modeling continuous biomarker distributions. GMM can estimate the probability of belonging to an AD-related group based on gene expression patterns, thereby capturing disease heterogeneity [73,74].
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
where $p(x)$ is the probability density of observation $x$, $\pi_k$ is the mixing coefficient for component $k$ (with $\sum_{k=1}^{K} \pi_k = 1$), and $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ denotes a multivariate normal distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$.
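The mixture density, and the posterior probability that an observation belongs to each component, can be evaluated directly from the formula. The sketch below assumes the mixture parameters are already known; estimating them (for example with the EM algorithm) is omitted, and the two-component parameters are hypothetical values chosen for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov)."""
    d = len(mean)
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm


def mixture_density(x, weights, means, covs):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))


def responsibilities(x, weights, means, covs):
    """Posterior probability of each component given x."""
    parts = np.array([w * gaussian_pdf(x, m, c)
                      for w, m, c in zip(weights, means, covs)])
    return parts / parts.sum()


# Hypothetical two-component mixture in two dimensions.
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]

x = np.array([1.5, 1.5])  # equidistant from both component means
resp = responsibilities(x, weights, means, covs)
density = mixture_density(x, weights, means, covs)
```

Since `x` is equally far from both means and the covariances are identical, the responsibilities reduce to the mixing coefficients, which gives a continuous membership probability instead of a hard label.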
Kang et al. (2019) extended parametric HMMs to incorporate the functional impact of the hippocampus and age on cognitive decline across four distinct neurodegenerative states [75,76].
$$P(X, Z) = P(Z_1) \prod_{t=2}^{T} P(Z_t \mid Z_{t-1}) \prod_{t=1}^{T} P(X_t \mid Z_t)$$
where $Z_t$ represents the hidden state at time $t$, and $X_t$ is the observed data conditioned on that state. Whereas binary classification constrains outcomes to fixed diagnostic labels, Bayesian and mixture models enable continuous risk prediction, providing a more detailed understanding of biomarker status.
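The HMM factorization can be checked numerically with a small sketch. It assumes discrete observations and a two-state model with hypothetical parameters, which is simpler than the continuous, covariate-dependent models used in the cited work. The forward algorithm computes the likelihood of the observations by summing the joint probability over all hidden state paths.

```python
from itertools import product

import numpy as np


def joint_log_prob(states, obs, start, trans, emit):
    """log P(X, Z) = log P(Z_1) + sum log P(Z_t | Z_{t-1}) + sum log P(X_t | Z_t)."""
    logp = np.log(start[states[0]]) + np.log(emit[states[0], obs[0]])
    for t in range(1, len(states)):
        logp += np.log(trans[states[t - 1], states[t]])
        logp += np.log(emit[states[t], obs[t]])
    return logp


def forward_likelihood(obs, start, trans, emit):
    """P(X): marginal likelihood via the forward recursion."""
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()


def brute_force_likelihood(obs, start, trans, emit):
    """P(X) by enumerating every hidden path (feasible only for tiny models)."""
    n_states = len(start)
    return sum(np.exp(joint_log_prob(z, obs, start, trans, emit))
               for z in product(range(n_states), repeat=len(obs)))


# Hypothetical two-state model with binary observations.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],
                 [0.2, 0.8]])

obs = [0, 1, 1, 0]
p_joint = np.exp(joint_log_prob([0, 0], [0, 0], start, trans, emit))
p_forward = forward_likelihood(obs, start, trans, emit)
p_brute = brute_force_likelihood(obs, start, trans, emit)
```

For the two-step path with both states and observations equal to 0, the joint probability is 0.6 x 0.9 x 0.7 x 0.9 = 0.3402, matching the three factors of the formula.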

4.6. Benefits and Limitations of DL and Bayesian Models in Biomarker Research

One advantage of DL models is their ability to capture complex, nonlinear relationships that simpler methods might miss. This can lead to improved prediction performance, especially with high-dimensional transcriptomic data. DL models also learn informative features directly from the data, reducing the need for manual feature selection. Bayesian probabilistic and mixture models, in turn, allow more flexible estimation of continuous risk rather than only categorical values. On the other hand, DL models require large datasets to perform well, whereas sample sizes of transcriptomic data in clinically diagnosed diseases are often relatively small. It is also not always clear which genes or patterns drive the predictions. There is, in addition, a risk of overfitting, in which the model fits the training data too closely and fails to generalize to an independent dataset. Finally, training DL models can be computationally expensive and time-consuming compared with more traditional ML techniques.
Despite these limitations, DL and other cutting-edge models are increasingly being integrated into AD biomarker research [77,78]. They have shown promise in extracting clinical signatures from complex transcriptomic data. Both DL and Bayesian models can also be combined with feature selection and ensembling techniques.

5. Multi-Layer Omics Integration for Alzheimer’s Disease Insight

Alzheimer's disease is a complex neurodegenerative disorder influenced by epigenetic, proteomic, and metabolic factors. Although transcriptomic signatures derived from blood provide valuable insights into disease-associated alterations, gene expression data alone offer a limited perspective on disease status without integration with additional biological layers such as proteins and metabolites. Integrative omics and systems biology approaches aim to overcome this limitation by combining multiple data types to achieve a more comprehensive understanding of disease mechanisms and to identify key regulatory or hub genes. In this context, as shown in Figure 3, the integration of genomics, transcriptomics, proteomics, and metabolomics facilitates the reconstruction of molecular networks that span different biological levels.
Genomic variation studies, particularly genome-wide association studies, have identified numerous risk loci associated with Alzheimer’s disease, including APOE, BIN1, CLU, and TREM2 [79]. When these genetic findings are integrated with transcriptomic alterations, it becomes possible to link inherited risk factors to downstream functional consequences at the expression level. In addition, proteomic analyses of plasma have identified candidate biomarkers such as clusterin and components of the complement system, which overlap with transcriptomic signals [80]. Metabolomic profiling has revealed changes in lipid metabolism and energy pathways, which are recognized as central features of Alzheimer’s disease and can be correlated with gene expression alterations [81]. As a result, multi-omics approaches offer a more holistic understanding of disease biology and facilitate the identification of cross-layer molecular signatures. This should ultimately improve the accuracy and robustness of predictive models.

5.1. Network Biology and Regulatory RNA Integration in Alzheimer’s Disease

Hub genes occupy central positions within disease-relevant pathways, and systems biology techniques can be used to identify them in Alzheimer's disease. Protein–protein interaction networks can be analyzed on platforms such as Cytoscape, whose plugin cytoHubba [83] enables the prioritization of hub genes using topological algorithms including degree, maximal clique centrality, and betweenness centrality. Another method is gene co-expression network analysis [84].
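The idea of ranking genes by network topology can be sketched with the simplest of the measures named above, degree centrality. This is not the cytoHubba implementation; it is a minimal illustration on a toy network, and the gene identifiers `G1` to `G5` and their edges are placeholders, not real interactions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {gene: len(nb) / (n - 1) for gene, nb in neighbors.items()}


def top_hubs(edges, k):
    """Return the k genes with the highest degree centrality (ties by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Toy undirected interaction network with placeholder gene identifiers.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"), ("G2", "G3")]
scores = degree_centrality(edges)
hubs = top_hubs(edges, k=2)
```

In this toy network `G1` is connected to every other node and is therefore ranked first; measures such as betweenness or maximal clique centrality can rank nodes differently because they also account for paths and cliques.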
Findings from multiple RNA-seq studies consistently highlight key genes such as APP [85], MAPT (tau) [86], BDNF [85], and NTRK2 [87] as highly connected nodes, many of which are already well established in Alzheimer’s disease pathology. The replication of this gene set in other datasets supports the validity of network-based techniques, which also enable the discovery of novel hub genes that may represent previously unrecognized therapeutic targets. Integrating non-coding RNAs into these network structures adds a regulatory perspective that aids biological interpretation. Non-coding RNAs, including long non-coding RNAs and microRNAs, play a key role in modulating gene expression in Alzheimer’s disease, and integrating these RNA types allows for the reconstruction of complex regulatory circuits. For instance, miR-16-5p and miR-34a-5p have been shown to regulate BACE1 and tau phosphorylation, directly linking them to amyloid processing and tau pathology [88], while miR-26a-5p suppresses tau phosphorylation through modulation of GSK-3β activity [89,90]. In addition, dysregulated long non-coding RNAs have been implicated in synaptic plasticity and immune signaling pathways [91,92]. Integrating these regulatory RNA layers with mRNA expression data can improve predictive model performance. This integrative approach also provides insights into the molecular and transcriptomic mechanisms underlying disease progression.

5.2. Pathway-Level Interpretation and Challenges in Integrative Omics

Pathway enrichment resources such as GO, KEGG, and Reactome are essential for translating hub gene sets into biological mechanisms and functional disease networks. These approaches enable the mapping of differentially expressed genes onto interpretable biological processes and signaling pathways. In blood-based Alzheimer’s disease studies, consistently enriched pathways include immune response and inflammation [93,94], oxidative stress and mitochondrial dysfunction [95,96], synaptic signaling and plasticity [97,98,99], and cell cycle and apoptosis [100,101]. Such findings highlight the systemic nature of the disease and further reinforce the key role of transcriptomic data in capturing biologically relevant changes.
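Over-representation analysis, a statistic commonly used for pathway enrichment, can be sketched with a one-sided hypergeometric test. Whether a given tool uses exactly this test is an assumption here, and the gene counts in the example are hypothetical. The function returns the probability of observing at least the given overlap between a selected gene list and a pathway by chance.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(overlap >= n_overlap).

    n_background: genes in the background set
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the selected list (e.g., DEGs or hub genes)
    n_overlap:    selected genes that are annotated to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical counts: 100 selected genes, 10 of which fall in a
# 200-gene pathway, against a background of 20,000 genes.
p = enrichment_p_value(20000, 200, 100, 10)
```

By chance, about one of the 100 selected genes would be expected in this pathway, so an overlap of 10 gives a very small p-value. When many pathways are tested, the p-values would also need correction for multiple testing.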
Advances in integrative omics and systems biology are challenged by the need for careful technical planning even before initiating an RNA-seq study [102]. One of the primary difficulties lies in data integration, as different omics layers often vary substantially in scale, noise structure, and susceptibility to batch effects, complicating their joint analysis. In addition, the high cost and technical demands of multi-omics profiling frequently result in relatively small sample sizes, which can limit statistical power and generalizability. The computational burden associated with network inference and multi-layer integration also remains substantial, requiring sophisticated algorithms and significant computational resources. Furthermore, findings derived from systems-level analyses must ultimately be validated in independent cohorts and experimental models to ensure biological and clinical relevance. Nevertheless, the integration of transcriptomics with non-coding RNA data, proteomics, and metabolomics within systems biology and machine learning or deep learning frameworks represents a powerful and promising strategy for identifying robust, mechanistically informed biomarkers of Alzheimer’s disease.

6. Translational and Clinical Perspectives

The goal of biomarker discovery in Alzheimer’s disease (AD) is to translate research findings into clinical tools that improve diagnosis, prognosis, and treatment, as illustrated in Figure 4. Blood-based transcriptomic biomarkers, together with machine learning (ML) and deep learning (DL) approaches, can improve the prediction of AD diagnosis and staging. The transition from discovery to clinical application must be rigorous to ensure practicality, reproducibility, and compliance with regulatory frameworks.

6.1. Potential Clinical Applications

Blood RNA-seq transcriptomics can be used for the early detection of Alzheimer’s disease and can support advanced clinical trial designs and personalized treatments. Researchers can identify the most strongly up- and down-regulated genes using differential expression analysis tools [103,104]. The differentially expressed genes (DEGs) can then serve as signatures in ML/DL models to predict diagnosis or treatment response. Finally, blood RNA-seq data can complement existing biomarkers such as CSF measurements, plasma proteins, or imaging methods. This combination could improve diagnostic performance in terms of the sensitivity, specificity, and accuracy of model predictions.
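
Because sensitivity, specificity, and accuracy are the performance measures referred to above, a short sketch of how they are derived from binary predictions may be helpful. The function name and the example labels are hypothetical (1 = AD, 0 = control).

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Invented labels for ten hypothetical participants.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(true_labels, predicted)
```

Reporting all three measures matters because accuracy alone can be misleading when cases and controls are imbalanced, which is common in screening populations.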

6.2. Feasibility of Small Gene Panels

Small gene panels are more feasible than whole-transcriptome signatures from a translational perspective. Clinical assays such as quantitative PCR and targeted sequencing enable the measurement of a limited number of genes at reduced cost and with increased throughput. For instance, DL models built on panels of only five genes (APP, MAPT (tau), BDNF, NTRK2, and PSEN1) have been reported to achieve perfect classification accuracy. These results suggest that clinically relevant assays are achievable [105,106].

6.3. Reproducibility and Cohort Diversity

Blood-based RNA-seq signatures identified within a single dataset often fail to generalize across independent cohorts, raising important concerns regarding reproducibility and robustness. These failures are often driven by differences between studies, including demographic factors such as age, sex, and ethnicity, and clinical characteristics such as comorbidities and medication use. In addition, technical differences in RNA extraction protocols, sequencing platforms, and normalization strategies introduce further variability. All of these factors ultimately affect downstream analyses. To address these challenges, future studies should perform retrospective validation across multiple datasets and ensure the inclusion of diverse and well-characterized populations, thereby improving the repeatability and reproducibility of detected biomarkers.
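
One simple way to organize validation across multiple datasets is a leave-one-cohort-out scheme, in which a model is trained on all cohorts but one and tested on the held-out cohort. The sketch below only generates the index splits. The function name and cohort labels are hypothetical, and model fitting is deliberately left to the reader's framework of choice.

```python
def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_indices, test_indices) per cohort.

    cohort_labels: one label per sample naming its cohort of origin.
    """
    for cohort in sorted(set(cohort_labels)):
        train = [i for i, c in enumerate(cohort_labels) if c != cohort]
        test = [i for i, c in enumerate(cohort_labels) if c == cohort]
        yield cohort, train, test


# Hypothetical cohort assignment for seven samples.
sample_cohorts = ["site_1", "site_1", "site_2", "site_2", "site_2", "site_3", "site_3"]
splits = list(leave_one_cohort_out(sample_cohorts))
```

Any normalization or feature selection step should be fitted on the training indices only and then applied to the held-out cohort. Otherwise information leaks across cohorts and the estimate of generalization becomes optimistic.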

6.4. Clinical Measures and Ethical Interest

To be adopted in clinical practice, transcriptomic biomarkers must meet several requirements. Standardizing blood sample collection, RNA isolation, sequencing, and data analysis is a crucial first step. Next, rapid and cost-effective tests must be implemented for clinical laboratories. Finally, tools are needed to translate ML/DL predictions into clinical decisions. Integrating these steps with clinical decision support systems and electronic health records (EHRs) would increase their real-world utility.
Translational research also involves ethical dilemmas and oversight challenges. Biomarker assays and technologies must meet stringent requirements for both reproducibility and repeatability, in line with applicable clinical standards. It is crucial to prevent disparities in diagnosis and treatment by ensuring that biomarkers perform equally well across populations. Moreover, research using ML/DL models on big data raises concerns about patient confidentiality and data sharing.

6.5. Synergy with Therapeutic Development

Transcriptomic biomarkers may accelerate therapeutic development beyond diagnostics through several strategies. One is identifying novel drug targets through hub gene and pathway analyses. Another is supporting drug repurposing by linking dysregulated pathways to existing pharmacological agents. Thus, blood-based RNA biomarkers could not only improve clinical diagnosis but also facilitate precision medicine approaches in AD therapeutics.

7. Discussion

Alzheimer’s disease presently lacks effective and affordable treatments. CSF measurements and PET imaging remain the key measures for definitive diagnosis. However, their cost and invasiveness limit their use in clinical applications such as efforts to prevent disease progression and the monitoring of patients with AD. Developing alternative treatments is another challenge. Accurate and accessible biomarkers that identify at-risk individuals before symptoms appear are urgently needed [28,77]. Plasma protein biomarkers such as phosphorylated tau and neurofilament light represent important advances but still struggle with sensitivity, specificity, and reproducibility across populations [7,80]. Blood transcriptomic biomarkers complement this approach [7,107,108]. By identifying systemic gene expression changes associated with Alzheimer’s disease pathology, researchers may develop minimally invasive diagnostic methods. The importance of gene expression was first demonstrated by early microarray studies, and RNA sequencing (RNA-seq) expanded the field of discovery to include hub genes and non-coding RNAs [52,92]. Moreover, clinically feasible gene panels can be distilled from transcriptomic signatures.
Regularized regression methods such as LASSO, which shrinks the coefficients of uninformative genes to zero, and Ridge regression, which stabilizes estimates for correlated genes, help identify compact gene sets with strong predictive power. Classifiers such as random forests (RF) and gradient boosting machines (GBM) also consistently achieve robust performance [73,109]. Furthermore, machine learning (ML) and deep learning (DL) approaches have transformed biomarker discovery from descriptive analysis into predictive modeling [53,110]. Models suited to time-varying and heterogeneous data, such as deep neural networks (DNNs), convolutional neural networks (CNNs), Gaussian mixture models (GMMs), and hidden Markov models (HMMs), can further improve prediction accuracy and provide continuous disease risk estimates [72,74,111,112]. The results of these predictive models can be combined with systems biology approaches such as network analysis, pathway enrichment, and integrative multi-omics.
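
The way LASSO yields a compact gene set can be sketched with a small coordinate descent solver. This is an illustrative implementation under simplifying assumptions (centred features and outcome, no intercept, a fixed number of sweeps) and is not the method of any cited study. The function names, the penalty value, and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1.

    X: list of sample rows (features assumed centred); y: centred outcome.
    Returns one coefficient per feature; zeros mark dropped features.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - Xb, with b initialised to zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual excluding feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)) / n
            new_coef = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_coef - coef[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Toy data: the outcome depends on feature 0 only; feature 1 is unrelated.
feature_0 = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
feature_1 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
X_toy = [[a, b] for a, b in zip(feature_0, feature_1)]
y_toy = [2.0 * a for a in feature_0]
coefficients = lasso_coordinate_descent(X_toy, y_toy, alpha=0.5)
```

In this toy case the unrelated feature receives a coefficient of exactly zero, while the informative feature keeps a slightly shrunken coefficient. With real expression data the penalty would normally be chosen by cross-validation, and established library implementations are preferable to a hand-written solver.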

8. Conclusions and Future Directions

These computational and biological techniques can contribute to the discovery of new diagnostic signatures. Despite significant limitations such as small sample sizes, limited population diversity, and variability in protocols, the discovered genes may contribute to breakthroughs in therapeutic development. Techniques such as single-cell RNA sequencing, spatial transcriptomics, and federated learning will further enrich this field and open new avenues for discovery. Consequently, blood-based RNA biomarkers analyzed using machine learning and deep learning methods could facilitate rapid early diagnosis and treatment of Alzheimer’s disease and other neurodegenerative diseases. These biomarkers can complement established biomarkers, enabling early diagnosis and precision medicine. To realize this potential, it is also necessary to bring together deep expertise from computational biology, neuroscience, clinical research, and regulatory science. By combining artificial intelligence with blood biomarkers, RNA-based biomarker strategies could play a significant role in the diagnosis and treatment of Alzheimer’s disease and other genetic diseases within the next decade and could reduce costs.

Author Contributions

Conceptualization, E.G. and M.A.; methodology, E.G., K.A., M.A., S.A.A., and A.K.; software, E.G. and S.A.A.; validation, E.G.; investigation, E.G.; writing—original draft preparation, E.G., S.A.A., A.K., K.A., T.A., and M.A.; writing—review and editing, E.G., K.A., S.A.A., A.K., T.A., and M.A.; visualization, E.G. and A.K.; supervision, E.G.; project administration, E.G. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We thank Düzce University for its support in providing basic computational facilities.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AD Alzheimer’s Disease
CSF Cerebrospinal Fluid
PET Positron Emission Tomography
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
WHO World Health Organization
NfL Neurofilament Light Chain
DEGs Differentially Expressed Genes
GBM Gradient Boosting Machines
GMM Gaussian Mixture Modeling
HMM Hidden Markov Model
pTau Phosphorylated tau
Aβ42 Amyloid-β42
PLS Partial Least Squares
PLS-DA Partial Least Squares Discriminant Analysis
MCI Mild Cognitive Impairment
RNA-seq RNA Sequencing
ncRNAs Non-coding RNAs
lncRNAs Long non-coding RNAs
miRNAs MicroRNAs
circRNAs Circular RNAs
RF Random Forest
SVM Support Vector Machines
NN Neural Networks

References

  1. Almeida, Z.L.; Vaz, D.C.; Brito, R.M. Morphological and Molecular Profiling of Amyloid-β Species in Alzheimer’s Pathogenesis. Mol. Neurobiol. 2025, 62, 4391–4419. [CrossRef]
  2. Gustavsson, A.; Norton, N.; Fast, T.; Frölich, L.; Georges, J.; Holzapfel, D.; Kirabali, T.; Krolak-Salmon, P.; Rossini, P.M.; Ferretti, M.T. Global Estimates on the Number of Persons across the Alzheimer’s Disease Continuum. Alzheimers Dement. 2023, 19, 658–670. [CrossRef]
  3. Zhang, H.; Wei, W.; Zhao, M.; Ma, L.; Jiang, X.; Pei, H.; Cao, Y.; Li, H. Interaction between Aβ and Tau in the Pathogenesis of Alzheimer’s Disease. Int. J. Biol. Sci. 2021, 17, 2181. [CrossRef]
  4. Vos, S.J.; Gordon, B.A.; Su, Y.; Visser, P.J.; Holtzman, D.M.; Morris, J.C.; Fagan, A.M.; Benzinger, T.L. NIA-AA Staging of Preclinical Alzheimer Disease: Discordance and Concordance of CSF and Imaging Biomarkers. Neurobiol. Aging 2016, 44, 1–8. [CrossRef]
  5. Petersen, R.C.; Wiste, H.J.; Weigand, S.D.; Fields, J.A.; Geda, Y.E.; Graff-Radford, J.; Knopman, D.S.; Kremers, W.K.; Lowe, V.; Machulda, M.M. NIA-AA Alzheimer’s Disease Framework: Clinical Characterization of Stages. Ann. Neurol. 2021, 89, 1145–1156.
  6. Alkhalifa, A.E.; Al Mokhlf, A.; Ali, H.; Al-Ghraiybah, N.F.; Syropoulou, V. Anti-Amyloid Monoclonal Antibodies for Alzheimer’s Disease: Evidence, ARIA Risk, and Precision Patient Selection. J. Pers. Med. 2025, 15, 437. [CrossRef] [PubMed]
  7. Chen, M.; Xia, W. Proteomic Profiling of Plasma and Brain Tissue from Alzheimer’s Disease Patients Reveals Candidate Network of Plasma Biomarkers. J. Alzheimer’s Dis. 2020, 76, 349–368. [CrossRef]
  8. Saha, G.B. Basics of PET Imaging: Physics, Chemistry, and Regulations; Springer, 2005.
  9. Hampel, H.; Shaw, L.M.; Aisen, P.; Chen, C.; Lleó, A.; Iwatsubo, T.; Iwata, A.; Yamada, M.; Ikeuchi, T.; Jia, J. State-of-the-art of Lumbar Puncture and Its Place in the Journey of Patients with Alzheimer’s Disease. Alzheimers Dement. 2022, 18, 159–177. [CrossRef] [PubMed]
  10. Chatterjee, P.; Pedrini, S.; Doecke, J.D.; Thota, R.; Villemagne, V.L.; Doré, V.; Singh, A.K.; Wang, P.; Rainey-Smith, S.; Fowler, C. Plasma Aβ42/40 Ratio, P-tau181, GFAP, and NfL across the Alzheimer’s Disease Continuum: A Cross-sectional and Longitudinal Study in the AIBL Cohort. Alzheimers Dement. 2023, 19, 1117–1134. [PubMed]
  11. Rauchmann, B.S.; Schneider-Axmann, T.; Perneczky, R. Associations of Longitudinal Plasma P-Tau181 and NfL with Tau-PET, Aβ-PET and Cognition. J. Neurol. Neurosurg. Psychiatry 2021, 92, 1289–1295. [CrossRef]
  12. Thambisetty, M.; Lovestone, S. Blood-Based Biomarkers of Alzheimer’s Disease: Challenging but Feasible. Biomark. Med. 2010, 4, 65–79. [CrossRef]
  13. Solier, C.; Langen, H. Antibody-based Proteomics and Biomarker Research—Current Status and Limitations. Proteomics 2014, 14, 774–783.
  14. Donaghy, P.C.; Cockell, S.J.; Martin-Ruiz, C.; Coxhead, J.; Kane, J.; Erskine, D.; Koss, D.; Taylor, J.-P.; Morris, C.M.; O’Brien, J.T. Blood mRNA Expression in Alzheimer’s Disease and Dementia with Lewy Bodies. Am. J. Geriatr. Psychiatry 2022, 30, 964–975.
  15. Puthiyedth, N.; Riveros, C.; Berretta, R.; Moscato, P. Identification of Differentially Expressed Genes through Integrated Study of Alzheimer’s Disease Affected Brain Regions. PLoS ONE 2016, 11, e0152342.
  16. Bottero, V.; Potashkin, J.A. Meta-Analysis of Gene Expression Changes in the Blood of Patients with Mild Cognitive Impairment and Alzheimer’s Disease Dementia. Int. J. Mol. Sci. 2019, 20, 5403. [CrossRef]
  17. Yoon, S.; Kim, S.E.; Ko, Y.; Jeong, G.H.; Lee, K.H.; Lee, J.; Solmi, M.; Jacob, L.; Smith, L.; Stickley, A. Differential Expression of MicroRNAs in Alzheimer’s Disease: A Systematic Review and Meta-Analysis. Mol. Psychiatry 2022, 27, 2405–2413.
  18. Lan, K.; Wang, D.; Fong, S.; Liu, L.; Wong, K.K.; Dey, N. A Survey of Data Mining and Deep Learning in Bioinformatics. J. Med. Syst. 2018, 42, 139.
  19. Baldi, P.; Brunak, S. Bioinformatics: The Machine Learning Approach; MIT Press, 2001; ISBN 0-262-02506-X.
  20. Kashyap, H.; Ahmed, H.A.; Hoque, N.; Roy, S.; Bhattacharyya, D.K. Big Data Analytics in Bioinformatics: A Machine Learning Perspective. ArXiv 2015. [CrossRef]
  21. Diaa, N.M.; Abed, M.Q.; Taha, S.W.; Ali, M. Machine Learning and Traditional Statistics Integrative Approaches for Bioinformatics. J. Ecohumanism 2024, 3, 335–352. [CrossRef]
  22. Abbasi, A.F.; Naveed, S.; Asim, M.N.; Sajjad, M.; Dengel, A.; Vollmer, S. Artificial Intelligence Powered Biomarker Discovery: A Large-Scale Analysis of 236 Studies Across 19 Therapeutic Areas and 147 Diseases. bioRxiv 2025, 2025–08. [CrossRef]
  23. Zhang, Y.; Shen, S.; Li, X.; Wang, S.; Xiao, Z.; Cheng, J.; Li, R. A Multiclass Extreme Gradient Boosting Model for Evaluation of Transcriptomic Biomarkers in Alzheimer’s Disease Prediction. Neurosci. Lett. 2024, 821, 137609.
  24. Shigemizu, D.; Mori, T.; Akiyama, S.; Higaki, S.; Watanabe, H.; Sakurai, T.; Niida, S.; Ozaki, K. Identification of Potential Blood Biomarkers for Early Diagnosis of Alzheimer’s Disease through RNA Sequencing Analysis. Alzheimers Res. Ther. 2020, 12, 87. [CrossRef] [PubMed]
  25. Wang, Y.; Zhu, T.; Cheng, Q.; Cui, X.; Zhang, P.; Lu, Z.; Alzheimer’s Disease Neuroimaging Initiative (ADNI). Predicting Brain Health in Community-Dwelling Elderly Populations by Integrating Gaussian Mixture Model and Plasma Biomarkers. J. Alzheimers Dis. Rep. 2025, 9, 25424823251331110.
  26. Leng, N.; Li, Y.; McIntosh, B.E.; Nguyen, B.K.; Duffin, B.; Tian, S.; Thomson, J.A.; Dewey, C.N.; Stewart, R.; Kendziorski, C. EBSeq-HMM: A Bayesian Approach for Identifying Gene-Expression Changes in Ordered RNA-Seq Experiments. Bioinformatics 2015, 31, 2614–2622. [CrossRef] [PubMed]
  27. Bonizzoni, M.; Dunn, W.A.; Campbell, C.L.; Olson, K.E.; Dimon, M.T.; Marinotti, O.; James, A.A. RNA-Seq Analyses of Blood-Induced Changes in Gene Expression in the Mosquito Vector Species, Aedes Aegypti. BMC Genomics 2011, 12, 82. [CrossRef] [PubMed]
  28. Blennow, K. A Review of Fluid Biomarkers for Alzheimer’s Disease: Moving from CSF to Blood. Neurol. Ther. 2017, 6, 15–24. [CrossRef]
  29. Blennow, K.; Dubois, B.; Fagan, A.M.; Lewczuk, P.; De Leon, M.J.; Hampel, H. Clinical Utility of Cerebrospinal Fluid Biomarkers in the Diagnosis of Early Alzheimer’s Disease. Alzheimers Dement. 2015, 11, 58–69.
  30. Barthélemy, N.R.; Salvadó, G.; Schindler, S.E.; He, Y.; Janelidze, S.; Collij, L.E.; Saef, B.; Henson, R.L.; Chen, C.D.; Gordon, B.A. Highly Accurate Blood Test for Alzheimer’s Disease Is Similar or Superior to Clinical Cerebrospinal Fluid Tests. Nat. Med. 2024, 30, 1085–1095. [CrossRef]
  31. Ou, Z.; Pan, Y.; Li, Y.; Xie, F.; Guo, Q.; Shen, D. Synthesizing Aβ-Pet via an Image and Label Conditioning Latent Diffusion Model for Detecting Amyloid Status.; IEEE, 2024; pp. 6610–6614.
  32. Cecchetti, G.; Agosta, F.; Rugarli, G.; Spinelli, E.G.; Ghirelli, A.; Zavarella, M.; Bottale, I.; Orlandi, F.; Santangelo, R.; Caso, F. Diagnostic Accuracy of Automated Lumipulse Plasma pTau-217 in Alzheimer’s Disease: A Real-World Study. J. Neurol. 2024, 271, 6739–6749.
  33. Lewczuk, P.; Łukaszewicz-Zając, M.; Kornhuber, J.; Mroczko, B. Clinical Significance of Plasma Candidate Biomarkers of Alzheimer’s Disease. Neurol. Neurochir. Pol. 2024, 58, 363–379.
  34. Koivumäki, M.; Ekblad, L.; Lantero-Rodriguez, J.; Ashton, N.J.; Karikari, T.K.; Helin, S.; Parkkola, R.; Lötjönen, J.; Zetterberg, H.; Blennow, K. Blood Biomarkers of Neurodegeneration Associate Differently with Amyloid Deposition, Medial Temporal Atrophy, and Cerebrovascular Changes in APOE Ε4-Enriched Cognitively Unimpaired Elderly. Alzheimers Res. Ther. 2024, 16, 112. [CrossRef] [PubMed]
  35. Weber, D.M.; Taylor, S.W.; Lagier, R.J.; Kim, J.C.; Goldman, S.M.; Clarke, N.J.; Vaillancourt, D.E.; Duara, R.; McFarland, K.N.; Wang, W. Clinical Utility of Plasma Aβ42/40 Ratio by LC-MS/MS in Alzheimer’s Disease Assessment. Front. Neurol. 2024, 15, 1364658.
  36. Vrillon, A.; Bousiges, O.; Götze, K.; Demuynck, C.; Muller, C.; Ravier, A.; Schorr, B.; Philippi, N.; Hourregue, C.; Cognat, E. Plasma Biomarkers of Amyloid, Tau, Axonal, and Neuroinflammation Pathologies in Dementia with Lewy Bodies. Alzheimers Res. Ther. 2024, 16, 146. [CrossRef]
  37. Dong, R.; Yi, N.; Jiang, D. Advances in Single Molecule Arrays (SIMOA) for Ultra-Sensitive Detection of Biomolecules. Talanta 2024, 270, 125529. [CrossRef]
  38. Chen, Y.; Wang, Y.; Tao, Q.; Lu, P.; Meng, F.; Zhuang, L.; Qiao, S.; Zhang, Y.; Luo, B.; Liu, Y. Diagnostic Value of Isolated Plasma Biomarkers and Its Combination in Neurodegenerative Dementias: A Multicenter Cohort Study. Clin. Chim. Acta 2024, 558, 118784. [CrossRef] [PubMed]
  39. Doecke, J.D.; Bellomo, G.; Vermunt, L.; Alcolea, D.; Halbgebauer, S.; in’t Veld, S.; Mattsson-Carlgren, N.; Veverova, K.; Fowler, C.J.; Boonkamp, L. Diagnostic Performance of Plasma Aβ42/40 Ratio, P-tau181, GFAP, and NfL along the Continuum of Alzheimer’s Disease and non-AD Dementias: An International Multi-center Study. Alzheimers Dement. 2025, 21, e14573. [CrossRef]
  40. Cadoni, M.P.L.; Coradduzza, D.; Congiargiu, A.; Sedda, S.; Zinellu, A.; Medici, S.; Nivoli, A.M.; Carru, C. Platelet Dynamics in Neurodegenerative Disorders: Investigating the Role of Platelets in Neurological Pathology. J. Clin. Med. 2024, 13, 2102. [CrossRef]
  41. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
  42. Naughton, B.J.; Duncan, F.J.; Murrey, D.A.; Meadows, A.S.; Newsom, D.E.; Stoicea, N.; White, P.; Scharre, D.W.; Mccarty, D.M.; Fu, H. Blood Genome-Wide Transcriptional Profiles Reflect Broad Molecular Impairments and Strong Blood-Brain Links in Alzheimer’s Disease. J. Alzheimer’s Dis. 2014, 43, 93–108. [CrossRef] [PubMed]
  43. Murphy, D. Gene Expression Studies Using Microarrays: Principles, Problems, and Prospects. Adv. Physiol. Educ. 2002, 26, 256–270. [CrossRef]
  44. Booij, B.B.; Lindahl, T.; Wetterberg, P.; Skaane, N.V.; Sæbø, S.; Feten, G.; Rye, P.D.; Kristiansen, L.I.; Hagen, N.; Jensen, M. A Gene Expression Pattern in Blood for the Early Detection of Alzheimer’s Disease. J. Alzheimer’s Dis. 2011, 23, 109–119. [CrossRef]
  45. Lunnon, K.; Sattlecker, M.; Furney, S.J.; Coppola, G.; Simmons, A.; Proitsi, P.; Lupton, M.K.; Lourdusamy, A.; Johnston, C.; Soininen, H. A Blood Gene Expression Marker of Early Alzheimer’s Disease. J. Alzheimer’s Dis. 2013, 33, 737–753. [CrossRef] [PubMed]
  46. Roed, L.; Grave, G.; Lindahl, T.; Rian, E.; Horndalsveen, P.O.; Lannfelt, L.; Nilsson, C.; Swenson, F.; Lönneborg, A.; Sharma, P. Prediction of Mild Cognitive Impairment That Evolves into Alzheimer’s Disease Dementia within Two Years Using a Gene Expression Signature in Blood: A Pilot Study. J. Alzheimer’s Dis. 2013, 35, 611–621.
  47. Fehlbaum-Beurdeley, P.; Jarrige-Le Prado, A.C.; Pallares, D.; Carrière, J.; Guihal, C.; Soucaille, C.; Rouet, F.; Drouin, D.; Sol, O.; Jordan, H. Toward an Alzheimer’s Disease Diagnosis via High-Resolution Blood Gene Expression. Alzheimers Dement. 2010, 6, 25–38. [CrossRef]
  48. Huan, T.; Tran, T.; Zheng, J.; Sapkota, S.; MacDonald, S.W.; Camicioli, R.; Dixon, R.A.; Li, L. Metabolomics Analyses of Saliva Detect Novel Biomarkers of Alzheimer’s Disease. J. Alzheimer’s Dis. 2018, 65, 1401–1416. [CrossRef]
  49. Perera, S.; Hewage, K.; Gunarathne, C.; Navarathna, R.; Herath, D.; Ragel, R.G. Detection of Novel Biomarker Genes of Alzheimer’s Disease Using Gene Expression Data.; IEEE, 2020; pp. 1–6.
  50. Madar, I.H.; Sultan, G.; Tayubi, I.A.; Hasan, A.N.; Pahi, B.; Rai, A.; Sivanandan, P.K.; Loganathan, T.; Begum, M.; Rai, S. Identification of Marker Genes in Alzheimer’s Disease Using a Machine-Learning Model. Bioinformation 2021, 17, 348. [CrossRef]
  51. de Gonzalo-Calvo, D.; Karaduzovic-Hadziabdic, K.; Dalgaard, L.T.; Dieterich, C.; Perez-Pons, M.; Hatzigeorgiou, A.; Devaux, Y.; Kararigas, G. Machine Learning for Catalysing the Integration of Noncoding RNA in Research and Clinical Practice. EBioMedicine 2024, 106. [CrossRef]
  52. Chowdhary, A.; Satagopam, V.; Schneider, R. Long Non-Coding RNAs: Mechanisms, Experimental, and Computational Approaches in Identification, Characterization, and Their Biomarker Potential in Cancer. Front. Genet. 2021, 12, 649619. [CrossRef]
  53. Dhal, P.; Azad, C. A Comprehensive Survey on Feature Selection in the Various Fields of Machine Learning. Appl. Intell. 2022, 52, 4543–4581. [CrossRef]
  54. El-Hasnony, I.M.; Barakat, S.I.; Elhoseny, M.; Mostafa, R.R. Improved Feature Selection Model for Big Data Analytics. IEEE Access 2020, 8, 66989–67004. [CrossRef]
  55. Syam, N.; Kaul, R. Random Forest, Bagging, and Boosting of Decision Trees. In Machine Learning and Artificial Intelligence in Marketing and Sales: Essential Reference for Practitioners and Data Scientists; Emerald Publishing Limited, 2021; pp. 139–182.
  56. Spasov, S.; Passamonti, L.; Duggento, A.; Lio, P.; Toschi, N.; Alzheimer’s Disease Neuroimaging Initiative. A Parameter-Efficient Deep Learning Approach to Predict Conversion from Mild Cognitive Impairment to Alzheimer’s Disease. Neuroimage 2019, 189, 276–287. [CrossRef]
  57. Morra, J.H.; Tu, Z.; Apostolova, L.G.; Green, A.E.; Toga, A.W.; Thompson, P.M. Comparison of AdaBoost and Support Vector Machines for Detecting Alzheimer’s Disease through Automated Hippocampal Segmentation. IEEE Trans. Med. Imaging 2009, 29, 30–43. [CrossRef]
  58. Chevallier, S.; Bertrand, D.; Kohler, A.; Courcoux, P. Application of PLS-DA in Multivariate Image Analysis. J. Chemom. 2006, 20, 221–229. [CrossRef]
  59. Paplomatas, P.; Krokidis, M.G.; Vlamos, P.; Vrahatis, A.G. An Ensemble Feature Selection Approach for Analysis and Modeling of Transcriptome Data in Alzheimer’s Disease. Appl. Sci. 2023, 13, 2353. [CrossRef]
  60. An, X.; Wang, Y.; Cao, M.; Yi, Z.; Zeng, X.; Yu, W.; Ren, Z. Exploring BSCL2 and Associated Genes in Alzheimer’s Disease by Integrative Analysis of Bioinformatics, Sn-RNAseq and Machine Learning Approach. J. Alzheimers Dis. Rep. 2026, 10, 25424823261423079. [CrossRef]
  61. Patel, H.; Thakkar, A.; Pandya, M.; Makwana, K. Neural Network with Deep Learning Architectures. J. Inf. Optim. Sci. 2018, 39, 31–38. [CrossRef]
  62. Sarma, M.; Chatterjee, S. Machine Learning-Based Alzheimer’s Disease Stage Diagnosis Utilizing Blood Gene Expression and Clinical Data: A Comparative Investigation. Diagnostics 2025, 15, 211. [CrossRef] [PubMed]
  63. Jin, B.; Fei, G.; Sang, S.; Zhong, C. Identification of Biomarkers Differentiating Alzheimer’s Disease from Other Neurodegenerative Diseases by Integrated Bioinformatic Analysis and Machine-Learning Strategies. Front. Mol. Neurosci. 2023, 16, 1152279. [CrossRef]
  64. Wang, Q.; Chen, K.; Su, Y.; Reiman, E.M.; Dudley, J.T.; Readhead, B. Deep Learning-Based Brain Transcriptomic Signatures Associated with the Neuropathological and Clinical Severity of Alzheimer’s Disease. Brain Commun. 2022, 4, fcab293. [CrossRef]
  65. Kavukcuoglu, K.; Ranzato, M.; Fergus, R.; LeCun, Y. Learning Invariant Features through Topographic Filter Maps.; IEEE, 2009; pp. 1605–1612.
  66. Trivedi, M.R.; Joshi, A.M.; Shah, J.; Readhead, B.P.; Wilson, M.A.; Su, Y.; Reiman, E.M.; Wu, T.; Wang, Q. Interpretable Deep Learning Framework for Understanding Molecular Changes in Human Brains with Alzheimer’s Disease: Implications for Microglia Activation and Sex Differences. Npj Aging 2025, 11, 66. [CrossRef]
  67. Gandhewar, N.; Pimpalkar, A.; Jadhav, A.; Shelke, N.; Jain, R. Leveraging Deep Learning for Genomics Analysis: Advances and Applications. Genomics Nexus AI Comput. Vis. Mach. Learn. 2025, 191–225.
  68. Li, Z.; Gao, E.; Zhou, J.; Han, W.; Xu, X.; Gao, X. Applications of Deep Learning in Understanding Gene Regulation. Cell Rep. Methods 2023, 3. [CrossRef]
  69. Cheng, F.; Zhao, Y.; Yang, X. Self-Supervised Cross-Encoder for Neurodegenerative Disease Diagnosis. arXiv 2025, arXiv:2509.07623.
  70. Ballard, J.L.; Wang, Z.; Li, W.; Shen, L.; Long, Q. Deep Learning-Based Approaches for Multi-Omics Data Integration and Analysis. BioData Min. 2024, 17, 38. [CrossRef]
  71. Maj, C.; Azevedo, T.; Giansanti, V.; Borisov, O.; Dimitri, G.M.; Spasov, S.; Alzheimer’s Disease Neuroimaging Initiative; Lió, P.; Merelli, I. Integration of Machine Learning Methods to Dissect Genetically Imputed Transcriptomic Profiles in Alzheimer’s Disease. Front. Genet. 2019, 10, 726.
  72. Alexiou, A.; Mantzavinos, V.D.; Greig, N.H.; Kamal, M.A. A Bayesian Model for the Prediction and Early Diagnosis of Alzheimer’s Disease. Front. Aging Neurosci. 2017, 9, 77.
  73. Nezhadmoghadam, F.; Martinez-Torteya, A.; Treviño, V.; Martínez, E.; Santos, A.; Tamez-Peña, J.; Alzheimer’s Disease Neuroimaging Initiative. Robust Discovery of Mild Cognitive Impairment Subtypes and Their Risk of Alzheimer’s Disease Conversion Using Unsupervised Machine Learning and Gaussian Mixture Modeling. Curr. Alzheimer Res. 2021, 18, 595–606.
  74. Song, K.; Zhang, J. GeneTerrain-GMM Unmasks a Coordinated Neuroinflammatory and Cell Death Network Perturbed by Dasatinib in a Human Neuronal Model of Alzheimer’s Disease. bioRxiv 2025, 2025–08.
  75. Vimbi, V.; Shaffi, N.; Mahmud, M. Interpreting Artificial Intelligence Models: A Systematic Review on the Application of LIME and SHAP in Alzheimer’s Disease Detection. Brain Inform. 2024, 11, 10.
  76. Sharma, S.; Singh, M.; McDaid, L.; Bhattacharyya, S. XAI-Based Data Visualization in Multimodal Medical Data. bioRxiv 2025, 2025–07. [CrossRef]
  77. Vrahatis, A.G.; Skolariki, K.; Krokidis, M.G.; Lazaros, K.; Exarchos, T.P.; Vlamos, P. Revolutionizing the Early Detection of Alzheimer’s Disease through Non-Invasive Biomarkers: The Role of Artificial Intelligence and Deep Learning. Sensors 2023, 23, 4184.
  78. Jack, C.R.; Holtzman, D.M. Biomarker Modeling of Alzheimer’s Disease. Neuron 2013, 80, 1347–1358. [CrossRef] [PubMed]
  79. Misra, A.; Chakrabarti, S.S.; Gambhir, I.S. New Genetic Players in Late-Onset Alzheimer’s Disease: Findings of Genome-Wide Association Studies. Indian J. Med. Res. 2018, 148, 135–144. [CrossRef]
  80. Walker, K.A.; Chen, J.; Shi, L.; Yang, Y.; Fornage, M.; Zhou, L.; Schlosser, P.; Surapaneni, A.; Grams, M.E.; Duggan, M.R. Proteomics Analysis of Plasma from Middle-Aged Adults Identifies Protein Markers of Dementia Risk in Later Life. Sci. Transl. Med. 2023, 15, eadf5681. [CrossRef]
  81. Yin, F. Lipid Metabolism and Alzheimer’s Disease: Clinical Evidence, Mechanistic Link and Therapeutic Promise. FEBS J. 2023, 290, 1420–1453. [CrossRef]
  82. Reitz, C.; Pericak-Vance, M.A.; Foroud, T.; Mayeux, R. A Global View of the Genetic Basis of Alzheimer Disease. Nat. Rev. Neurol. 2023, 19, 261–277. [CrossRef]
  83. Ma, Z.; Zhong, P.; Yue, P.; Sun, Z. Identification of Immune-Related Molecular Markers in Intracranial Aneurysm (IA) Based on Machine Learning and Cytoscape-Cytohubba Plug-In. BMC Genomic Data 2023, 24, 20. [CrossRef]
  84. Zhang, B.; Horvath, S. A General Framework for Weighted Gene Co-Expression Network Analysis. Stat. Appl. Genet. Mol. Biol. 2005, 4, 1128. [CrossRef]
  85. Tang, S.; Luo, W.; Cheng, C.; Shen, L.; Wu, X.; Xiao, X. BDNF Gene Therapy Rescues Neuronal Function via Unique and Common Transcriptional Responses in Aβ and Tau-Driven Alzheimer’s Disease Mouse Models. Biochem. Biophys. Rep. 2025, 43, 102089. [CrossRef]
  86. Alkhatabi, H.A.; Pushparaj, P.N. Untangling the Complex Mechanisms Associated with Alzheimer’s Disease in Elderly Patients Using High-Throughput RNA Sequencing Data and next-Generation Knowledge Discovery Methods: Focus on Potential Gene Signatures and Drugs for Dementia. Heliyon 2025, 11. [CrossRef] [PubMed]
  87. Ghayal, N.; Koga, S.; Josephs, K.; Ahlskog, J.; Wszolek, Z.; Dickson, D. American Association of Neuropathologists, Inc. J Neuropathol Exp Neurol 2019, 78, 520–579.
  88. Herrera-Espejo, S.; Santos-Zorrozua, B.; Álvarez-González, P.; Lopez-Lopez, E.; Garcia-Orad, Á. A Systematic Review of microRNA Expression as Biomarker of Late-Onset Alzheimer’s Disease. Mol. Neurobiol. 2019, 56, 8376–8391. [CrossRef]
  89. Huang, Q.; Zhou, Y.; He, H.; Lin, S.; Chen, X. Research Progress on Exosomes and MicroRNAs in the Microenvironment of Postoperative Neurocognitive Disorders. Neurochem. Res. 2022, 47, 3583–3597. [CrossRef] [PubMed]
  90. Wang, L.; Shui, X.; Diao, Y.; Chen, D.; Zhou, Y.; Lee, T.H. Potential Implications of miRNAs in the Pathogenesis, Diagnosis, and Therapeutics of Alzheimer’s Disease. Int. J. Mol. Sci. 2023, 24, 16259. [CrossRef]
  91. Musgrove, M.R.; Mikhaylova, M.; Bredy, T.W. Fundamental Neurochemistry Review: At the Intersection between the Brain and the Immune System: Non-coding RNAs Spanning Learning, Memory and Adaptive Immunity. J. Neurochem. 2024, 168, 961–976. [CrossRef]
  92. Zhu, S.; Wu, J.; Hu, J. Non-Coding RNA in Alcohol Use Disorder by Affecting Synaptic Plasticity. Exp. Brain Res. 2022, 240, 365–379. [CrossRef]
  93. Griswold, A.J.; Sivasankaran, S.K.; Van Booven, D.; Gardner, O.K.; Rajabli, F.; Whitehead, P.L.; Hamilton-Nelson, K.L.; Adams, L.D.; Scott, A.M.; Hofmann, N.K. Immune and Inflammatory Pathways Implicated by Whole Blood Transcriptomic Analysis in a Diverse Ancestry Alzheimer’s Disease Cohort. J. Alzheimer’s Dis. 2020, 76, 1047–1060. [CrossRef]
  94. Guerriero, F.; Sgarlata, C.; Francis, M.; Maurizi, N.; Faragli, A.; Perna, S.; Rondanelli, M.; Rollone, M.; Ricevuti, G. Neuroinflammation, Immune System and Alzheimer Disease: Searching for the Missing Link. Aging Clin. Exp. Res. 2017, 29, 821–831. [CrossRef] [PubMed]
  95. Ferreiro, E.; Baldeiras, I.; Ferreira, I.; Costa, R.; Rego, A.; Pereira, C.; Oliveira, C. Mitochondrial- and Endoplasmic Reticulum-Associated Oxidative Stress in Alzheimer’s Disease: From Pathogenesis to Biomarkers. Int. J. Cell Biol. 2012, 2012, 735206. [CrossRef] [PubMed]
  96. Tobore, T.O. On the Central Role of Mitochondria Dysfunction and Oxidative Stress in Alzheimer’s Disease. Neurol. Sci. 2019, 40, 1527–1540. [CrossRef]
  97. Godoy, J.A.; Rios, J.A.; Zolezzi, J.M.; Braidy, N.; Inestrosa, N.C. Signaling Pathway Cross Talk in Alzheimer’s Disease. Cell Commun. Signal. 2014, 12, 23. [CrossRef]
  98. Skaper, S.D.; Facci, L.; Zusso, M.; Giusti, P. Synaptic Plasticity, Dementia and Alzheimer Disease. CNS Neurol. Disord. Drug Targets 2017, 16, 220–233.
  99. Li, J.; Li, L.; Cai, S.; Song, K.; Hu, S. Identification of Novel Risk Genes for Alzheimer’s Disease by Integrating Genetics from Hippocampus. Sci. Rep. 2024, 14, 27484. [CrossRef]
  100. Caricasole, A.; Copani, A.; Caruso, A.; Caraci, F.; Iacovelli, L.; Sortino, M.A.; Terstappen, G.C.; Nicoletti, F. The Wnt Pathway, Cell-Cycle Activation and β-Amyloid: Novel Therapeutic Strategies in Alzheimer’s Disease? Trends Pharmacol. Sci. 2003, 24, 233–238.
  101. Kumari, S.; Dhapola, R.; Reddy, D.H. Apoptosis in Alzheimer’s Disease: Insight into the Signaling Pathways and Therapeutic Avenues. Apoptosis 2023, 28, 943–957. [CrossRef] [PubMed]
  102. Verma, R.; Savaria-Butler, A.; Enguita, F.J.; Meller, R. Commentary: A Review of Technical Considerations for Planning an RNA-Sequencing Experiment. BMC Genomics 2025, 26, 1–14. [CrossRef]
  103. Love, M.I.; Huber, W.; Anders, S. Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2. Genome Biol. 2014, 15, 550. [CrossRef] [PubMed]
  104. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies. Nucleic Acids Res. 2015, 43, e47. [CrossRef]
  105. Bhatia, V.; Chandel, A.; Minhas, Y.; Kushawaha, S.K. Advances in Biomarker Discovery and Diagnostics for Alzheimer’s Disease. Neurol. Sci. 2025, 46, 2419–2436. [CrossRef] [PubMed]
  106. Alamro, H.; Thafar, M.A.; Albaradei, S.; Gojobori, T.; Essack, M.; Gao, X. Exploiting Machine Learning Models to Identify Novel Alzheimer’s Disease Biomarkers and Potential Targets. Sci. Rep. 2023, 13, 4979. [CrossRef]
  107. Papassotiropoulos, A.; Fountoulakis, M.; Dunckley, T.; Stephan, D.A.; Reiman, E.M. Genetics, Transcriptomics, and Proteomics of Alzheimer’s Disease. J. Clin. Psychiatry 2006, 67, 652. [CrossRef]
  108. Park, M.-K.; Ahn, J.; Lim, J.-M.; Han, M.; Lee, J.-W.; Lee, J.-C.; Hwang, S.-J.; Kim, K.-C. A Transcriptomics-Based Machine Learning Model Discriminating Mild Cognitive Impairment and the Prediction of Conversion to Alzheimer’s Disease. Cells 2024, 13, 1920.
  109. Dimitriadis, S.I.; Liparas, D.; Tsolaki, M.N.; Alzheimer’s Disease Neuroimaging Initiative. Random Forest Feature Selection, Fusion and Ensemble Strategy: Combining Multiple Morphological MRI Measures to Discriminate among Healhy Elderly, MCI, cMCI and Alzheimer’s Disease Patients: From the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Database. J. Neurosci. Methods 2018, 302, 14–23. [PubMed]
  110. Abdelwahab, M.M.; Al-Karawi, K.A.; Semary, H.E. Deep Learning-Based Prediction of Alzheimer’s Disease Using Microarray Gene Expression Data. Biomedicines 2023, 11, 3304. [CrossRef]
  111. Kang, K.; Cai, J.; Song, X.; Zhu, H. Bayesian Hidden Markov Models for Delineating the Pathology of Alzheimer’s Disease. Stat. Methods Med. Res. 2019, 28, 2112–2124. [CrossRef]
  112. Sherif, F.F.; Zayed, N.; Fakhr, M. Discovering Alzheimer Genetic Biomarkers Using Bayesian Networks. Adv. Bioinforma. 2015, 2015, 639367. [CrossRef] [PubMed]
Figure 1. Comparison of methods for predicting AD status from blood proteins, CSF biomarkers, PET imaging, and blood transcriptomics. Created with R [41].
Figure 2. Stages of transcriptomic biomarker discovery with machine learning. Created with BioRender.com.
Figure 3. Integration of multiple omics data in Alzheimer’s disease. Adapted from [82].
Figure 4. Translational pipeline of blood transcriptomic biomarkers. Created with Microsoft PowerPoint.
Table 1. Established Alzheimer’s disease biomarkers: strengths and limitations.

| Biomarker Type | Examples | Strengths | Limitations |
|---|---|---|---|
| CSF biomarkers | Aβ42, pTau181, total tau, NfL | High accuracy; direct measure of pathology; part of AT(N) framework | Invasive lumbar puncture; patient reluctance; limited scalability |
| PET imaging | Amyloid PET (florbetapir), Tau PET (flortaucipir) | Visualizes pathology in vivo; excellent specificity | Very expensive; radiation exposure; limited availability |
| Blood protein biomarkers | Plasma pTau181, pTau217, Aβ42/40, NfL | Minimally invasive; emerging clinical assays | Require ultrasensitive assays; variability across cohorts; peripheral confounders |
| Blood transcriptomics | mRNA, miRNA, lncRNA panels | Unbiased discovery; mechanistic insights; scalable with sequencing | Sensitive to external factors (diet, meds); standardization needed |
Table 2. Machine learning approaches in transcriptomic biomarker discovery.

| Category | Method | Application | Strengths | Limitations |
|---|---|---|---|---|
| Feature selection | LASSO, Ridge, Elastic Net | Reduce dimensionality, select predictive genes | Improves interpretability, prevents overfitting | May exclude relevant weak features |
| Classifiers | RF, GBM, SVM, PLS | Classify AD vs controls | High accuracy; handle high-dimensional data | Sensitive to sample imbalance |
| Deep learning | DNN, CNN, Autoencoders | Learn non-linear representations | Captures complex interactions | Data-hungry; less interpretable |
| Probabilistic models | Bayesian Modeling | Estimate continuous biomarker distributions | Provides risk probabilities | Assumes data distribution |
Table 3. Representative studies of blood transcriptomic biomarkers in AD.

| Study | Platform | Cohort | Method | Key Findings |
|---|---|---|---|---|
| Booij et al. (2010) [44] | Microarray | Whole blood | PLS regression | Gene panel distinguished AD from controls |
| Lunnon et al. (2013) [45] | Microarray | AD + MCI | DEG + pathway analysis | Mitochondrial & ribosomal dysfunction |
| Roed et al. (2013) [46] | Microarray | MCI converters | ML classification | Predicted conversion to AD |
| Fehlbaum-Beurdeley et al. (2010) [47] | Microarray | AD vs controls | AclarusDX diagnostic panel | >80% classification accuracy |
| Huan et al. (2018) [48] | RNA-seq | AD vs controls | Differential expression | Dysregulated immune pathways |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.