Submitted: 30 August 2024
Posted: 03 September 2024
Abstract
Keywords:
1. Introduction
1.1. Related Work

| Study | Data source | Feature selection | Classifier | Results |
|---|---|---|---|---|
| Li et al. (2017) [15] | ANM1 and ANM2 | Student’s t-test | Ref-REO | AUC: 0.733 (ANM2 test set); AUC: 0.775 (ANM1 test set) |
| Li et al. (2018) [16] | ANM1 and ANM2 | LASSO regression | Majority voting of RF, SVM, and RR | AUC: 0.866 (ANM2 test set); AUC: 0.864 (ANM1 test set) |
| Lee T. et al. (2020) [9] | ANM1, ANM2, and ADNI | VAE, TF genes | Binary classification with logistic regression (LR), L1-regularized LR (L1-LR), SVM, RF, and DNN | AUC: 0.657, 0.874, and 0.804 for ADNI, ANM1, and ANM2, respectively |
| Park C. et al. (2020) [17] | Gene expression: GSE33000 and GSE44770; DNA methylation: GSE80970 | Intersection of DEGs and DMPs | DNN (deep neural network) | Average accuracy: 0.823 |
| Kalkan H. et al. (2022) [18] | GSE63060, GSE63061, and GSE140829 | LASSO regression | CNN on a transformed image representation | AUC: 0.875 (AD vs. CTL); 0.664 (MCI vs. AD); 0.619 (MCI vs. CTL) |
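
Several of the studies above use a univariate filter before classification; Li et al. (2017) [15] rank transcripts with Student’s t-test. A minimal sketch of such a filter, assuming a pooled-variance two-sample t statistic computed per gene between AD and control samples and ranking by |t|. The gene names and expression values are hypothetical placeholders, not data from any cited study.

```python
import math
from statistics import mean, variance


def student_t(group_a: list[float], group_b: list[float]) -> float:
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighted by each group's degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes(expr_ad: dict[str, list[float]],
               expr_ctl: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank genes by |t| between AD and control samples, largest first."""
    scores = {gene: student_t(expr_ad[gene], expr_ctl[gene]) for gene in expr_ad}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values for two placeholder genes.
    ad = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [5.0, 5.1, 4.9]}
    ctl = {"GENE_A": [4.0, 5.0, 6.0], "GENE_B": [5.0, 4.9, 5.1]}
    for gene, t in rank_genes(ad, ctl):
        print(f"{gene}: t = {t:.3f}")
```

In practice the top-ranked genes (by |t| or by the associated p-value) would be kept as classifier inputs; the cut-off is a modelling choice.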
1.2. Key Contributions
2. Material and Methods
2.1. Dataset Selection and Exploration

| Diagnosis (count) | Gender: Male | Gender: Female | Race: White | Race: Black | Race: Asian | Race: Others | Age < 65 | Age ≥ 65 |
|---|---|---|---|---|---|---|---|---|
| CN (total = 246) | 117 | 129 | 226 | 16 | 2 | 2 | 7 | 239 |
| MCI (total = 382) | 216 | 166 | 356 | 11 | 5 | 10 | 66 | 316 |
| Dementia (total = 116) | 75 | 43 | 108 | 3 | 4 | 1 | 6 | 110 |
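
The counts above make MCI the majority class (382), with CN (246) and Dementia (116) under-represented. The reference list cites SMOTE (Chawla et al., 2002) and Borderline-SMOTE (Han et al., 2005) for this kind of imbalance. A minimal sketch of the basic SMOTE interpolation step, assuming Euclidean nearest neighbours, k = 5, and oversampling each minority class up to the majority count; how the study actually rebalances its data is not stated in this section, and the feature vectors in the usage example are random placeholders.

```python
import math
import random

# Class sizes taken from the demographic table above.
CLASS_COUNTS = {"CN": 246, "MCI": 382, "Dementia": 116}


def samples_needed(counts: dict[str, int]) -> dict[str, int]:
    """Synthetic samples required to raise every class to the majority size."""
    majority = max(counts.values())
    return {label: majority - n for label, n in counts.items()}


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Basic SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    print(samples_needed(CLASS_COUNTS))  # {'CN': 136, 'MCI': 0, 'Dementia': 266}
    data_rng = random.Random(42)
    dementia = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(116)]
    new_points = smote(dementia, samples_needed(CLASS_COUNTS)["Dementia"])
    print(len(new_points))  # 266
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE should be applied to the training split only, so that no synthetic sample is derived from test data.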
2.2. Data Pre-Processing and Feature Selection
2.3. AD Stage Diagnosis Model Construction

3. Performance Results
4. Discussion and Conclusion
References
- Angelucci F, Spalletta G, di Iulio F, Ciaramella A, Salani F, Colantoni L, Varsi AE, Gianni W, Sancesario G, Caltagirone C, Bossù P. Alzheimer’s disease (AD) and Mild Cognitive Impairment (MCI) patients are characterized by increased BDNF serum levels. Curr Alzheimer Res. 2010 Feb;7(1):15–20. PMID: 20205668.
- Cummings JL, Morstorf T, Zhong K. Alzheimer’s disease drug development pipeline: Few candidates, frequent failures. Alzheimers Res Ther. 2014.
- Willette AA, Calhoun VD, Egan JM, Kapogiannis D; Alzheimer’s Disease Neuroimaging Initiative. Prognostic classification of mild cognitive impairment and Alzheimer’s disease: MRI independent component analysis. Psychiatry Res Neuroimaging. 2014;224(2):81–88.
- Gorji H, Haddadnia J. A novel method for early diagnosis of Alzheimer’s disease based on pseudo Zernike moment from structural MRI. Neuroscience. 2015;305:361–371.
- Tanzi RE. The genetics of Alzheimer disease. Cold Spring Harb Perspect Med. 2012 Oct 1;2(10):a006296. PMID: 23028126; PMCID: PMC3475404.
- Shen L, Jia J. An Overview of Genome-Wide Association Studies in Alzheimer’s Disease. Neurosci Bull. 2016;32(2):183–190.
- “Genetics.” Alzheimer’s Disease and Dementia. www.alz.org/alzheimers-dementia/what-is-alzheimers/causes-and-risk-factors/genetics.
- Marian AJ. Molecular genetic studies of complex phenotypes. Transl Res. 2012;159:64–79.
- Lee T, Lee H. Prediction of Alzheimer’s disease using blood gene expression data. Sci Rep. 2020 Feb 26;10(1):3485. PMID: 32103140; PMCID: PMC7044318.
- Patel H, Dobson RJB, Newhouse SJ. A Meta-Analysis of Alzheimer’s Disease Brain Transcriptomic Data. J Alzheimers Dis. 2019;68(4):1635–1656. PMID: 30909231; PMCID: PMC6484273.
- Liew CC, Ma J, Tang HC, Zheng R, Dempsey AA. The peripheral blood transcriptome dynamically reflects system wide biology: A potential diagnostic tool. J Lab Clin Med. 2006;147:126–132.
- Saykin AJ, Shen L, Foroud TM, et al. Alzheimer’s Disease Neuroimaging Initiative biomarkers as quantitative phenotypes: Genetics core aims, progress, and plans. Alzheimers Dement. 2010;6(3):265–273.
- Fehlbaum-Beurdeley P, et al. Toward an Alzheimer’s disease diagnosis via high-resolution blood gene expression. Alzheimers Dement. 2010;6(1):25–38.
- Lunnon K, et al. A blood gene expression marker of early Alzheimer’s disease. J Alzheimers Dis. 2013;33(3):737–753.
- Li H, et al. Identification of molecular alterations in leukocytes from gene expression profiles of peripheral whole blood of Alzheimer’s disease. Sci Rep. 2017;7:14027.
- Li X, et al. Systematic analysis and biomarker study for Alzheimer’s disease. Sci Rep. 2018;8:17394.
- Park C, Ha J, Park S. Prediction of Alzheimer’s disease based on deep neural network by integrating gene expression and DNA methylation dataset. Expert Syst Appl. 2020;140:112873.
- Kalkan H, Akkaya UM, Inal-Gültekin G, Sanchez-Perez AM. Prediction of Alzheimer’s Disease by a Novel Image-Based Representation of Gene Expression. Genes (Basel). 2022 Aug 8;13(8):1406. PMID: 36011317; PMCID: PMC9407775.
- Shen L, Yin Q. The classification for High-dimension low-sample size data. Pattern Recognit. 2020;130:108828.
- Sarma M, Chatterjee S. Identification and Prediction of Alzheimer Based on Biomarkers Using ‘Machine Learning’. In: Bhattacharjee A, Borgohain S, Soni B, Verma G, Gao XZ, editors. Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Singapore: Springer; 2020.
- Catchpoole DR, Kennedy P, Skillicorn DB, Simoff S. The curse of dimensionality: A blessing to personalized medicine. J Clin Oncol. 2010;28:723–724.
- Marcilio WE, Eler DM. From explanations to feature selection: Assessing SHAP values as feature selection mechanism. In: 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI); 2020. p. 340–347.
- Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F. Learning from Imbalanced Data Sets. Cham, Switzerland: Springer International Publishing; 2018. p. 197–226.
- Krawczyk B. Learning from imbalanced data: Open challenges and future directions. Prog Artif Intell. 2016;5:221–232.
- Ahmed SF, Alam MSB, Hassan M, et al. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif Intell Rev. 2023;56:13521–13617.
- Brownlee J. Imbalanced Classification with Python. 2020.
- Chawla NV, et al. SMOTE: Synthetic Minority Over-Sampling Technique. J Artif Intell Res. 2002;16:321–357.
- Han H, Wang WY, Mao BH. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In: Huang DS, Zhang XP, Huang GB, editors. Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Berlin, Heidelberg: Springer; 2005.
- Cárdenas AA, Baras JS. B-ROC curves for the assessment of classifiers over imbalanced data sets. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence; Boston, MA, USA. AAAI Press; 2006. p. 1581–1584.
- Flach P, Hernandez-Orallo J, Ferri C. A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11); New York, NY, USA. Omnipress; 2011. p. 657–664.
- Harbecke D, Chen Y, Hennig L, Alt C. Why only Micro-F1? Class Weighting of Measures for Relation Classification. In: Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP; Dublin, Ireland. Association for Computational Linguistics; 2022. p. 32–41.
- Bloch L, Friedrich CM; Alzheimer’s Disease Neuroimaging Initiative. Machine Learning Workflow to Explain Black-Box Models for Early Alzheimer’s Disease Classification Evaluated for Multiple Datasets. SN Comput Sci. 2022;3:509.
- Wu Q, Boueiz A, Bozkurt A, Masoomi A, Wang A, DeMeo DL, Weiss ST, Qiu W. Deep Learning Methods for Predicting Disease Status Using Genomic Data. J Biom Biostat. 2018;9(5):417. Epub 2018 Dec 11. PMID: 31131151; PMCID: PMC6530791.
- Bins J, Draper BA. Feature selection from huge feature sets. In: Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001); Vancouver, BC, Canada. 2001. Vol. 2, p. 159–165.









| Feature name | Feature description |
|---|---|
| APOE4 | Number of APOE ɛ4 alleles |
| PTMARRY | Marital status |
| FDG | FDG-PET measure of cerebral glucose metabolism, typically reduced in AD patients |
| Hippocampus | Hippocampal volume (MRI) |
| WholeBrain | Whole-brain volume (MRI) |
| mPACCdigit | Modified Preclinical Alzheimer Cognitive Composite with Digit Symbol Substitution |
| LDELTOTAL | Logical Memory delayed recall total score |
| CDRSB | Clinical Dementia Rating Scale – Sum of Boxes |
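
The APOE4 feature is a count (0, 1, or 2) of ɛ4 alleles carried by a participant. A minimal sketch of deriving that count from a genotype string, assuming a hypothetical "allele/allele" format such as "3/4"; the helper name `apoe4_count` and the input format are illustrative, since the section does not describe how the raw genotype is stored.

```python
def apoe4_count(genotype: str) -> int:
    """Count ɛ4 alleles in a genotype written as 'a/b' (hypothetical format, e.g. '3/4')."""
    alleles = genotype.strip().split("/")
    # APOE has three common alleles: ɛ2, ɛ3, and ɛ4.
    if len(alleles) != 2 or not all(a in {"2", "3", "4"} for a in alleles):
        raise ValueError(f"unexpected APOE genotype: {genotype!r}")
    return alleles.count("4")


if __name__ == "__main__":
    for g in ["3/3", "3/4", "4/4", "2/3"]:
        print(g, apoe4_count(g))
```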

| Gene transcript | Feature score | Correlation score | Gene role |
|---|---|---|---|
| MAPK14 | 68.0 | 0.186508 | AD risk |
| CCPG1 | 50.0 | 0.146056 | AD risk |
| OVOS2 | 109.0 | 0.098956 | AD risk |
| ASXL3 | 92.0 | 0.108009 | AD risk |
| USP47 | 79.0 | -0.159388 | AD suppressor |
| ATP9A | 100.0 | -0.122405 | AD suppressor |
| SCGB1D4 | 122.0 | -0.065516 | AD suppressor |
| CDCP1 | 76.0 | -0.086064 | AD suppressor |
| KISS1R | 74.0 | -0.088843 | AD suppressor |
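
In the gene table, the sign of the correlation score separates transcripts labelled as AD risk (positive) from AD suppressors (negative). A minimal sketch of that labelling rule, assuming a Pearson correlation between a transcript's expression and a hypothetical ordinal stage encoding (CN = 0, MCI = 1, Dementia = 2); the section does not state which correlation measure or label encoding produced the reported scores, and the expression values below are made up.

```python
import math

# Hypothetical ordinal encoding of diagnostic stage (an assumption, not from the study).
STAGE_CODE = {"CN": 0, "MCI": 1, "Dementia": 2}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gene_role(expression: list[float], stages: list[str]) -> tuple[float, str]:
    """Return (r, role): r > 0 -> 'AD risk', r < 0 -> 'AD suppressor'."""
    r = pearson(expression, [STAGE_CODE[s] for s in stages])
    if r > 0:
        return r, "AD risk"
    if r < 0:
        return r, "AD suppressor"
    return r, "no association"


if __name__ == "__main__":
    stages = ["CN", "CN", "MCI", "MCI", "Dementia", "Dementia"]
    print(gene_role([1.0, 1.2, 1.5, 1.4, 2.0, 2.1], stages))  # expression rises with stage
    print(gene_role([2.0, 2.1, 1.6, 1.7, 1.1, 1.0], stages))  # expression falls with stage
```

The magnitudes in the table (all below 0.19 in absolute value) indicate weak individual associations, which is consistent with combining many transcripts in a multivariate model rather than relying on any single gene.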

| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Cost (loss) function | Categorical cross-entropy |
| Learning rate | 0.001 |
| Batch size | 5 |
| Epochs | 4000 |
| Number of layers | 2 |
| Activation function (layer 1) | ReLU |
| Activation function (layer 2) | Softmax |
| Dropout rate | 0.20 |
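
The hyperparameters above describe a small fully connected network: a ReLU hidden layer with dropout 0.20 feeding a softmax output over the three stages, trained with Adam (learning rate 0.001) on categorical cross-entropy. A minimal dependency-free sketch of the forward pass and loss under that configuration, assuming a hypothetical hidden width of 16 units, an input dimension of 8, dropout placed after the hidden layer, and He-style uniform initialisation; the table specifies none of these, and the Adam training loop (batch size 5, 4000 epochs) is omitted.

```python
import math
import random
from dataclasses import dataclass

CLASSES = ["CN", "MCI", "Dementia"]


@dataclass
class Config:
    """Values from the hyperparameter table, plus two assumed sizes."""
    learning_rate: float = 0.001  # Adam step size; used only in training (omitted)
    batch_size: int = 5           # used only in training (omitted)
    epochs: int = 4000            # used only in training (omitted)
    dropout_rate: float = 0.20
    n_inputs: int = 8             # assumption: input dimension not given in the table
    n_hidden: int = 16            # assumption: hidden width not given in the table


def relu(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


def softmax(v: list[float]) -> list[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dropout(v, rate, rng, training):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return v
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Loss for one sample: -sum(t * log p) over the classes."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))


class TwoLayerNet:
    def __init__(self, cfg: Config, seed: int = 0):
        self.cfg = cfg
        self.rng = random.Random(seed)

        def init(n_out, n_in):
            # He-style uniform initialisation (an assumption).
            limit = math.sqrt(6.0 / n_in)
            return [[self.rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]

        self.w1, self.b1 = init(cfg.n_hidden, cfg.n_inputs), [0.0] * cfg.n_hidden
        self.w2, self.b2 = init(len(CLASSES), cfg.n_hidden), [0.0] * len(CLASSES)

    def forward(self, x, training=False):
        h = relu(dense(x, self.w1, self.b1))                       # layer 1: ReLU
        h = dropout(h, self.cfg.dropout_rate, self.rng, training)  # dropout 0.20
        return softmax(dense(h, self.w2, self.b2))                 # layer 2: softmax over stages


if __name__ == "__main__":
    cfg = Config()
    net = TwoLayerNet(cfg)
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(cfg.n_inputs)]
    probs = net.forward(x)
    print(dict(zip(CLASSES, (round(p, 3) for p in probs))))
    # Loss if the true stage were MCI (one-hot [0, 1, 0]).
    print(categorical_cross_entropy(probs, [0, 1, 0]))
```

In a framework such as Keras the same configuration would be two `Dense` layers with a `Dropout` layer between them, compiled with the Adam optimizer and a categorical cross-entropy loss; the pure-Python version only makes each step explicit.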

| ROC AUC (CN) | ROC AUC (MCI) | ROC AUC (Dementia) | Micro-average F1 score | Weighted-average F1 score |
|---|---|---|---|---|
| 0.75 | 0.74 | 0.70 | 0.67 | 0.67 |
| 0.74 | 0.73 | 0.63 | 0.65 | 0.64 |
| 0.74 | 0.71 | 0.62 | 0.64 | 0.63 |
| 0.70 | 0.67 | 0.68 | 0.64 | 0.63 |
| 0.67 | 0.71 | 0.67 | 0.64 | 0.62 |

| ROC AUC (CN) | ROC AUC (MCI) | ROC AUC (Dementia) | Micro-average F1 score | Weighted-average F1 score |
|---|---|---|---|---|
| 0.97 | 0.94 | 0.94 | 0.94 | 0.94 |
| 0.98 | 0.94 | 0.92 | 0.95 | 0.94 |
| 0.97 | 0.95 | 0.92 | 0.95 | 0.95 |
| 0.98 | 0.94 | 0.93 | 0.94 | 0.94 |
| 0.97 | 0.93 | 0.90 | 0.93 | 0.93 |
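
The result tables report a one-vs-rest ROC AUC for each stage together with micro- and weighted-average F1 scores. A minimal dependency-free sketch of the conventional definitions, assuming integer labels 0 = CN, 1 = MCI, 2 = Dementia and per-class probability scores from the classifier; the labels and probabilities in the usage example are made up. For single-label multiclass data, micro-average F1 equals accuracy, because every misclassification counts once as a false positive and once as a false negative.

```python
STAGES = {0: "CN", 1: "MCI", 2: "Dementia"}  # assumed label encoding


def one_vs_rest_auc(labels, scores, positive):
    """ROC AUC of one class vs. the rest via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def counts(labels, preds, cls):
    """True positives, false positives, and false negatives for one class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
    fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
    fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def micro_f1(labels, preds, classes):
    """Pool TP/FP/FN over all classes, then compute a single F1."""
    totals = [sum(c) for c in zip(*(counts(labels, preds, k) for k in classes))]
    return f1_from_counts(*totals)


def weighted_f1(labels, preds, classes):
    """Per-class F1 averaged with weights equal to each class's support."""
    n = len(labels)
    return sum(f1_from_counts(*counts(labels, preds, k)) * labels.count(k) / n for k in classes)


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1, 2]
    probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.8, 0.1],
             [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.1, 0.2, 0.7]]
    preds = [max(range(3), key=lambda k: p[k]) for p in probs]  # argmax prediction
    classes = list(STAGES)
    print([round(one_vs_rest_auc(labels, [p[k] for p in probs], k), 3) for k in classes])  # [1.0, 0.889, 1.0]
    print(round(micro_f1(labels, preds, classes), 3))     # 0.667
    print(round(weighted_f1(labels, preds, classes), 3))  # 0.667
```

Library implementations such as scikit-learn's `roc_auc_score` and `f1_score` (with `average="micro"` or `average="weighted"`) compute the same quantities.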

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).