Afreixo, V.; Tavares, A.H.; Enes, V.; Pinheiro, M.; Rodrigues, L.; Moura, G. Stable Variable Selection Method with Shrinkage Regression Applied to the Selection of Genetic Variants Associated with Alzheimer’s Disease. Appl. Sci. 2024, 14, 2572.
Abstract
In this work, we sought a stable and accurate procedure for feature selection in datasets with far more predictors than individuals, as in Genome-Wide Association Studies. Because feature selection is unstable when many potential predictors are measured, we propose a variable selection procedure that combines several replications of shrinkage regression models. A weighted formulation is used to define the final predictors. The procedure is applied to investigate Single Nucleotide Polymorphism (SNP) predictors associated with Alzheimer’s disease (AD) in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. Two data scenarios are investigated: one considering only the set of SNPs and another that includes the covariates age, sex, educational level and APOE4 genotype. The SNP rs2075650 and the APOE4 genotype are identified as risk factors for AD, in line with the literature, and four other SNPs not previously reported are highlighted, opening new hypotheses for in vivo analyses. These experiments demonstrate the potential of the new method for stable feature selection.
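The general idea of combining replicated shrinkage fits through weights can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a Gaussian lasso fitted by coordinate descent (rather than a model for case/control outcomes), random subsampling as the replication scheme, Akaike weights computed from each replicate's AIC, and the hypothetical names `lasso_cd`, `stable_selection` and `simulate`.

```python
import math
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept).

    Returns the coefficient vector and the residual sum of squares.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j excluded).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta, sum(r * r for r in resid)


def stable_selection(X, y, lam=0.3, n_rep=15, frac=0.8, seed=0):
    """Fit the lasso on random subsamples and combine replicates.

    Assumption: replicates are weighted by Akaike weights derived from a
    Gaussian AIC; the paper's exact weighting may differ.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = int(frac * n)
    betas, aics = [], []
    for _ in range(n_rep):
        idx = rng.sample(range(n), m)
        beta, rss = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        k = sum(1 for b in beta if b != 0.0)  # number of selected predictors
        aics.append(m * math.log(max(rss, 1e-12) / m) + 2 * k)
        betas.append(beta)
    best = min(aics)
    raw = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Weighted coefficient per predictor, plus how often each one was selected.
    overall = [sum(w * b[j] for w, b in zip(weights, betas)) for j in range(p)]
    freq = [sum(1 for b in betas if b[j] != 0.0) / n_rep for j in range(p)]
    return overall, freq


def simulate(n=30, p=50, seed=1):
    """Toy data with more predictors than individuals; only predictor 0 matters."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]
    return X, y
```

Calling `overall, freq = stable_selection(*simulate())` returns one weighted coefficient and one selection frequency per predictor. Ranking predictors by the absolute weighted coefficient, or requiring a minimum selection frequency, are two possible ways to define the final set under these assumptions.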
Keywords
penalized regression; Akaike’s Information Criterion; high-dimensional data; stability; overall weighted coefficients; Alzheimer’s disease; SNP
Subject
Computer Science and Mathematics, Applied Mathematics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.