A Novel Approach for Early Detection of Alzheimer's Disease Based on Multi-Level Fuzzy Neural Networks

Timely diagnosis of Alzheimer's disease (AD) is crucial for effective treatment. In this paper, a novel approach based on Multi-Level Fuzzy Neural Networks (MLFNN) for early detection of AD is proposed. The study focuses on distinguishing AD and mild cognitive impairment (MCI) patients from healthy people using MLFNN, and on selecting the best feature(s) and the most compatible classification algorithm. We achieve excellent performance using only a single feature, the MMSE score, by matching the most suitable algorithm to each classification problem with the smallest possible feature set. Because it relies on a clinical test alone, the proposed method can spare patients and healthy people painful and time-consuming examinations. Experiments show the effectiveness of the proposed method for the diagnosis of AD, with an accuracy of 96.6%, one of the highest reported in the literature.


Introduction
Alzheimer's disease (AD) is a brain disorder that is common among the elderly. It starts with forgetfulness and memory loss, which disturb cognitive functions, and patients then lose the ability to remember recent events. Moreover, it may become one of the major causes of death in the not-so-distant future. Timely diagnosis can help reduce its casualties. Many biomarkers are used for the diagnosis of AD, such as Magnetic Resonance Imaging (MRI) (Cuingnet et al., 2011; Davatzikos, Fan, Wu, Shen, & Resnick, 2008; Kloppel et al., 2008), Positron Emission Tomography (PET) (Foster et al., 2007; Nordberg, Rinne, Kadir, & Langstrom, 2010), and Cerebrospinal Fluid (CSF) (Dubois et al., 2007). Clinical tests and examinations such as the Alzheimer's Disease Assessment Scale-Cognitive (ADAS-Cog) Behaviour Section (M. W. , Geriatric Depression Scale (GDS) (Yesavage et al., 1982), Functional Activities Questionnaire (FAQ) (Pfeffer, Kurosaki, Harrah, Chance, & Filos, 1982), Clinical Dementia Rating (CDR) Sum of Boxes (CDR-SOB) (Daly et al., 2000), and Mini-Mental State Examination (MMSE) (Michael W Weiner et al., 2012) have also been used for the diagnosis of Alzheimer's disease.
Automatic AD detection procedures based on various biomarkers apply machine learning techniques to brain images for the diagnosis of AD (Amat-ur-Rasool, Ahmed, Hasnain, & Carter, 2021; Bivona et al., 2021; De Marchi et al., 2021; Jiang et al., 2021; McGrowder et al., 2021). Most of these Computer-Aided Diagnosis (CAD) systems have three processing steps: preprocessing, feature extraction, and classification. As another example of a CAD system, Dashtban and Li (2019) used a wide spectrum of health causes, economics, and environment to improve the understanding of patient behaviors with a deep-learning algorithm.
Several techniques are used to achieve precise classification, such as Principal Component Analysis (PCA), Artificial Neural Networks, Partial Least Squares (PLS), and Support Vector Machines (SVM). The purpose of this study is to compare different features for classifying AD patients, MCI patients, and normal control (NC) subjects based on segmented MRI, personal information, and MMSE scores, and to select the best feature(s) and the best classification algorithm for distinguishing AD patients from healthy people using Multi-Level Fuzzy Neural Networks. Richer information can help improve diagnostic accuracy. Finding the best features and, more importantly, the smallest number of features that achieves accurate and precise classification is an old problem, yet still an area of considerable importance; this study therefore focuses on this problem in the context of Alzheimer's disease diagnosis.

Characteristic of subjects
Only the first exam of each patient in the ADNI database was used in this work. A total of 705 images, together with the other features of the corresponding subjects, were used to assess the proposed approach.
Demographic data of the patients and subjects are summarized in Table 1. All data were downloaded from the ADNI database in September 2016. The data and the proposed method (preprocessing technique, feature selection) are presented in Section 2. The experimental results are provided in Section 3. The discussion and conclusions are presented in the final section.

Proposed Method
A novel method for the diagnosis of AD based on Multi-Level Fuzzy Neural Networks (MLFNN) and SVM classifiers is proposed. The method mainly involves feature extraction and selection, data normalization, and classification of AD using MLFNN and SVM classifiers on MMSE scores, MRI data, and personal information (i.e. age, marital status, gender, and education level of the subjects). Before classification, the relevant features of the regions of interest in the images had to be extracted. Classification was then performed using the SVM and MLFNN classifiers combined in a single framework.
The problem addressed and solved in this study is finding the smallest feature set, together with a compatible algorithm, that yields precise classification. The Alzheimer's disease diagnosis problem is solved using only a single feature, the MMSE score, which comes from a clinical test and can spare patients and healthy people many painful and time-consuming examinations. To this end, many algorithms were tested on several datasets to find the best match for the problem; some of them are reported here. To achieve better results, the input data were normalized using the natural logarithm. The proposed framework, which combines the results of the MLFNN and SVM classifiers using only one feature (MMSE scores), then gave excellent performance. The proposed method was evaluated using k-fold cross-validation (k = 10). In 10-fold cross-validation, the data are partitioned into ten equal parts; in each round, nine parts are used for training and one part for testing, so that every part serves as the test set once.
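The normalization and evaluation protocol described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB implementation: the function names `log_normalize` and `k_fold_indices`, the random shuffling, and the fixed seed are assumptions, and the log transform assumes strictly positive inputs.

```python
import numpy as np

def log_normalize(x):
    """Apply the natural logarithm element-wise (assumes strictly positive values)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("natural logarithm requires strictly positive inputs")
    return np.log(x)

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)   # shuffling and seed are assumptions
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)    # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Usage: 705 subjects split into 10 folds
splits = list(k_fold_indices(705, k=10))
```

Since 705 is not divisible by 10, the folds in this sketch hold 70 or 71 samples each; the reported performance would be the average of the per-fold scores.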

MRI Acquisition Parameters
Across the multiple ADNI sites, scanners from several vendors (Siemens, Philips, and GE Medical) are used. A standard protocol was developed to evaluate 3D T1-weighted sequences for morphometric analyses (Jack et al., 2008). Structural brain MRI scans were acquired using 1.5T and 3T MRI scanners. Most of the 1.5T MRIs were obtained from GE Medical scanners, and most of the 3T MRIs were acquired from Siemens machines.
For modern systems, the scan time is 7.7 min at 1.5T and 9.3 min at 3T. This difference arises from the different susceptibility artifacts, spin relaxation, and chemical shift properties at 1.5T and 3T.

Preprocessing of MR Images
Statistical Parametric Mapping (SPM) software was used for preprocessing (Ashburner, 2011). SPM was used for realignment, smoothing, spatial normalization, and feature extraction from the ROIs of the MRIs. The preprocessing steps using the VBM8 tools are as follows:
1) Checking that the images are in a suitable format using the SPM tools.
2) Segmenting the images to identify gray matter (GM) and white matter (WM), and warping the GM segments to Montreal Neurological Institute (MNI) space using the SPM tools.
3) Estimating the deformations that best align the images to each other, and creating templates by iteratively registering the imported images with their average, using the DARTEL tools of SPM.
4) Generating spatially normalised and smoothed GM images in MNI space using the deformations estimated by the DARTEL tools of SPM, which yields the smoothed, modulated, warped GM and WM images.
Data cleaning and selection were done in the preprocessing step. In the second step (feature extraction), the input data are converted into small vectors (Duin, 2000). The classification algorithm then determines whether each vector is more similar to a Mild Cognitive Impairment (MCI) patient, an AD patient, or a Normal Control (NC) subject.

Feature Extraction and selection
The features of the MRI images were obtained based on regions of interest (ROIs). The feature vector, consisting of average voxel values and ROI volumes, was extracted from the MRI images using VBM-SPM. In this step, volumetric changes of specific regions such as the hippocampus, entorhinal cortex, temporal and parietal lobes, and ventricles were used (Miranian & Abdollahzade, 2013). To extract volume and voxel values from the ROIs, masks were created with the WFU PickAtlas tools. After applying each mask to the gray matter images, the voxels were measured and counted. Other ROIs were also used because their effectiveness was reported in the literature (M. W. . For this purpose, segmented MRIs from the ADNI database were used. Masks for the ROIs were made using the SPM8 toolbox to obtain the desired features, and MMSE scores were obtained from the ADNI database. MMSE scores, personal information, and MRI data were used to build the feature vector and to classify AD patients, MCI patients, and NC subjects. Using other tools such as CAT12, which gives very good results in MRI segmentation and feature extraction (Farokhian et al., 2017), in addition to the VBM8 tools can be considered as future work.
A total of 132 MRI features (117 voxel values and 15 ROI volumes), 1 MMSE score, and 4 personal information features were used to distinguish AD patients from NC subjects using SVM and MLFNN. Table 2 shows the 15 brain ROIs and their average volume values before and after applying the natural logarithm. The average voxel values of 117 regions, which include these 15 regions, are used in this study. Figure 1 presents box plots of the average voxel values of the 117 ROIs for AD diagnosis before and after applying the natural logarithm. Figure 2 presents the volumes of the 15 ROIs before and after applying the natural logarithm. In Figure 2, the box plots in (b) are clearer than those in (a), and the variances within the dataset are more pronounced after applying the natural logarithm. This is less obvious in Figure 1. The values in Figure 1.b) lie between 0 and 1, whereas the values in Figure 2.b) lie between 0 and 0.08, because all the data were normalized together as one block.

The differences between the original values are visible in Figures 1.a) and 2.a), but after applying the natural logarithm and normalization they become more pronounced.
The Mini-Mental State Examination (MMSE, or Folstein test) is a 30-point questionnaire that takes 5 to 10 minutes. One of its advantages is that it requires no specialized equipment or training. It has both validity and reliability for the diagnosis of Alzheimer's disease. Personal information was downloaded directly from ADNI.
Not all of the 137 extracted features are useful for the clinical diagnosis of Alzheimer's disease: some may not be beneficial, and some can even degrade diagnostic performance. The ROIs reported in the literature were extracted because these regions of the brain are affected in the preliminary stages of AD. To retain this important information while reducing dimensionality, we used a widely used dimension-reduction method, principal component analysis (PCA). PCA maps the feature set to a lower-dimensional space while keeping most of its variance. The eigenvectors with the largest eigenvalues are the principal components, and the first few eigenvectors often carry most of the information in the original feature set. Reduced computational complexity and elimination of the negative effect of redundant features are the advantages of using PCA. However, PCA can fail when the mean and covariance are not sufficient to characterize the data, because the principal components then do not preserve the information in the data.
Figure 3 displays the diagram of the proposed framework. All combinations of the data were examined in this study using both classifiers (SVM and MLFNN). The diagram in Figure 3 represents the main goal of the proposed method, which is the diagnosis of AD using one feature. As can be seen in Figure 3, the classification algorithm at the bottom of the framework is selected according to which of the three problems, shown at the top, is to be solved.
In this study, we used 25 PCA components, which retain 99% of the variance of the original data; this is far fewer than the 137 original features and more effective.
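The variance-based choice of the number of components can be illustrated with a short sketch. It assumes PCA is computed through a singular value decomposition of the mean-centred data; the function names `pca_fit` and `pca_transform` are hypothetical, and the study's own MATLAB code may differ.

```python
import numpy as np

def pca_fit(X, variance_to_keep=0.99):
    """Return the feature means and the leading principal axes that
    together explain at least `variance_to_keep` of the total variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the eigenvectors of the covariance matrix,
    # ordered by decreasing eigenvalue (s**2 is proportional to them).
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    n_components = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    n_components = min(n_components, len(s))  # guard against round-off
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ components.T
```

Under this scheme, a 705 x 137 feature matrix would be reduced to as many columns as are needed to reach the 99% threshold, which the text reports to be 25 for this dataset.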

Multi Level Fuzzy Min-Max Neural Networks
A fuzzy min-max (FMM) neural network is a type of artificial neural network that is highly efficient compared with other machine learning methods (Jawarkar, Holambe, & Basu, 2011). The FMM algorithm was proposed by Simpson in 1992. Networks of this kind have been used in many classification and clustering applications (Lin, Chang, & Lin, 2013; Miranian & Abdollahzade, 2013). The training phase of the FMM algorithm is online-adaptive and single-pass, and therefore completes in a reasonable time. These networks are based on the concept of a hyperbox. Every hyperbox defines a part of one class. Hyperboxes are convex boxes, each representing a small part of the pattern space. They expand during the training phase to cover the pattern space. At the end of training, there is no overlap between hyperboxes that belong to different classes. The expansion parameter θ controls the maximum size of the hyperboxes; its value is set between 0 and 1.
A hyperbox is determined by its two opposite corner vertices. The points V and W represent the beginning (min) and end (max) points of the hyperbox, respectively. Figure 4 shows a three-dimensional hyperbox. Each hyperbox belongs to one of the classes and covers a part of the pattern space, defined as
$$B_j = \{X, V_j, W_j, f(X, V_j, W_j)\}, \quad \forall X \in I^n$$
where $V_j$ and $W_j$ are the min and max corners of the hyperbox, respectively, X is the input vector, n is the number of dimensions of the input space, and f is the membership function of the hyperbox. Each class is represented by at least one hyperbox. Hyperboxes from different classes must not overlap, but hyperboxes of the same class may overlap. Figure 5 shows the final hyperboxes of an FMM network in a 2-D binary classification example.

Figure 4 3-D hyperbox and its min and max points.
Various membership functions may be used for hyperboxes. For instance, Simpson's function in the original FMM method is as follows:
$$b_j(A_h) = \frac{1}{2n} \sum_{i=1}^{n} \left[ \max\left(0, 1 - \max\left(0, \gamma \min(1, a_{hi} - w_{ij})\right)\right) + \max\left(0, 1 - \max\left(0, \gamma \min(1, v_{ij} - a_{hi})\right)\right) \right]$$
where $A_h = (a_{h1}, a_{h2}, ..., a_{hn}) \in I^n$ is the hth sample and γ is a coefficient between 0 and 1 that controls how fast the membership values decrease as the distance between $A_h$ and $B_j$ increases. Figure 6 shows this function and the coverage area.
The FMM neural network is composed of three layers. The first layer (FA) is the input layer, the second layer (FB) represents the hyperboxes, and each node in the third layer (FC) represents one class. Each hyperbox is a node in the middle layer, and the membership function of the hyperbox is the transfer function of that node. Figure 7 depicts the details of a hyperbox node. Each node of the FA layer is connected to all nodes in the FB layer, and each of these links has two weights (Vij and Wij), which are, respectively, the min and max points of the Bj hyperbox, where i is the index of the node in the first (input) layer. Each node of the FB (middle) layer is also connected to all nodes in the FC (output) layer. In Simpson's formulation, the weights of those links are binary: $u_{jk} = 1$ if hyperbox $B_j$ belongs to class $c_k$, and $u_{jk} = 0$ otherwise.
Figure 7 Details of one node in FMM neural network.
In the learning phase, all hyperboxes are formed and adjusted. For every new sample, the algorithm checks whether a hyperbox of the same class already contains the sample. If such a hyperbox exists, no further processing is needed and the next sample is taken. Otherwise, the following three steps are executed.

1) Expansion:
When a sample is presented to the algorithm, a hyperbox of the same class that can be expanded to cover the sample must be found. The size of the expanded hyperbox must not exceed the θ parameter. If there is no such hyperbox, a new hyperbox is created whose min and max points are both equal to the coordinates of the sample.
2) Overlap Test: In this step, the new or expanded hyperbox is checked for overlap with all hyperboxes of other classes. Two hyperboxes $B_j$ and $B_k$ overlap when every dimension i falls into one of the cases in (5), given here as in Simpson's formulation. To eliminate the overlap, the dimension (Δ) with the least overlap is selected for contraction.
Case 1: $v_{ij} < v_{ik} < w_{ij} < w_{ik}$
Case 2: $v_{ik} < v_{ij} < w_{ik} < w_{ij}$
Case 3: $v_{ij} < v_{ik} \le w_{ik} < w_{ij}$
Case 4: $v_{ik} < v_{ij} \le w_{ij} < w_{ik}$ (5)

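3) Contraction: If no overlap was found, this step is skipped; otherwise, depending on the type of overlap in the Δ direction, one of the following cases is executed (given here as in Simpson's formulation, for hyperboxes $B_j$ and $B_k$).
Case 1, $v_{\Delta j} < v_{\Delta k} < w_{\Delta j} < w_{\Delta k}$: $w_{\Delta j}^{new} = v_{\Delta k}^{new} = (w_{\Delta j}^{old} + v_{\Delta k}^{old}) / 2$
Case 2, $v_{\Delta k} < v_{\Delta j} < w_{\Delta k} < w_{\Delta j}$: $w_{\Delta k}^{new} = v_{\Delta j}^{new} = (w_{\Delta k}^{old} + v_{\Delta j}^{old}) / 2$
Case 3, $v_{\Delta j} < v_{\Delta k} \le w_{\Delta k} < w_{\Delta j}$: if $(w_{\Delta k} - v_{\Delta j}) < (w_{\Delta j} - v_{\Delta k})$ then $v_{\Delta j}^{new} = w_{\Delta k}^{old}$, otherwise $w_{\Delta j}^{new} = v_{\Delta k}^{old}$
Case 4, $v_{\Delta k} < v_{\Delta j} \le w_{\Delta j} < w_{\Delta k}$: if $(w_{\Delta k} - v_{\Delta j}) < (w_{\Delta j} - v_{\Delta k})$ then $w_{\Delta k}^{new} = v_{\Delta j}^{old}$, otherwise $v_{\Delta k}^{new} = w_{\Delta j}^{old}$
Here, Δ is the selected dimension. These three steps are executed for every training sample to create and adjust the required hyperboxes.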
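The membership function and the three learning steps can be sketched in code. The sketch below follows Simpson's original FMM formulation as an assumption, since the text does not give the exact expressions used in the study; all function names are hypothetical, and the expansion criterion (sum of side lengths bounded by nθ) is taken from Simpson's method.

```python
import numpy as np

def membership(a, v, w, gamma=1.0):
    """Simpson-style membership of sample a in hyperbox [v, w]; 1.0 inside the box."""
    above = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, a - w)))
    below = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, v - a)))
    return float(np.sum(above + below) / (2 * a.size))

def can_expand(a, v, w, theta):
    """Expansion test: the box stretched to include a must stay within n * theta."""
    return bool(np.sum(np.maximum(w, a) - np.minimum(v, a)) <= theta * a.size)

def expand(a, v, w):
    """Stretch the min and max corners so the box covers sample a."""
    return np.minimum(v, a), np.maximum(w, a)

def overlap_dimension(vj, wj, vk, wk):
    """Return the dimension with the smallest overlap, or None if the boxes
    are disjoint (some dimension matches none of the four overlap cases)."""
    best_dim, best = None, np.inf
    for i in range(vj.size):
        if vj[i] < vk[i] < wj[i] < wk[i]:            # case 1
            d = wj[i] - vk[i]
        elif vk[i] < vj[i] < wk[i] < wj[i]:          # case 2
            d = wk[i] - vj[i]
        elif (vj[i] < vk[i] <= wk[i] < wj[i]         # case 3
              or vk[i] < vj[i] <= wj[i] < wk[i]):    # case 4
            d = min(wk[i] - vj[i], wj[i] - vk[i])
        else:
            return None
        if d < best:
            best_dim, best = i, d
    return best_dim

def contract(vj, wj, vk, wk, d):
    """Remove the overlap along dimension d; returns new copies of all corners."""
    vj, wj, vk, wk = (x.copy() for x in (vj, wj, vk, wk))
    if vj[d] < vk[d] < wj[d] < wk[d]:                # case 1: meet in the middle
        wj[d] = vk[d] = (wj[d] + vk[d]) / 2
    elif vk[d] < vj[d] < wk[d] < wj[d]:              # case 2: meet in the middle
        wk[d] = vj[d] = (wk[d] + vj[d]) / 2
    elif vj[d] < vk[d] <= wk[d] < wj[d]:             # case 3: k lies inside j
        if wk[d] - vj[d] < wj[d] - vk[d]:
            vj[d] = wk[d]
        else:
            wj[d] = vk[d]
    elif vk[d] < vj[d] <= wj[d] < wk[d]:             # case 4: j lies inside k
        if wk[d] - vj[d] < wj[d] - vk[d]:
            wk[d] = vj[d]
        else:
            vk[d] = wj[d]
    return vj, wj, vk, wk
```

A training loop would call `can_expand` and `expand` for a same-class hyperbox, then `overlap_dimension` against every hyperbox of another class, and `contract` whenever a dimension is returned. Because the cases use strict inequalities, boxes that share an identical edge are treated as non-overlapping in this sketch.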
The MLFNN algorithm uses a multilevel tree structure. The network does not act like a decision tree but like a homogeneous classifier. Unlike the classic FMM methods, this method does not resolve overlaps in a contraction step. The hyperboxes of each level are smaller and more detailed than those of the previous level. The proposed method, the creation of hyperboxes in the levels, and the classification are illustrated in Figure 8.
The first-level classifier classifies the non-boundary region of the data, and the nodes (classifiers) of the second level handle the boundary region. Finally, the node with the best output among all nodes is selected as the network's output.
In MLFNN, as in other FMM methods, all hyperboxes are created and adjusted during the training phase and used in the test phase. Other FMM methods handle an overlap as soon as it is found, whereas MLFNN handles overlaps only after all hyperboxes have been created and adjusted, which reduces memory and time complexity.

Experiments and Results
Experiments using MRI, MMSE, and personal information data were conducted to classify NC subjects, AD patients, and MCI patients. Evaluations were performed on different feature subsets: all the data, only the MRI data, only the personal information, and only the MMSE scores. A total of 137 features were selected to form the final feature set, comprising 132 MRI features (117 voxel values and 15 ROI volumes), 1 MMSE score, and 4 personal information features. After normalization and feature reduction, the Multi-Level Fuzzy Neural Network and SVM classifiers were trained on these feature vectors to build the framework. The average accuracy over 10-fold cross-validation was used to represent the performance of the learning models, and the results were compared with other state-of-the-art methods. Common performance measures, namely accuracy (ACC), sensitivity (SEN), positive predictive value (PPV), specificity (SPE), negative predictive value (NPV), and area under the curve (AUC), are used to demonstrate the performance of the constructed model. The experiments were run in MATLAB 2014 on a PC with a 2 GHz Core i7 CPU and 12 GB of RAM. The elapsed times for the binary classification tasks are reported in Table 3. An SVM with an RBF kernel was used both for comparison with and for combination with the MLFNN results. The SVM box constraint was set to 50 and the RBF sigma to 100 (a scaling factor whose default is 1). Table 3 shows that the running time of MLFNN with one feature is better than that of SVM with 137 features. Moreover, resource usage (such as CPU) when running MLFNN is much lower than for SVM; however, the table also shows that the running time of MLFNN grows progressively as the number of samples increases. Table 4 shows the classifier and feature parameters used in this study.
According to Table 5 and Figures 9-12, the AUC of MLFNN using only MMSE scores is the highest among the feature subsets used here. With this setup the model also achieved a high accuracy of 96.6%. Figure 12 shows that the box plots of MLFNN with MMSE scores are more compressed, which indicates greater stability of the constructed classification model. In addition, the accuracy for NC vs. MCI was higher when using SVM with MMSE scores. The MLFNN algorithm showed lower performance for classifying NC vs. MCI, yet it has the highest NPV for this problem. As shown in Figure 12, the demographic information gives slightly more than 50% accuracy in all three binary classifications. The SVM using all the features achieved the best results in NC vs. MCI classification in most of the performance measures. MLFNN using MMSE scores achieved the best performance in most measures for AD vs. NC classification, and SVM using all the data achieved accuracies of 79.08% and 70.43% for AD vs. MCI and NC vs. MCI, respectively, which are reasonable results compared with other state-of-the-art methods. To obtain the final performance, we combined the results of the MLFNN and SVM classifiers using only MMSE scores, as can be seen in Figure 3.
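For reference, the confusion-matrix-based measures listed above can be computed as in the following sketch. The function name `binary_metrics` and the toy labels are illustrative assumptions and not data from the study; AUC is omitted because it requires continuous classifier scores instead of hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute ACC, SEN, SPE, PPV and NPV from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, len(pairs)),  # overall fraction correct
        "SEN": ratio(tp, tp + fn),          # sensitivity (recall)
        "SPE": ratio(tn, tn + fp),          # specificity
        "PPV": ratio(tp, tp + fp),          # positive predictive value
        "NPV": ratio(tn, tn + fn),          # negative predictive value
    }

# Toy labels (1 = AD, 0 = NC); illustrative only, not data from the study
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In a 10-fold evaluation, these values would be computed on each test fold and then averaged.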

Discussion
For the early detection of AD, we developed a novel machine learning method based on MMSE scores. The information in Figures 9-11 and Table 4 shows that using the MLFNN classifier together with MMSE scores gives better results for distinguishing AD patients from NC subjects. The box plots in Figure 12 are also clearly more compressed, which indicates greater reliability and stability in classification. As mentioned at the beginning of the paper, the first exam of every person was taken from ADNI. The proposed method yielded higher performance in distinguishing AD, which can assist in detecting Alzheimer's disease in its initial stages.
In this study, it was found that with only a single indicator (MMSE), a highly accurate detector can be built for distinguishing between AD patients, MCI patients, and NC subjects. This can benefit patients and their companions who are exhausted by repeated visits to MRI imaging centers and by the pain of the injections required by other imaging methods.
As can be seen in Figure 12.d, personal information was also tested to see how it might contribute to the results. On its own it shows poor discriminative power: it distinguishes between the classes in all three binary problems only marginally better than a random classifier. Nevertheless, it can help build a better feature set for classification. Table 5 shows that including personal information, for example marital status, has only a small positive impact on AD diagnosis, but it does help achieve slightly better performance.
According to Table 5, the MLFNN classifier obtains an accuracy of 84.94% using all the features and 84.87% using all the features except marital status. The SVM classifier obtains an accuracy of 94.49% using all the features and 94.46% using all the features except marital status. These results show only a small difference between classification with and without marital data. Still, we prefer to use all the available information. Table 6 compares the proposed framework with other state-of-the-art methods. As shown in the table, the proposed framework outperformed the other methods in most performance measures. Most of the methods listed in Table 6 used the ADNI database; we used all the images and data available in ADNI at the start of the study, comprising 705 participants. Although some of the reported methods used only a portion of the ADNI samples, we consider the comparison with our proposed method legitimate.
In Table 5, for instance, the AUC of MLFNN is considerably higher than that of the other methods. The proposed method also yielded higher performance in most metrics of AD diagnosis evaluation. Compared with typical supervised learning approaches, MLFNN showed a considerable increase in AUC. This single biomarker yielded an AUC of 0.9609. Since the average AUC is 0.9609, the proposed model has near-perfect capability to differentiate diseased from non-diseased subjects on test (unseen) samples. As can be seen in Table 6, the proposed framework obtained the highest accuracy (96.6%) in classifying AD patients versus NC subjects, and because the dataset is balanced, accuracy is a suitable performance measure.
Note that an imbalanced dataset is one in which the number of samples is not the same for all classes. In some fields, such as fraud detection, where the majority of cases fall in the "Not-Fraud" class and a very small minority in the "Fraud" class, imbalanced datasets are a common problem. In these cases, it is recommended to report the sensitivity of the classifier in addition to its accuracy, since the accuracy achieved on a balanced test dataset is not a suitable estimate of the accuracy on an imbalanced dataset.
In addition, the sensitivity (recall) of the proposed method (94.83%) is the second highest in Table 6, after that of (Liu et al., 2016), whose specificity is considerably lower than that of our proposed method. Similarly, the proposed method ranks second in specificity, with 97.9% against the perfect detection rate of (Ben Ahmed et al., 2014). However, the (Ben Ahmed et al., 2014) method reached this specificity at the cost of a lower sensitivity (87% vs. 94.83% for the proposed method), whereas the proposed method achieves a trade-off between the two. It is worth mentioning that the two measures generally trade off against each other: the lower the specificity, the higher the sensitivity. Nevertheless, combined methods can be used to achieve better performance.
According to Table 5, when all the data are used together as the feature set, SVM outperforms MLFNN; however, using only one feature (the MMSE score) makes MLFNN outperform SVM in 6 measures. Further, the SVM classifier using all the data (137 features) is better than MLFNN in 2 measures (sensitivity and NPV). Using MMSE scores with the MLFNN algorithm gave lower performance in classifying MCI vs. NC or MCI vs. AD; better performance could be reached there using SVM and MRI features. Thus, even though the MLFNN classifier performs rather poorly on a direct combination (concatenation) of multimodal data, with a single strong feature it outperformed the other methods considerably. Although MLFNN had comparatively lower performance in distinguishing MCI patients, simply integrating it with SVM can yield very good performance: when the problem is distinguishing AD patients from NC subjects we rely on MLFNN, and when the problem is identifying MCI patients we use SVM. This combination assigns each problem to the classifier that handles it best. Figure 3 illustrates this framework.
As can be seen in Figure 12.d and Table 5, the recognition performance of personal information alone is only slightly better than random, but adding even this small amount of information can improve the performance of the whole model. In this regard, all the information used in this study can be useful; that is, age, gender, marital status, and education can affect dementia and AD diagnosis (Sundstrom, Westerlund, & Kotyrlo, 2016).
Furthermore, one important explanation for the method's strong performance is that the distributions of the test samples for diseased and non-diseased subjects do not overlap (Folstein, Folstein, & McHugh, 1975). According to Table 6, the MLFNN algorithm ranks first or second in all performance measures for AD vs. NC classification, and it has comparable results in the other two classification problems. This indicates that the proposed framework is nearly dominant compared with the other methods.
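The routing idea behind the framework, in which the classifier is chosen according to the binary problem at hand, can be sketched as follows. The function `make_framework`, the task labels, and the stand-in classifiers with their thresholds are hypothetical placeholders for trained MLFNN and SVM models; they are not values or code from the study.

```python
from typing import Callable, Dict, Sequence

Classifier = Callable[[Sequence[float]], str]

def make_framework(mlfnn_ad_nc: Classifier,
                   svm_ad_mci: Classifier,
                   svm_nc_mci: Classifier) -> Callable[[str, Sequence[float]], str]:
    """Build a dispatcher that sends each binary problem to its assigned model."""
    routes: Dict[str, Classifier] = {
        "AD_vs_NC": mlfnn_ad_nc,   # MLFNN handles AD vs. NC
        "AD_vs_MCI": svm_ad_mci,   # SVM handles problems involving MCI
        "NC_vs_MCI": svm_nc_mci,
    }

    def diagnose(task: str, features: Sequence[float]) -> str:
        if task not in routes:
            raise ValueError(f"unknown task: {task!r}")
        return routes[task](features)

    return diagnose

# Stand-in classifiers with arbitrary placeholder thresholds on the MMSE
# score (features[0]); real models would be trained on ADNI data.
diagnose = make_framework(
    mlfnn_ad_nc=lambda f: "AD" if f[0] < 24 else "NC",
    svm_ad_mci=lambda f: "AD" if f[0] < 22 else "MCI",
    svm_nc_mci=lambda f: "MCI" if f[0] < 27 else "NC",
)
label = diagnose("AD_vs_NC", [20.0])
```

Keeping the routing table separate from the models makes it straightforward to swap in a different classifier for one problem without touching the others.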

Conclusion
An effective framework for the early detection of Alzheimer's disease was proposed. Three separate groups of data were used in this study: features extracted from MR images, personal information, and MMSE scores of the subjects. The ADNI database was used for evaluation. We achieved excellent performance using only a single feature, the MMSE score, by developing and adopting Multi-Level Fuzzy Neural Networks, which discriminated AD patients from healthy subjects nearly perfectly, and by using SVM in the other two classification problems, AD vs. MCI and MCI vs. NC. In other words, the study matched the most suitable algorithm to each problem using the smallest possible feature set, namely one feature. This finding can spare patients and healthy people painful and time-consuming examinations, and spare the classification algorithm time and memory complexity. Demographic information alone did not perform well in distinguishing AD patients from healthy subjects, so we used it only to tune the performance of the model. The natural logarithm was used for data normalization.