Gait pattern recognition in patients with Hereditary Ataxias using supervised learning algorithms based on small subsets of smartphone sensor data

The progressive gait impairment of patients with neurological diseases such as Hereditary Ataxias (HA) has been analyzed using gait data collected with movement sensors. This research focuses on finding the minimum number of gait features required to recognize HA patients efficiently and in a less intrusive way, based on data collected with iPhone movement sensors placed on the ankles of 14 HA patients and 14 healthy people. The proposal is twofold: first, a local minimum prominent peak criterion to find the starting point of each stride and obtain a 10-stride window from which 56 spatial-temporal features are derived; second, a search strategy based on the Hill Climbing algorithm to reduce the number of gait features and sensors. The main finding is that a classification accuracy of 96% was achieved with only two gait features using the K-Nearest Neighbors (KNN) and Multi-Layer Perceptron (MLP) algorithms; in addition, MLP required features from the right ankle sensor only, which further reduces intrusiveness.


Introduction
Neurodegenerative diseases (ND) affect motor skills, disturbing the gait pattern through lack of balance and coordination. However, each neurodegenerative disease affects the patient's movements in a different way. Hereditary Ataxias (HA) are a type of neurodegenerative disease comprising a heterogeneous group of disorders phenotypically characterized by ataxic gait and lack of coordination of hand and eye movements, usually associated with atrophy of the cerebellum [1]. The main symptom of this group of motor disorders is a progressive walking ataxia resulting from neurodegeneration of the cerebellar cortex, brain stem and spinal cord structures. The Autosomal Dominant Cerebellar Ataxias (ADCA) are more prevalent and are classified into Spinocerebellar Ataxias (SCA) and Episodic Ataxias (EA) [1][2][3]. The number of ataxia patients in Mexico has been growing since 2012 [4], with SCA2 being the most common type [5]. In SCA patients, movement disorders associated with gait disturbances are characterized by instability when walking, decreased stride length, decreased walking speed, increased walking variability and balance disorders [6]; most patients show imbalance and ataxic gait [7,8]. Specialists examine movement disorders through clinical observation to relate them to disease progression. However, given the variety of symptoms, the different subtypes of ataxia and the slow progression of the disease, finding a reliable instrument to measure changes in gait disturbances over time is a challenge [9]. It is therefore necessary to develop technological solutions to quantify and characterize these disorders. Such solutions will make it possible to relate the identified patterns to the established prognosis, to monitor therapeutic interventions and the progression of gait disorders, and to assist patients remotely with home-based technological systems.
Movement sensor technologies have been used in recent works to measure and evaluate gait movements for different purposes [10][11][12]. These devices have been used to assess gait in patients with neurodegenerative diseases such as Parkinson's disease (PD) [13], Huntington's disease (HD) [14,15] and Hereditary Ataxias (HA) [16,17]. The devices used range from single-sensor devices [18] to multi-sensor smartphones [19,20]. iPhone sensors have been shown to be accurate and reliable enough to evaluate and identify kinematic gait patterns [21,22]. Studies related to gait assessment and health care have also shown the iPhone's ability to capture gait characteristics with a sufficient level of consistency at the ankle position, as well as its comfort, portability and wearability [23][24][25]. In previous works, gait data collected with movement sensors from patients with Huntington's disease and healthy elderly people were compared and validated by applying machine learning techniques with several meta-classification algorithms [26].
Progressive gait disturbances, and the accidents that may result from loss of motor capacity, require continuous and objective monitoring of the patient's state of health. Research is therefore required on mobile technology solutions that run efficiently on lightweight devices and can be integrated with other digital technology platforms.
This work aims to find the smallest number of wearable sensors and gait features needed to discriminate patients with Hereditary Ataxia from healthy people with high accuracy, using machine learning algorithms that are able to run on devices with reduced resources.
The remainder of this paper is organized as follows. Section 2 contains a review of related works. Section 3 presents a description of the dataset used in the study and its methodological context. Section 4 presents a discussion of the experimental results, and Section 5 provides conclusions and perspectives for future work.

Related works
Movement sensors are well suited for collecting gait information given their portability and user-friendliness but, as far as we know, no standard sensor configuration has been established to capture the gait data of people with ataxias. The number of sensors and their body location vary in recent related works, from 6 sensors placed on different parts of the body [27], through 5 sensors placed on the extremities and the waist [29], to 2 sensors placed on the ankles [28]. However, an excessive number of sensors can alter the gait pattern observed when the data is processed. Therefore, the number and location of sensors must be designed to reduce intrusiveness without sacrificing data reliability.
The suitability of spatio-temporal parameters extracted from triaxial sensors has also been validated by comparing them with the most advanced gait analysis techniques, such as stereophotogrammetry and dynamometry. For example, in one study, 22 features extracted from an inertial sensor (Free4Act) during level walking of 22 healthy young people were validated by comparing them with those obtained with dynamometry and stereophotogrammetry [30]. Spatio-temporal parameters extracted from iPhone movement sensors during walking trials of 11 healthy subjects were also validated by comparison with stereophotogrammetric measurements in [31].
The spatio-temporal approach of extracting gait characteristics from the raw data of movement sensors has been used as input to classification algorithms to validate how well these characteristics identify gait impairments in individuals with Huntington's disease [14,32]. The gait features based on spatio-temporal and frequency-domain parameters used in this work were introduced in an exploratory study that analyzed the gait patterns of subjects with Complex Regional Pain Syndrome (CRPS) using data from a DynaPort MiniMod triaxial accelerometer [33]. These gait features were used in a previous study to improve the classification of people with Huntington's disease, in which the results obtained with these characteristics were also validated against those obtained from a stratified sensor dataset [26].
In recent works focusing on HA gait data, the analysis has been performed either by using the raw sensor data (original time series) as input to the classification algorithms [29], or by extracting from the raw data gait features related to spatial-temporal parameters [34], time-domain statistics [28], or a Hidden Markov Model (HMM) with frequency-domain features [27].
As the summary of recent works in Table 1 (last column) shows, the resulting average accuracies are quite similar, ranging from 72.9% over an early onset ataxia (EOA) population [27] to 75.78% [29]. This holds even though the data used in those works came from different numbers of patients (column 4), were captured with sensors that differ in type, number and position on the body (column 4), and were processed with different classification algorithms (column 3). These average accuracies imply that roughly 25% of patients were not correctly recognized, which leaves wide room for improvement.

Materials and Methods
This work involves data collection, pre-processing, stride segmentation, feature extraction and selection, data classification and performance evaluation activities. Figure 1 provides an overview of implemented activities.

Subjects and Data Collection
This work was carried out in collaboration with the Instituto Nacional de Neurología y Neurocirugía "Manuel Velasco Suárez" (INNN-MVS) located in Mexico City, which studies and treats patients with walking disorders such as hereditary ataxias (HA). INNN-MVS allowed us to set up a gait laboratory in a space large enough (20 m long by 3 m wide) to capture gait features and to ensure that patients with walking disturbances could move comfortably.

Patients
Twenty-eight subjects from INNN-MVS participated in the study: fourteen patients with Hereditary Ataxias (HA) and fourteen healthy subjects as controls (HC). The patients had been diagnosed with HA by the specialists, and the controls were healthy people with no existing neurodegenerative diseases. Physiological patient information is listed in Table 2. The limited number of patients is due to the fact that HA is a rare disease, so patients can only be found in specialized centers that study it, such as the INNN-MVS in Mexico City. In addition, patients have motor deficiencies that are evident in their walking and can cause accidents, so they are often not available to participate in experiments.

Data collection
The data capture protocol was carefully planned to take into account the patients' motor disturbances and gait deterioration, such as loss of balance, precision and speed of movement. Under this protocol, INNN-MVS medical staff supported gait data collection in order to prevent accidents, while avoiding contact so that patients could walk freely along the trajectory and respecting the swaying that prevented them from walking in a straight line because of the illness. Data was collected from patients with different degrees of disease severity over an eight-day period, during each patient's medical visit. Movement sensors were placed on the subjects' ankles for each walk (Figure 2) with the patient's comfort and freedom of movement in mind. This sensor arrangement was well suited to reducing gait disturbances when walking. The capture protocol establishes that each participant walks 20 meters along the track at a normal walking pace.
Gait data was collected using the movement sensors of two iPhone 5S smartphones (three-axis accelerometer and gyroscope, rate 100 Hz) [35], one affixed to each ankle of the subject as seen in Figure 2. A flat bag with elastic bands and Velcro was designed to hold the smartphones and facilitate adjustment to the subjects' ankles. Data was collected with the iPhone app "VibSensor" [36]. The raw data of each sensor was recorded in individual files on the corresponding device during each participant's walk. Binary data was stored on the device and downloaded to the computer after the capture was finished; a folder per subject was created with the subject's data files to form the Gait Dataset. Raw data was processed on an HP computer with an AMD-A10 (10 compute cores) processor, 12 GB of RAM and Windows 10.
An average of 14.82 strides (about 1.38 ± 0.20 m in length each, with a step length of 0.69 ± 0.10 m) was collected for healthy subjects, and an average of 22.96 ± 10.08 strides (about 0.99 ± 0.3 m in length each, with a step length of 0.49 ± 0.15 m) for HA patients. These measures are consistent with those reported in [37], with a stride length of 1.008 ± 0.6 m and a step length of 0.504 ± 0.3 m for HA patients, and with the stride length between 1.13 and 1.69 m reported for healthy people in [38].

Gait data Preprocessing
The raw gait data captured by the smartphone (iPhone) sensors needs to be prepared for adequate feature extraction; the preprocessing procedure was inspired by the methods implemented in previous research [39][40][41]. The data processing techniques are described below.
Linear interpolation was used to resample the data at constant time intervals. This is necessary because accelerometer data is captured at a variable sampling rate regardless of the configured capture frequency, owing to the delay between the moment the capture call is made and the moment the sample is recorded on the device.
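The resampling step can be illustrated with a minimal sketch. It assumes strictly increasing timestamps in seconds and a target rate of 100 Hz; the function name `resample_linear` and its parameters are hypothetical and are not taken from the original processing code.

```python
from bisect import bisect_right


def resample_linear(times, values, rate_hz=100.0):
    """Linearly interpolate irregular samples onto a uniform time grid.

    Assumes `times` is strictly increasing (seconds). Illustrative only.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) samples")
    step = 1.0 / rate_hz
    # Number of grid points that fit in the recorded time span.
    n_out = int((times[-1] - times[0]) / step + 1e-9) + 1
    grid, out = [], []
    for k in range(n_out):
        t = times[0] + k * step
        # Index of the segment [times[j], times[j + 1]] that contains t.
        j = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return grid, out


# Irregular timestamps around a nominal 10 ms period.
times = [0.0, 0.012, 0.019, 0.031, 0.04]
values = [2.0 * t for t in times]
grid, resampled = resample_linear(times, values)
```

Each axis of each sensor would be resampled independently in the same way.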
As accelerometers measure the rate of change of the velocity of an object (acceleration), the average acceleration of a motionless accelerometer should be equal to the acceleration of gravity; however, because of their high sensitivity, the gait data captured by these devices fluctuates continuously as the acceleration values change. Zero normalization is used to eliminate this constant effect from the signal on each of the axes.
Data is generally smoothed to eliminate undesirable noise and to identify outliers (data points that differ significantly from the remainder). A common smoothing technique is the moving average, which computes the mean of the points inside a window that slides along the data. This cancels insignificant variations from one data point to the next; the process is equivalent to applying a low-pass filter.
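Zero normalization and moving-average smoothing can be sketched as follows. The window length is an assumption, since the text does not state the value that was used; both function names are hypothetical.

```python
def zero_normalize(signal):
    """Subtract the mean so the constant (gravity) offset is removed."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def moving_average(signal, window=5):
    """Centered moving average; the window shrinks at the signal edges.

    `window` is an assumed value, not one reported in the text.
    """
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


centered = zero_normalize([9.8, 10.8, 8.8])
smoothed = moving_average([0, 0, 3, 0, 0], window=3)
```

A wider window removes more noise but also flattens the sharp minima that are later used to locate stride boundaries, so the choice is a trade-off.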
The values measured on the sensor axes depend on the position of the device and on how it is attached to the subject's body, but the relative movement between the sensor and the ankle can be neglected when the device is rigidly attached to the ankle. Inertial sensor signals are recorded in a coordinate system relative to the smartphone, but for our purposes the gait dynamics need to be represented in a coordinate system relative to the subject. It is not always practical to control or determine the orientation of the device relative to the subject, and it is highly unlikely that the device remains in the same orientation when recording data from different subjects. To deal with this change in orientation we use the orientation-invariant acceleration magnitude signal, which is calculated as the square root of the sum of the squares of the three components of each 3D vector in the time series (L2 norm) [42] (equation 2).
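Let $a(t)$ be the tri-axial accelerometer output at any time $t$, where $a(t) \in \mathbb{R}^3$; then the time series is:
$$a = \{\, a(t) = (a_x(t),\, a_y(t),\, a_z(t)) \mid t = 0 : N-1 \,\} \qquad (1)$$
where $a_x(t), a_y(t), a_z(t) \in \mathbb{R}$ and $N$ is the considered dataset size. The acceleration magnitude time series $a_m = \{\, a_m(t) \mid t = 0 : N-1 \,\}$, which is derived from the time series $a$, is:
$$a_m(t) = \sqrt{a_x(t)^2 + a_y(t)^2 + a_z(t)^2} \qquad (2)$$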
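As a small illustration of the L2 norm described above, the magnitude signal can be computed sample by sample from the three axis series. The function name is hypothetical.

```python
import math


def acceleration_magnitude(ax, ay, az):
    """Orientation-invariant magnitude: sqrt(ax^2 + ay^2 + az^2) per sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


# The magnitude does not change when the axes are permuted, which is one
# simple way to see why it is insensitive to how the phone is oriented.
m1 = acceleration_magnitude([3.0, 1.0], [4.0, 2.0], [0.0, 2.0])
m2 = acceleration_magnitude([4.0, 2.0], [0.0, 2.0], [3.0, 1.0])
```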

Segmentation and features extraction
The stride detection algorithm proposed in [39] uses data from a phone placed in the subject's pocket to establish a correlation between the signal peaks and the start and end of each stride. Gait cycles (strides) are determined by extracting a data section around the center of the walk and finding the minimum value in this section; from this point, cycle detection proceeds forward and backward by adding and subtracting the cycle length. However, this normalization of all strides to the same length may reject meaningful peaks. To address this problem, we introduce a constraint (equation 3) that ensures the periodic characteristics of the gait cycle by using local prominent minimum values that represent the starting positions of the gait cycles. Let $p_m = \{\, a_m(i) \mid a_m(i) < \min(a_m(i-1),\, a_m(i+1)) \,\}$ be a vector with the local minima of the input signal vector $a_m$. A local minimum peak ($p_m$) is a data sample that is lower than its two neighboring samples.
Let $p_r = \{\, p_r(i) \mid i = 0 : N-1 \,\}$ be the vector of local prominent minimum peaks in $a_m$, selected according to equation 3. A local prominent minimum peak is a local minimum peak that lies between two larger minimum peaks. This situation corresponds to the moment when both feet are on the floor. The use of the averaged $p_m$ ensures that only prominent peaks are taken into account.
The MATLAB-based "iGAIT tool" [43] was used to help extract the gait features; 56 gait characteristics were extracted from the data of the right (R) and left (L) ankle sensors of each patient: six spatio-temporal, fifteen related to frequency and seven related to step regularity and symmetry (28 for each sensor). Given the position of the capture instrument (see Figure 2), the movement axes of the sensors correspond to gait movements as follows: the Y axis corresponds to vertical movement (VER), the X axis to Anterior-Posterior movement (AP) and the Z axis to Medio-Lateral movement (ML).
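The prominent-minimum idea can be sketched in a few lines. This is one possible reading of the criterion, not the authors' exact constraint: a local minimum is kept when it is lower than both neighbouring local minima and also lower than the average of all local minima. Both function names are hypothetical.

```python
def local_minima(a_m):
    """Indices i where a_m[i] is lower than both neighbouring samples."""
    return [i for i in range(1, len(a_m) - 1)
            if a_m[i] < min(a_m[i - 1], a_m[i + 1])]


def prominent_minima(a_m):
    """Local minima that lie between two larger local minima.

    Assumption: a minimum must also fall below the mean of all local
    minima, as one interpretation of the 'averaged p_m' in the text.
    """
    idx = local_minima(a_m)
    if not idx:
        return []
    mean_min = sum(a_m[i] for i in idx) / len(idx)
    prominent = []
    for k, i in enumerate(idx):
        left = a_m[idx[k - 1]] if k > 0 else float("inf")
        right = a_m[idx[k + 1]] if k + 1 < len(idx) else float("inf")
        if a_m[i] < mean_min and a_m[i] < left and a_m[i] < right:
            prominent.append(i)
    return prominent


# Toy magnitude signal: deep minima alternate with shallow ones.
signal = [5, 3, 5, 1, 5, 3, 5, 0.5, 5, 3.5, 5, 1.2, 5, 3, 5]
starts = prominent_minima(signal)
```

Consecutive indices in `starts` delimit strides of unequal length, so no stride is forced to a fixed cycle length.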

Gait features selection
The goal of attribute selection is to find the smallest subset of attributes such that the classification accuracy is not significantly affected and the resulting class distribution is as close as possible to the original one. An attribute is considered relevant if it is neither irrelevant (it does not affect the target concept in any way) nor redundant (it adds nothing new to the target).
Attribute selection methods reduce the dimension of the attribute set to that of the relevant attribute set. Selection methods fall into three classes according to how the attributes are evaluated. Wrapper methods use a classification algorithm to measure attribute relevance; in general, their results are better than those of filter methods because the attribute selection process is optimized for the classification algorithm to be used. Filter methods assess the relevance of features by looking only at the intrinsic properties of the data. They are independent of the classification algorithm, scale easily to very high-dimensional datasets, and are computationally simple and fast [44,45]. Hybrid methods use a combination of these two evaluation criteria at different stages of the search process.
We use a mixed method in this work but, since for N attributes there are 2^N subsets, a good search strategy is required. First, we use the Ranker algorithm as the search method; Ranker rates attributes based on the individual evaluations provided by the attribute evaluator, assigning them a level of significance and using a threshold to discard attributes with low scores. The result is a list of attributes ranked from highest to lowest [46]. We then apply the wrapper approach with several classification algorithms over this ranked list, using a forward-selection strategy and taking the absence of improvement in the quality of the learning algorithm as the stopping criterion. After this step, we obtain for each algorithm a reduced feature subset with the most relevant features and the best accuracy.
The second strategy is based on the Hill Climbing algorithm, which starts with an arbitrary solution to the problem and then tries to find a better solution by incrementally varying a single element of the solution. If the change produces a better solution, another incremental change is made to the new solution, and this process is repeated until no improvement can be found. Given that this technique belongs to the local search family, the idea is to reduce the search space in which an optimal solution is sought. The forward-selection strategy was executed to find the minimum number of attributes allowing the best algorithm accuracy as follows: (1) we start with an empty set; (2) we increase the cardinality by 1 attribute and evaluate all the elements of this subset with the corresponding learning algorithm; (3) we continue with the next cardinality until there is no improvement in the quality of the learning algorithm.
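A minimal sketch of the two search steps is given here. It assumes a caller-supplied `evaluate(subset)` function that returns a quality score such as cross-validated accuracy; the toy scoring function and the feature names `a`, `b`, `c` are hypothetical and stand in for a real classifier. In the hill-climbing step, "better" is taken to mean "fewer features without a lower score", which is an assumption.

```python
def forward_select(ranked_features, evaluate):
    """Add ranked features one at a time until the score stops improving."""
    selected, best = [], float("-inf")
    for feature in ranked_features:
        score = evaluate(selected + [feature])
        if score <= best:
            break  # no-improvement stopping criterion
        selected.append(feature)
        best = score
    return selected, best


def hill_climb(start, evaluate):
    """Local search: drop one feature at a time while the score does not fall."""
    current, best = list(start), evaluate(list(start))
    improved = True
    while improved and len(current) > 1:
        improved = False
        for feature in current:
            candidate = [f for f in current if f != feature]
            score = evaluate(candidate)
            if score >= best:  # smaller subset, accuracy not worse
                current, best, improved = candidate, score, True
                break
    return current, best


def toy_score(subset):
    """Hypothetical score: only features 'a' and 'c' carry information."""
    return 0.5 + 0.2 * len(set(subset) & {"a", "c"})


picked, picked_score = forward_select(["a", "c", "b"], toy_score)
reduced, reduced_score = hill_climb(["a", "b", "c"], toy_score)
```

Because both procedures are greedy, they can stop at a local optimum; the ranking step mitigates this by presenting the most promising attributes first.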

Classification algorithms
Several algorithms have been deployed in classification tasks; however, no single algorithm has been found to give highly accurate class discrimination on every data set. In this work we use 2 parametric and 2 non-parametric supervised classifiers that are among those most used recently to classify people based on gait features: K-Nearest Neighbours (KNN), Multilayer Perceptron (MLP), Support Vector Machine (SVM) and ensemble modeling with classification trees.
The KNN algorithm is based on feature similarity: how closely out-of-sample features resemble the training set determines how a given data point is classified. In classification, each instance is assigned by majority vote to the class most common among its k nearest neighbors. KNN is a non-parametric, lazy learning algorithm, which means that it makes no assumptions about the underlying data distribution; the estimate is made locally, which keeps the training phase fast, and data processing and computation are deferred until classification [47,48]. We used (attributes + instances)/2 as the number of neighbours K, with votes weighted by 1/distance and the linear nearest-neighbour search algorithm (LinearNNSearch) with the Euclidean distance function. KNN has been used in other disease categorization problems by [34,49]. Support Vector Machine (SVM) is a supervised machine learning algorithm that plots each instance as a point in n-dimensional space (where n is the number of features), with the value of each feature being a particular coordinate, based on a selected kernel. Classification is performed by finding the hyperplane that best separates the two classes. Support vectors are simply the coordinates of individual observations. With SVM the training is relatively easy, there are no local optima, it scales well to high-dimensional data, and the trade-off between classifier complexity and error can be controlled explicitly [50]. In this study a quadratic kernel function was used because of the good results reported in previous studies by [27,49].
The Multilayer Perceptron (MLP) neural network is a classifier that uses backpropagation to classify instances. It usually has an input layer, an output layer and one or more hidden layers; the nodes in this network are all sigmoid, and random numbers are used to set the initial weights of the connections between nodes and to shuffle the training data. The hidden-layer setting of the neural network was (attribs + classes)/2 and the number of training epochs was set to 500. This algorithm has been used in the classification of other neurodegenerative diseases with good performance by [28,51].
A meta, multiple or ensemble classifier combines a set of models built with an external base classifier, each of which solves the same original task, to obtain a global composite model; the strengths of both methods (meta and base classifier) improve the outcome, giving more precise and reliable estimates than those of a single model [52]. The meta-classifier generates classifiers in sequence by re-sampling and/or assigning weights to the instances in each iteration (N), taking into account the errors made by the previous classifier (N − 1), so that the new classifier (N + 1) avoids the errors made by the previous one and each subsequent classifier gives a better result. Finally, a method is applied to assemble the individual predictions of each classifier into a single classification output. In this work we used the Random Committee meta-classifier, which builds an ensemble of randomizable base classifiers, here Extremely randomized trees (Extra-Trees) [53], from the training data with M = 100 iterations. The final prediction of the Random Committee with Extra-Trees (RCET) combination is a straight average of the predictions generated by the Extra-Trees base classifiers.
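The distance-weighted KNN vote described above can be sketched as follows. This is an illustrative brute-force implementation with Euclidean distance and 1/distance weights, not the library routine used in the study; the feature values and the `knn_predict` name are hypothetical. Applying the stated rule for K, two attributes and 28 instances would give K = (2 + 28)/2 = 15.

```python
import math
from collections import defaultdict


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by a 1/distance-weighted vote of its k nearest points."""
    # Brute-force (linear) search over all training points.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + 1e-12)  # small constant avoids division by zero
    return max(votes, key=votes.get)


# Hypothetical two-feature instances for controls (HC) and patients (HA).
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
train_y = ["HC", "HC", "HA", "HA"]
label_a = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
label_b = knn_predict(train_x, train_y, (0.8, 0.9), k=3)
```

With a large K relative to the sample size, the 1/distance weighting is what keeps distant neighbours from dominating the vote.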
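The final averaging step of a committee can be illustrated with a short sketch. It assumes that each committee member returns a per-class probability for the instance being classified; the probabilities shown are hypothetical and the function is not the library's implementation.

```python
def average_committee(member_probs):
    """Average per-class probabilities over all committee members."""
    n = len(member_probs)
    classes = list(member_probs[0])
    avg = {c: sum(p[c] for p in member_probs) / n for c in classes}
    # The predicted class is the one with the highest averaged probability.
    return max(avg, key=avg.get), avg


# Three hypothetical tree outputs for one subject.
votes = [
    {"HA": 0.9, "HC": 0.1},
    {"HA": 0.4, "HC": 0.6},
    {"HA": 0.6, "HC": 0.4},
]
label, averaged = average_committee(votes)
```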

Performance evaluation
Assessing the quality of a model shows to what extent the trained model correctly predicts unseen data. Generalization in a machine learning model refers to discovering data patterns from a training data set. The aim of a suitable machine learning model is to generalize well from the training data to any data from the problem domain, so that it makes accurate predictions on data it has never seen.

Classification algorithms performance assessment
Accuracy is the percentage of all items that are classified correctly. It summarizes the performance of a classification algorithm as the proportion of instances predicted in the class they belong to (true positives, TP, and true negatives, TN) among all instances, including those erroneously predicted in a class other than the one they belong to (false positives, FP, and false negatives, FN).
The F-Measure is the harmonic mean of Precision (the proportion of instances assigned to a class that truly belong to it) and Recall (the proportion of instances of a given class that are classified as such); Precision and Recall have the same weight, and an F-Measure close to one indicates that both are high.
The area under the ROC curve is an overall indicator of the discriminative ability of a test; the score is interpreted as the probability that, given a pair of individuals belonging to different groups, the test will classify them correctly. The value goes from 1 (perfect discrimination) to 0.5 (no difference in the distribution of the test values between the 2 groups).
The Matthews correlation coefficient (MCC) represents the correlation between the observed and predicted classifications, and it is calculated directly from all four values of the confusion matrix. This is a strong metric (possibly the best single assessment metric so far) that considers both accuracy and error rates on both classes. A high MCC value means that the learner has high accuracy on the positive and negative classes and few misclassifications on both. A coefficient of +1 represents a perfect prediction, 0 a random prediction and −1 total disagreement between prediction and observation [54].
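The measures above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the usage example are hypothetical and correspond to 28 subjects with a single misclassified patient.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F-measure and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical outcome: 13 of 14 patients and all 14 controls recognized.
m = binary_metrics(tp=13, fp=0, tn=14, fn=1)
```

With these counts the accuracy is 27/28, about 96.4%, which shows how a single error moves the score on a sample of this size.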

Model accuracy validation
The terms over-fitting and under-fitting are used in machine learning to describe the performance of a learned model. Under-fitting refers to a model that can neither model the training data nor generalize to new data. An under-fit model is not suitable because it fits the training data poorly and performs poorly overall. It is easy to detect through its low accuracy, and it can be addressed by adjusting the feature set and trying alternative machine learning algorithms. Over-fitting occurs when performance is high on the training data and low on the testing data; the learned model is unable to recognize unseen data.
Re-sampling is one of the techniques used to limit over-fitting when evaluating machine learning algorithms, and the most frequently used form is k-fold cross-validation. The data is split into k subsets (folds) that are used in turn for training and testing the model, which gives an estimate of the performance of the learned model on unseen data. Values of k of 5 and 10 are the most used [55]. In real applications with access to a finite set of small samples, cross-validation provides the most accurate estimate [56].
Leave-one-out cross-validation (LOOCV) is a special type of k-fold cross-validation where k is the full data size. That is, only one sample is "left out" at a time as the test set. LOOCV is much more accurate for small sample sizes than other methods [57]. In this work we used 10-fold cross-validation to find the algorithms with high accuracy [55], and then LOOCV [58], which does not need random partitioning and fits better with small samples. The performance of the classifiers under LOOCV was assessed by comparing several well-known measures.
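Leave-one-out cross-validation can be sketched as a loop that holds out each sample once. The `fit_predict` callable is a hypothetical stand-in for any classifier that trains on the given data and predicts one query; a one-dimensional nearest-neighbour rule is used here only to keep the example self-contained.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_indices) with exactly one test sample each."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


def loocv_accuracy(xs, ys, fit_predict):
    """Fraction of held-out samples that are predicted correctly."""
    hits = 0
    for train, test in leave_one_out(len(xs)):
        pred = fit_predict([xs[j] for j in train],
                           [ys[j] for j in train],
                           xs[test[0]])
        hits += pred == ys[test[0]]
    return hits / len(xs)


def nearest_label(train_x, train_y, query):
    """Toy 1-nearest-neighbour rule on scalar features."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[best]


splits = list(leave_one_out(28))
acc = loocv_accuracy([0, 1, 2, 10, 11, 12],
                     ["a", "a", "a", "b", "b", "b"],
                     nearest_label)
```

For a study with 28 subjects this produces 28 models, each trained on 27 subjects, with no dependence on a random partition.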

Errors evaluation in outcomes
Error rates assess the extent to which the predictions match the distribution of the real values. The Mean Absolute Error (MAE) is the average absolute difference between the output predicted by the classifier and the actual output; it measures how close a predicted model is to the real model. The Root Mean Squared Error (RMSE) measures the differences between the values predicted by a model and the values actually observed. It is quite similar to MAE; the difference is that RMSE gives a relatively high weight to large errors, since the errors are squared before they are averaged. The lower the RMSE and MAE values, the better the prediction and accuracy.
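Both error measures can be written in a few lines. The sketch assumes numeric predictions and targets (for example, class probabilities against 0/1 labels); the values in the usage example are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean absolute difference between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root of the mean squared difference; large errors weigh more."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# One wrong prediction out of four.
err_mae = mae([1, 0, 1, 1], [1, 0, 0, 1])
err_rmse = rmse([1, 0, 1, 1], [1, 0, 0, 1])
```

Here the single error of size 1 gives an MAE of 0.25 but an RMSE of 0.5, which illustrates the heavier penalty that squaring places on large errors.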

Results And Discussion
The raw sensor data was downloaded from each device, and each file was renamed with the patient identifier and the side (L = left, R = right). We plotted each data set to identify and eliminate missing data, meaningless data at the edges of the data set, and atypical data captured while the participant changed gait phase. Linear interpolation was applied to resample the data of each sensor at constant time intervals fixed at 10 ms (rate 100 Hz). The constant effects were eliminated by zero normalization of the signal on each of the axes (X, Y and Z). The moving average smoothing technique was applied to eliminate insignificant variations from one data point to the next.
We applied the L2 norm (equation 2) over the three axes of each sensor to remove the dependence on sensor orientation, obtaining the a_m data vector. The vector of local minimum peaks (p_m), each lower than its two neighboring samples, was computed from the a_m data vector. The signal reaches a minimum at the moment when both feet are on the ground; we call this a local prominent minimum peak. We used equation 3 to find the local prominent minimum peaks; the most prominent minimum peak was taken as the starting point, and the search proceeded forward and backward from it. We extracted the data corresponding to ten gait cycles (delimited by prominent minimum peaks) to compute the gait features.
The local prominent minimum peaks are circled in Figure 3; the horizontal middle dotted line is the mean and the bottom line the standard deviation. All local prominent minimum peaks lie below these lines, which reveals a common pattern among them. The location of the movement sensor at the ankle defines the stride very clearly. The data extraction algorithm using this criterion captures information over the total length of each stride, including information that could be discarded by methods based on average stride lengths. We derived a list of 56 gait features (28 per sensor) using the "iGait" tool on the 10-stride window of each participant, extracted with the proposed local minimum prominent peak criterion. The features extracted from each sensor were: cadence (step/min), mean step length (m), velocity (m/s), root mean square (RMS in the X, Y and Z axes), symmetry (X and Y axes), stride regularity (X, Y and Z axes), integral power spectral density (Integral_PSD in the X, Y and Z axes), cumulative power spectral density (frecuency_at_##_energy at 50%, 75%, 90% and 99% energy in the X, Y and Z axes) and step regularity (X and Y axes). R for right and L for left identify each patient's sensor; AP refers to the X axis, VT to the Y axis and ML to the Z axis.
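Two of the listed feature families, RMS and the frequency below which a given fraction of spectral energy lies, can be illustrated with generic definitions. These are assumptions for illustration and are not necessarily the exact formulas implemented in the iGait tool; the function names are hypothetical and the naive DFT is only suitable for short windows.

```python
import cmath
import math


def rms(signal):
    """Root mean square of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def power_spectrum(signal, rate_hz):
    """One-sided periodogram computed with a naive DFT (O(n^2))."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * rate_hz / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def frequency_at_energy(signal, rate_hz, fraction):
    """Lowest frequency at which cumulative power reaches `fraction` of total."""
    freqs, power = power_spectrum(signal, rate_hz)
    total = sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= fraction * total:
            return f
    return freqs[-1]


# Two seconds of a 2 Hz sine sampled at 100 Hz.
sine = [math.sin(2 * math.pi * 2 * t / 100) for t in range(200)]
sine_rms = rms(sine)
f50 = frequency_at_energy(sine, 100, 0.5)
```

For a real gait window the signal would be one axis of the preprocessed accelerometer data, and the same function would be called with fractions 0.5, 0.75, 0.9 and 0.99.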
The selection of the best attributes was carried out in two phases: attribute ranking and forward selection. We used the wrapper algorithm "ClassifierAttributeEvaluator" combined with each classifier (SVM, KNN, RCMT and MLP, where RCMT denotes the Random Committee ensemble with Extra-Trees) to evaluate the relevance of the attributes, and the "Ranker" algorithm as the search method. This gave each attribute a score according to the relevance assigned by the algorithms; four lists ordered by Ranker score were obtained.
The forward-selection strategy was executed to find the minimal number of attributes over ranked lists. Each list was input in each classifier algorithm classifier algorithm (SVM, KNN, RCMT and MLP) with a ten-cross-validation. We using the algorithm's accuracy as an improvement criterion. The table 3 shows the algorithms accuracy when the best scores attributes are added gradually to classification tasks. we can see that the "no improvement criterion" is met with different sized subsets of characteristics. For example, SVM and MLP have met the non-improvement criterion with the first 6 characteristics, RCMT with 5 and KNN with 2, but only RCMT and MLP reach 100% accuracy. We can also observe that the number of features of both ankles appear rather balanced for all results in Table 3, except for SVM which shows a predominance on the right ankle. Two characteristics are agree with the highest ranges for all the algorithms, the acceleration dimension (RMS) and the signal power (Integral_PSD) in both ankles with predominance in the VER and AP axes.. Table 3. Algorithms accuracy (%) for top ranked features using forward-selection strategy with a) SVM, b) KNN, c) RCTM and d) MLP algorithms. Names beginning with R refer to the right sensor and L to the left sensor. We got back to running the forward-selection strategy, taking the feature subset cutting until the improvement criterion for each classifier algorithm; the above to find the best attributes that would increase or ascertain the accuracy achieved. We did no find any better features subset when applying the proposed iterative over KNN and RCMT algorithms. However, we found a reduced subset of 4 features for the SVM algorithm with 92.86% in performance accuracy and a 3 feature subset that achieved 100% accuracy for the MLP algorithm (see Table 4). 
KNN and RCMT classified the instances in a similar way: the healthy subjects were recognized better than the sick ones. Nonetheless, as new features were added, the number of correctly classified instances (ICC) of sick subjects increased; KNN fails to recognize one sick subject with two features, while RCMT recognizes all subjects with four features. SVM and MLP were more successful in recognizing the sick group; however, MLP was more effective overall because it recognized all subjects with the smallest sub-dataset.

The proposed combination of the prominent minimum criterion and the feature selection strategy led to the following findings. The most relevant feature is R_RMS_VER (the statistical measure of the acceleration magnitude of the right ankle in the vertical, up-and-down, axis), which enables MLP to achieve an accuracy of almost 90% in discriminating control subjects from HA patients. No other feature alone achieves such high accuracy, but the total signal power in the frequency domain (Integral_PSD) improves it in all algorithms. Features on the ML axis were not relevant for any of the algorithms. The KNN and MLP algorithms reached 96% accuracy with only two features, while RCMT attained 89.29%. KNN misclassified one HA patient, while MLP misclassified one control subject. RCMT obtained 96% with three features, misclassifying one HA patient. For MLP, R_Integral_PSD_AP (the total signal power for the right ankle in the backwards-and-forwards X axis movement) had to be added to enable all HA patients to be correctly classified, but the same feature for the left ankle (L_Integral_PSD_AP) was also needed to correctly separate both classes.
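The reported percentages follow directly from the number of misclassified participants out of 28 (14 HA patients and 14 controls). The short sketch below shows the arithmetic; the function names are hypothetical.

```python
N_SUBJECTS = 28  # 14 HA patients + 14 healthy controls


def accuracy_percent(misclassified: int, total: int = N_SUBJECTS) -> float:
    """Accuracy (%) when `misclassified` of `total` subjects are wrongly labelled."""
    return round(100.0 * (total - misclassified) / total, 2)


def misclassified_from_accuracy(accuracy: float, total: int = N_SUBJECTS) -> int:
    """Invert the relation: number of errors implied by a reported accuracy (%)."""
    return round(total * (1.0 - accuracy / 100.0))


one_error = accuracy_percent(1)      # one subject misclassified
two_errors = accuracy_percent(2)     # two subjects misclassified
three_errors = accuracy_percent(3)   # three subjects misclassified
```

With one, two and three errors the accuracy is 96.43%, 92.86% and 89.29% respectively, so each misclassified participant costs about 3.6 percentage points in a sample of this size.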
Table 5 shows the gait feature subsets that allowed the correct classification of all participants. In addition to the three gait features required by MLP, RCMT needed the stride regularity in the VER axis for the left ankle (L_Stride_regularity_in_VER) to reach 100% accuracy. The same number of features from both ankles was selected for RCMT, but there is a right-ankle predominance for MLP.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 December 2020 doi:10.20944/preprints202012.0054.v1
Table 5.
We closed our experiments by running all classification algorithms over the corresponding gait feature subsets (Table 4) with LOOCV. The classification outcomes show that the random partition in 10-fold CV had no impact on the accuracy assessment that LOOCV could reveal: both gave the same results. Table 6 shows that all algorithms achieved an accuracy over 90% with their corresponding attribute subsets, but KNN needed only 51 seconds to achieve its best result (96.43%) with 2 gait features, whereas to obtain 100% accuracy RCMT needed 4 features and almost 2 minutes, and MLP with 3 features took more than 3 minutes. The F-Measure values are close to unity, which indicates a good balance between precision and recall of the classified instances in both groups. The ROC area under the curve stays above the threshold (0.5) and close to unity, which indicates that the predicted outcomes agree with the expected ones. In applications such as continuous monitoring, where changes in gait pattern need to be detected, execution time is important because the classification has to be repeated many times, especially in real-time monitoring. The Matthews correlation coefficients between the observed and predicted classes (Table 7) show that MLP and RCMT had a coefficient of one (+1), meaning that both the positive and the negative class were predicted without error.
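The relation between the two validation schemes can be made concrete with a small sketch: LOOCV is k-fold cross-validation with k equal to the number of subjects. The helper names are hypothetical, and the folds here are contiguous and unshuffled, which is a simplification of the random partition used in the study.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous test folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for each of the k folds."""
    for test in kfold_indices(n, k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


ten_fold_sizes = [len(fold) for fold in kfold_indices(28, 10)]
loocv_splits = list(train_test_splits(28, 28))  # LOOCV: one subject per test fold
```

With 28 subjects, 10-fold CV tests on only two or three subjects per fold, so its estimate can depend on how subjects are partitioned; LOOCV removes that dependence at the cost of 28 training runs.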
Error measurements (MAE and RMSE) indicate how well the prediction results agree with the real value distribution; the greater the difference between the two measures, the greater the variance in individual errors. Table 7 shows that SVM and KNN have the greatest difference between MAE and RMSE, which means a greater incidence of individual errors. MLP has the lowest values in both measures, which is consistent with the accuracy obtained.
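The measures discussed above can be computed from a confusion matrix and a list of per-instance errors, as in the following sketch. The standard textbook definitions are used; treating HA as the positive class and the example counts and error values are assumptions made for illustration only.

```python
import math
from typing import Sequence


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; +1 means every instance is predicted correctly."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mae(errors: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors: Sequence[float]) -> float:
    """Root mean squared error; penalizes large individual errors more than MAE."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical case: 14 HA patients (positive) and 14 controls, one HA patient missed.
mcc_one_miss = mcc(tp=13, tn=14, fp=0, fn=1)
f1_one_miss = f_measure(tp=13, fp=0, fn=1)
mcc_perfect = mcc(tp=14, tn=14, fp=0, fn=0)

# Same MAE, different spread: evenly spread errors vs. one large error (invented values).
even_errors = [0.1, 0.1, 0.1, 0.1]
one_large_error = [0.0, 0.0, 0.0, 0.4]
```

Both error lists have the same MAE, but the RMSE of `one_large_error` is twice as large, which is why a wide gap between the two measures points to a few large individual errors.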

Conclusions
This paper focuses on finding the smallest set of gait features that correctly discriminates two classes of individuals, HA patients and healthy people, as unobtrusively as possible. To achieve this, we introduced a criterion based on the local prominent minimum peak position to determine the start of each stride. This allowed the length variability of each stride to be considered in the estimation of a 10-stride window from accelerometer data collected with two movement sensors placed on the ankles of 28 participants. Information extracted from the full stride length captures movements that are relevant to better discriminate ataxic gait. We proposed a feature selection strategy based on the Hill Climbing algorithm to reduce the 56 spatial-temporal gait features obtained from this 10-stride window dataset.
KNN and MLP both obtained 96% accuracy with two gait features; however, MLP required only right-ankle sensor features and could therefore be used when reducing intrusiveness is an important factor. When execution time is more important than intrusiveness, KNN is the better choice, since MLP needed 26 + 46 seconds (training + validation); although the difference is only 51 seconds in total, KNN validation took only 7 seconds against 46 for MLP. When accuracy is more important than time and the number of gait features, MLP should be used because, with one additional feature, it reaches 100% accuracy.
This study was conducted with a small number of patients with Hereditary Ataxias because access to a larger group is difficult: the disease is uncommon in the population and patients can only be reached through specialized medical centers; in addition, the motor disorders caused by the disease restrict patient participation. The small number of samples can limit the generalization of patient gait patterns, so to avoid overfitting we used k-fold cross-validation, and the final results included a leave-one-out cross-validation. The F-Measure, ROC area under the curve and Matthews correlation coefficient scores indicate strong consistency between accuracy and correctly classified instances.
Future work should determine the results of the proposed method of stride extraction and attribute selection in a context such as continuous monitoring where the challenge is to determine changes in gait that may reflect disease progression.
Binary classification results with a very small number of gait features could be obtained thanks to the contrast between the two classes. However, in a context of continuous monitoring, in which data are taken from the same person at different times, similar results can only be expected if the classes built from these data present a contrast similar to that of the classes used in this study. Future work using our proposal in this context should establish a balance between the number of attributes and algorithm accuracy that is appropriate to the level of disease to be monitored.
A similar balance should be established in other research studies in which classes are not well differentiated, such as the early diagnosis of diseases that affect patients' gait.
Therefore, this work lays the groundwork for the generation of computational tools to support medical diagnosis and continuous monitoring of gait to oversee long-term disease progression.
The proposed approach, which combines accelerometer-based stride size identification with the selection of the classification algorithm and the minimum number of gait characteristics that best discriminate the gait of a patient with a neurodegenerative disease from that of a healthy person, can be useful in the differential diagnosis between clinically similar diseases such as Spinocerebellar Ataxias and Huntington's disease.