Identification of Hand Movements from Electromyographic Signals Using Machine Learning

Electromyographic (EMG) signals provide information about a person's muscle activity. For hand movements in particular, the execution of each gesture involves the activation of a different combination of forearm muscles, which generates a distinct electrical pattern. In turn, analyzing these muscle activation patterns, as represented by EMG signals, makes it possible to recognize which gesture is being performed. In this study, we aimed to implement an automatic system for identifying hand and wrist gestures based on supervised Machine Learning (ML) techniques. We trained different computational models and determined which of them showed the best capacity to identify six hand or wrist gestures and to generalize across subjects. We used an open-access database containing EMG signal recordings from 36 subjects. Among the results obtained, we highlight the performance of the Random Forest model, with an accuracy of 95.39%, and that of a convolutional neural network, with an accuracy of 94.77%.


Introduction
EMG signals are generated by the electrical activity of skeletal muscle fibers and can be captured in two ways. First, invasive methods involve the insertion of a needle directly into the muscle; second, non-invasive methods rely on surface electrodes placed on the skin close to the target muscle. Surface EMG, in particular, is the best suited for establishing human-machine interfaces for daily use [1]. These kinds of signals present subtle variations between subjects, associated with anatomical differences and motor coordination [2]. The variations are random and highly non-stationary [3]; therefore, it is necessary to use different feature extraction and selection techniques for time series, which are also useful for developing optimal artificial intelligence (AI) models [4].
Health professionals use these signals for several purposes, including disease diagnosis and rehabilitation. Recently, with the rise of machine learning techniques, they have been applied in systems for the classification of movements [5][6][7], the recognition of motor unit action potentials (MUAP) [8], and the diagnosis of neuromuscular diseases [9]. These tasks are achieved by measuring and studying the features that can be extracted from EMG signals. Feature extraction can be done through different techniques, such as autoregressive models [10], signal entropy measurements [11], and statistical measurements of amplitude in the time and frequency domains [4]. In particular, the Wavelet transform, which provides information in the frequency domain at both high and low frequencies, is widely used [12,13].
The feature extraction stage is followed by the implementation of classification models that can learn from the extracted features. Regarding computational models, AI is considered the main domain, encompassing ML and Deep Learning (DL). AI refers to the process of displaying features of human intelligence in a machine; ML is a means of achieving this goal, while DL is a set of models and algorithms used to implement ML [14]. In general, ML and DL technologies are powerful for extracting features and finding relationships in data; therefore, these approaches are suitable for tasks in which they can take advantage of human experience [15][16][17]. Machine and Deep Learning algorithms applied to electrophysiological signals constitute a rapidly growing field of research and allow researchers to train computer systems as experts that can later be used to support decision-making processes.
In ML, the most commonly used models are Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Ensemble classifiers. A SVM was used in [18] to classify human gait phases based on bilateral EMG recordings. The KNN model was used in [8] to indicate how well different types of features could be used to decompose an EMG signal into its component motor unit potential trains (MUPTs). As for Ensemble classifiers, these include the ensemble of SVM used in [19] to detect neuromuscular disorders using statistical measurements from each sub-band of the discrete wavelet transform (DWT) as features; Random Forest (RF) used in [9] to classify disorders as neuropathies or myopathies based on EMG signals using features extracted from DWT coefficients and power spectral density; and Rotation Forest used in [20] for the diagnosis of neuromuscular disorders. Finally, in DL, different architectures are reported, including fully connected artificial neural networks (ANN) and convolutional neural networks (CNN). In particular, [21] used an architecture that includes a convolution layer with 32 filters, a ReLu activation layer, a MaxPooling layer, a fully connected layer, and, lastly, a softmax output layer with six units.
This article proposes a novel approach to EMG signal processing in which not all the information in the time series is considered. Here, we use only the raw data for hand gestures (i.e., the muscular activation values provided by each channel) to identify the patterns associated with six gestures, regardless of the subject performing the action. We compared different ML algorithms to improve the precision of the identification and classification of hand gestures from EMG signals. More importantly, we aimed to generalize among different subjects, which is still an open issue for this classification task. The classification accuracies obtained using this approach are over 90% for several models, which demonstrates its feasibility.
This article is organized as follows: section 2 presents the materials and methods, including the database and models used in the experimental process, which is explained in detail in section 3. Section 4 contains the experimental results and, lastly, sections 5 and 6 correspond to the discussion and conclusions, respectively, derived from the research.

Database
The information used to develop this research was retrieved from a free database called EMG data for gestures Data Set, available on the website of the UCI Machine Learning Repository. This database contains surface EMG (sEMG) signal recordings over time intervals in which six different static hand gestures are executed, providing a total of 72 recordings from 36 patients. The static hand gestures are resting hand, grasped hand, wrist flexion, wrist extension, ulnar deviation, and radial deviation. Each gesture was performed for three seconds, with a three-second pause between gestures. The information was collected using a MYO thalmic bracelet; this device has eight channels equally spaced around the patient's forearm. Accordingly, each recording consists of an eight-channel time series in which each segment is labeled with a number from zero to six, where zero corresponds to the intervals between gestures and one to six refer to the gestures mentioned above. Table 1 relates the gestures to the classes or labels.

Label  Gesture
0      Intervals between gestures
1      Resting hand
2      Grasped hand
3      Wrist flexion
4      Wrist extension
5      Ulnar deviation
6      Radial deviation

Table 1. Labels of the gestures contained in the database

This database was consolidated by [2] during their research on the factors limiting human-machine interfaces via sEMG.

Conventional models
The ML models implemented in this experiment were the following: K-Nearest Neighbors (KNN), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Multilayer Perceptron (MLP) using one hidden layer, Random Forest (RF), and Decision Tree (DT).

K-Nearest Neighbors (KNN)
is one of the most recognized ML algorithms for solving classification tasks. This model is quite simple and consists of storing the labeled training data. To classify an unlabeled object, a similarity metric between the object and the labeled objects is computed to identify the k elements closest to the unlabeled object. Then, the labels of these closest elements are used to perform the classification. If k = 1, the label of the unlabeled object is determined by the label of the closest element of the training set; otherwise, if k > 1, the label is determined by majority vote [22].
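As an illustration of the voting behavior described above (not code from the study's pipeline), a minimal scikit-learn sketch using toy 2-D points:

```python
# Minimal KNN sketch with scikit-learn; toy 2-D data, not the EMG dataset.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_train = [0, 0, 1, 1]

# k = 3: each prediction is decided by a majority vote of the 3 nearest points.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[0.05, 0.0], [1.05, 0.95]]))  # → [0 1]
```

Each query point falls next to one of the two clusters, so two of its three nearest neighbors share that cluster's label and win the vote.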
Logistic Regression (LR) is a linear classification model that computes the weighted sum of the input features and uses a Sigmoid activation function to compute class probabilities and make a prediction [22].
Gaussian Naive Bayes (GNB) is based on a probabilistic model derived from Bayes' theorem. This classification model assumes that all features are independent from each other and determines the probability that an instance belongs to a class based on the feature probabilities. In the Naive Bayes classifier, the features are assumed to be normally distributed (Gaussian distribution) [23].
The Multilayer Perceptron (MLP) model has multiple processing stages and can be considered a generalization of linear models. Essentially, this model uses multiple linear units that compute the weighted sum of the input features and are affected by an activation function. In an MLP, this process is repeated multiple times to compute an intermediate processing step called the hidden layer, which is then used to generate the final result [22].
Random Forests (RF) belong to the category of ensemble learning methods, which combine multiple ML models to create a more robust one. An RF is basically a collection of Decision Trees, where each tree is different from the others. The differences between the trees can be achieved in two ways: by randomizing the selection of the training data used to build each tree, or by randomizing the selection of the features used in each question [22]. The overall prediction is made from the predictions of the ensemble by majority vote.
Decision Trees (DT) are widely known ML algorithms used for classification or regression tasks. The basis of this model is that it learns a hierarchy of yes/no (if/else) questions that allow making a prediction [22].

Deep Learning models
The concept of DL comes from artificial neural network research. Unlike traditional proposals, modern DL has achieved training stability and the ability to generalize, even on big data. Nowadays, DL is the technique of choice for achieving the highest predictive accuracy, since DL algorithms perform quite well in a variety of problems. DL architectures are computational models involving hierarchical feature extraction, which is typically accomplished using multiple levels of nonlinearity [14]. In particular, artificial neural networks occupy a prominent position among the many supervised learning algorithms currently used in ML [24].

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 28 February 2020 doi:10.20944/preprints202002.0443.v1

Fully connected artificial neural network (ANN) architecture
A fully connected artificial neural network is characterized by having all units between layers connected; therefore, the activation values computed by a unit are passed to every unit in the following layer. In general, the activation values of a layer are given by equation 1.

a = g(W · x + b)    (1)

W represents the weights of the layer, x corresponds to the input data of the layer, which can also be the activation values of the previous layer, and b is the bias. The sign · indicates the matrix multiplication operation, and g() represents the activation function used, usually nonlinear.
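The layer computation of equation 1 can be sketched in a few lines of NumPy; the dimensions and the tanh activation below are illustrative choices, not the study's configuration:

```python
import numpy as np

def dense_layer(W, x, b, g=np.tanh):
    """Equation 1: a = g(W · x + b) for one fully connected layer."""
    return g(W @ x + b)

# Toy dimensions: 3 input features -> 2 units.
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])
x = np.array([1.0, 2.0, 3.0])
b = np.array([0.1, -0.1])

a = dense_layer(W, x, b)
print(a.shape)  # → (2,)
```

Stacking several such calls, each feeding its output `a` as the next layer's `x`, reproduces the hidden-layer chain described above.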
The model implemented here corresponds to a network with three hidden layers and one output layer. The output layer has six units, corresponding to the number of movements that we aim to identify. The activation function for the output layer was softmax, and the loss function was categorical cross-entropy. The diagram of the ANN architecture is shown in figure 1.

Convolutional neural network (CNN) architecture
The main feature of these networks is that they use the convolution operation to extract features from the data. Each convolutional layer has several kernels, or filters, that are slid over the data while computing coefficients that can be used as features. Given its nature, the output dimension of a convolutional layer depends on the size and number of filters used. The convolution operation for one layer is presented in equation 2.
a_i = g(f_i * x + b)    (2)

f_i represents the i-th filter of the layer, x corresponds to the input data of the layer, which can also be the activation values of the previous layer, and b is the bias. The sign * indicates the convolution operation, and g() represents the activation function used.
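A hedged NumPy sketch of the per-filter operation of equation 2, using a "valid" sliding window (note that deep-learning libraries typically implement cross-correlation, i.e., the filter is not flipped); the toy signal and filter are illustrative:

```python
import numpy as np

def conv1d(x, f, b, g=lambda z: np.maximum(z, 0.0)):
    """Equation 2 for one filter: a = g(f * x + b), 'valid' sliding window,
    with a ReLU-style activation as the default g()."""
    n = len(x) - len(f) + 1
    out = np.array([np.dot(x[j:j + len(f)], f) for j in range(n)]) + b
    return g(out)

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy 1-D signal
f = np.array([-1.0, 1.0])            # toy difference filter

print(conv1d(x, f, b=0.0))  # → [1. 1. 1.]
```

Because the window is "valid", the output length is `len(x) - len(f) + 1`, which is exactly the dependence of the output dimension on the filter size mentioned above.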
The model implemented in this study corresponds to a convolutional neural network with two 1D convolutional layers, one ReLU activation layer, three fully connected layers, and one output layer. The output layer has six units, corresponding to the number of movements that we aim to identify. The activation function for the output layer was softmax, and the loss function was categorical cross-entropy. The diagram of the CNN architecture is presented in figure 2.

Tools
The complete experimental procedure was implemented in the Python programming language, v. 3.7.3. We used the Pandas library, v. 0.24.2, for database import and preprocessing. The ML models mentioned in section 2.2, GridSearchCV, Principal Component Analysis (PCA), the train-test split, and the scaling function were implemented using the Scikit-learn library, v. 0.21.2 [25]. The artificial neural network architectures (section 2.3) and the corresponding training and validation processes were implemented using the Keras library, v. 2.2.4 [26].
The resources used in this research are available in a public GitHub repository containing the EMG recordings files from the database, as well as a Jupyter notebook with the implemented source code. The repository is available here.

Importing the data
The data from the database is presented as a plain text file (.txt). We used the read_csv function from Pandas library to import these files.
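A minimal sketch of this import step; the tab separator and column names below are assumptions about the file layout, and io.StringIO stands in for an actual recording file:

```python
import io
import pandas as pd

# Simulated recording file: a tab-separated plain text file. The real UCI files
# are read the same way by pointing read_csv at the .txt path instead.
raw = "time\tch1\tch2\tclass\n0\t0.001\t-0.002\t1\n1\t0.003\t0.000\t1\n"

df = pd.read_csv(io.StringIO(raw), sep="\t")
print(df.shape)  # → (2, 4)
```

In practice, one such DataFrame would be produced per recording file and later concatenated per patient.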

Preprocessing
We removed the data labeled with the number zero, since it corresponds to the resting intervals between gesture executions and does not contain information relevant to the solution of the problem. Thus, only the data captured during the execution of the six gestures of interest were kept. We did not apply any feature extraction procedure (i.e., we used raw data) because we intended to use the values of the eight channels provided by the device as discriminating information between gestures.
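The filtering step can be sketched with a Pandas boolean mask; the tiny DataFrame and its column names are illustrative stand-ins for a real recording:

```python
import pandas as pd

# Toy recording: label 0 marks the intervals between gestures.
df = pd.DataFrame({"ch1": [0.1, 0.2, 0.3, 0.4],
                   "label": [0, 1, 0, 3]})

# Keep only rows captured during gesture execution (labels 1-6).
gestures = df[df["label"] != 0]
print(gestures["label"].tolist())  # → [1, 3]
```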

Dataset creation
For dataset creation, we guaranteed that the patients were correctly distributed into the training, validation, and test sets. Accordingly, each patient was present in only one set to avoid the bias induced by considering data from the same patient for training and testing. With this in mind, the training set was formed by concatenating the data from 30 patients; the validation set comprised three patients, and the test set included the remaining three patients. The patients composing each of the sets were randomly selected. A second group of sets was created by scaling the first one so that all the features contributed equally to the training process. The sets described above were used during the hyperparameter adjustment process and to compute the test set accuracy metric. Finally, to preserve the correct patient distribution during the cross-validation process, each of the nine k-folds was manually generated by concatenating the data from four random patients.
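A hedged sketch of the patient-wise split and scaling described above; the patient counts are reduced and the per-patient data is synthetic, for illustration only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 6 patients, each with 50 samples of 8 EMG channels.
patients = {pid: rng.normal(size=(50, 8)) for pid in range(6)}

# Randomly assign each patient to exactly one set (disjoint by construction).
ids = rng.permutation(list(patients))
train_ids, val_ids, test_ids = ids[:4], ids[4:5], ids[5:]

X_train = np.vstack([patients[p] for p in train_ids])
X_val = np.vstack([patients[p] for p in val_ids])

# Scaled variant of the sets: fit the scaler on the training patients only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)
print(X_train_s.shape, X_val_s.shape)  # → (200, 8) (50, 8)
```

Fitting the scaler on the training set alone keeps the validation and test patients unseen in every sense, consistent with the bias-avoidance goal stated above.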

Hyperparameter adjustment
Following dataset creation, we adjusted the parameters inherent to each classification method assessed. Table 2 summarizes the parameters that were adjusted in the ML models, as well as the ranges used for each one. For the parameters not listed in this table, we used the default values defined by the library.

Model   Parameters and range
KNN     n_neighbors in the range of 1 to 6, with step 1

Given the number of variable parameters for the ANN and CNN architectures, we decided to adjust these one or two parameters at a time using the GridSearchCV method implemented in the Scikit-learn library. This method selects the best combination of parameters based on the result of a cross-validation process. The parameters were adjusted in the following order: batch_size and epochs, optimization algorithm, network weight initialization, activation function, and, finally, neurons in hidden layers. Table 3 presents the adjusted parameters for the ANN and their corresponding ranges, and Table 4 shows the parameters set for the CNN.

Model training and hyperparameter optimization
For parameter optimization, we used an approach similar to a grid search. Given a set of parameters and value ranges, the model was trained and tested on the validation set using every possible combination. For traditional ML models, this process was implemented manually using for loops, whereas for the artificial neural networks, we used the GridSearchCV function from the Scikit-learn library due to the high number of parameters.
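A minimal sketch of the manual for-loop search for one model (KNN over its n_neighbors range from Table 2); the random data below stands in for the EMG feature matrix and validation set:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Synthetic stand-ins: 8-channel samples with gesture labels 1-6.
X_train, y_train = rng.normal(size=(200, 8)), rng.integers(1, 7, 200)
X_val, y_val = rng.normal(size=(60, 8)), rng.integers(1, 7, 60)

# Manual grid search: try every candidate value, keep the best on validation.
best_k, best_acc = None, -1.0
for k in range(1, 7):  # n_neighbors from 1 to 6, step 1
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

print(best_k)
```

The same loop structure extends to several parameters by nesting loops, one per parameter range.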

Evaluation of the model using the best parameters
After performing hyperparameter optimization, the best set of parameters for each model was used to generate the definitive evaluation metric. For this, we used cross-validation to measure model performance. This method uses the complete feature matrix and divides the dataset into k parts (folds). In this case, we used k = 9, and each part contained data from four randomly selected patients. The data was iterated over these partitions, using eight for training and one for validation, until every partition had been used in both roles. This procedure was used to check the stability of the results. In addition to cross-validation, we calculated model accuracy on the test set to estimate the model's ability to generalize; the test set had not been used in any previous stage of the experiment. Finally, as per-class measurements, we used precision, recall, and F1-score.
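The manual patient-wise folds can also be obtained with scikit-learn's GroupKFold, which guarantees patient-disjoint folds when given a per-row patient id; this is a sketch with synthetic data, not the study's exact code:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Synthetic stand-in: 36 patients, 10 rows each, 8 EMG channels.
n_patients, rows_per_patient = 36, 10
X = np.zeros((n_patients * rows_per_patient, 8))
groups = np.repeat(np.arange(n_patients), rows_per_patient)  # patient id per row

# Nine folds of 36 / 9 = 4 patients each, matching the manual procedure.
gkf = GroupKFold(n_splits=9)
for train_idx, val_idx in gkf.split(X, groups=groups):
    # No patient ever appears on both sides of a split.
    assert not set(groups[train_idx]) & set(groups[val_idx])

print(gkf.get_n_splits())  # → 9
```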

Results
Parameter optimization of the ML models is visualized in the graphs shown in figure 4, which display model performance during parameter adjustment. Based on these graphs, we selected the parameter values and dataset (Normal/Scaling) for each of the ML models, which are listed in table 5.

Model   Best parameter value

For the ANN architecture, the results of the parameter adjustment using GridSearchCV are presented in table 6. Using these values, we trained the ANN and generated graphs of accuracy and loss for each training epoch to observe the model's behavior and determine its performance; figure 5 contains the model accuracy and loss curves during training. For the CNN architecture, the results of the parameter adjustment using GridSearchCV are shown in table 7. The number of epochs was manually adjusted according to the model training curves; as a result, the most appropriate value found was epochs = 100. Figure 6 shows the accuracy and loss curves during training for the CNN using the parameters listed in table 7.

Parameter                      Optimal value
batch_size                     50,000
epochs                         150
optimization algorithm         Nadam(lr = 0.01)
network weight initialization  normal
activation function            tanh
neurons in hidden layers       1,050

The two methods mentioned in section 3 for measuring model performance, namely cross-validation and test set accuracy, were implemented using the optimized parameters for each model. The results are shown in table 8, where STD stands for standard deviation.

Model   Cross-validation Accuracy (%) ± STD (%)

Summarizing the results from table 8, the LR, GNB, and MLP models displayed the lowest performance. On the other hand, the best results belong to the KNN, RF, DT, and DL models; the best three results are highlighted in yellow. For these best-performing models, we also calculated precision, recall, and F1-score per class. We found that the resting hand gesture shows the best classification according to these measures. The results are found in table 9.

Table 9. Performance by class

Discussion
In this study, we explored the performance of different ML and DL techniques in the task of classifying six hand gestures from electromyographic signals without using any feature extraction method for time series. We demonstrate the feasibility of using the information provided by each of the channels as discriminant features between the gestures. Thus, it is possible to use each record as an instance of the feature matrix and obtain a larger amount of labeled information for model training. This approach also avoids the computational cost and reduces the time required to implement the different feature extraction methods. For instance, when training on the Google Colaboratory platform in a GPU environment, the KNN model takes less than a minute for training and testing; RF takes approximately seven minutes, and both DL models take nearly 20 minutes. These computational time metrics are not provided by other authors, which makes it difficult to determine whether our approach is faster. A notable finding was that SVMs, or at least the Python implementation in the Scikit-learn library, require a high amount of computational resources when training over a high-dimensional feature matrix such as the one generated here. We used ensemble classifiers (Bagging) to train up to ten models in parallel with a fraction of the dataset; however, the computational time and results were not encouraging enough to spend additional time attempting to run this model.
The best results described here are comparable to literature reports on similar problems and on the same classification task. Concerning the best results from previous studies, we highlight those by [20], who obtained 99.2% accuracy in MUAP classification using the statistical properties of each sub-band of the discrete Wavelet transform as features. On the other hand, [21] report a 98.12% accuracy for the identification of seven hand gestures using four time-domain features and stacked sparse autoencoders as a classifier. Working on a similar classification task, [27] achieved a 96.38% accuracy when using an RF model to identify three hand gestures in a subject-independent fashion with data from 10 subjects; the best model in their investigation matches the model that achieved the highest accuracy in ours.
The contribution of this study is to demonstrate the feasibility of identifying, by means of traditional ML and DL models, different hand gestures from raw EMG signals recorded from the surface of the subject's forearm, as well as of generalizing among 36 subjects while still achieving high classification accuracy. An important detail that we want to highlight, since we did not see other researchers address this topic, is the correct way to distribute the patients in the training, validation, and test sets: the information of a patient must be limited to only one of the sets. This way, the results are reliable, avoiding the bias induced by having information from the same patient in two or even all three sets. This is an important point when designing and implementing ML models, since they can memorize the information in the training set, and if the same information appears in the validation or test set, we obtain biased performance measures.

Conclusions
Our results confirm the feasibility of using Machine Learning and Deep Learning techniques to identify muscle activation patterns, specifically, hand movements. Further, we present a different approach to this classification problem, using raw data provided by the MYO thalmic bracelet device to train models with high performance levels regardless of the subject to whom the signals belong. This makes it possible to overcome the different anatomical and biological factors between subjects, which translate into subtle differences in EMG signals. The best performances reported here were obtained by the Random Forest with an accuracy of 95.39%, the Convolutional Neural Network with 94.77%, the Artificial Neural Network with 93.73%, the Decision Tree with 93.29%, and K-Nearest Neighbors with 93.22%. As future work, we intend to optimize and adapt our trained models for real-time classification tasks, for instance, in the control of the movements of a prosthesis.