This work was not supported by any organization.
1. Introduction
Emotions play a significant role in human life. Despite extensive research into the nature of emotions, there is still no general consensus on what emotions are and how they are expressed. Paul Ekman has proposed a relatively comprehensive definition, considering emotions to be the result of interactions between mental factors, environmental factors, and neural and hormonal processes in the body [1,2,3]. Emotion is a physiological and brain-related mental state associated with a wide range of feelings, behaviors, and thoughts. Feeling can therefore be considered a subset of emotion, and this study focuses on emotion recognition [4]. Emotions shape various aspects of human life, such as daily experiences, perception, and the performance of everyday tasks such as learning, communication, and decision-making. Emotion recognition has long played an important role in human life, and most traditional studies in this field have relied on physical parameters such as facial expressions and body movements. With the advancement of science and technology, it has become possible to obtain information directly from the brain. There are various options for acquiring brain information, including functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and near-infrared spectroscopy (NIRS). Among these, EEG signals are the most commonly used because of their advantages over other acquisition methods: high temporal resolution and low-cost measurement equipment [5].
Understanding emotions through EEG signals enables the detection of emotional states without traditional instruments such as questionnaires, making it possible to assess an individual's emotions without clinical examinations and visits, which plays a crucial role in completing the puzzle of brain-computer interaction (BCI) [5]. There are various methods for inducing emotions in humans, including watching emotional films, viewing emotional images, mental imagery, and listening to emotional music [6].
In the field of emotion recognition through EEG, there is generally little agreement on which features are most suitable, and only a few studies have compared different feature sets. For example, in one study [7], five individuals were tested using emotionally evocative stimuli presented as images. Each image was shown to the participants for 6 seconds on a display screen from a distance of 5 meters. With a 15-second interval between the induction of emotional states and a 2-second interval between image presentations, the total recording time was 20 minutes, during which three emotions (pleasantness, neutrality, and unpleasantness) were examined. Two sets of features were compared; using the Fast Fourier Transform, several statistical features, and a Support Vector Machine (SVM) classifier for both sets, a correct recognition rate of 66% was reported.
In [8], two emotional classes were tested in a calm state with open eyes. Data segmentation and linear regression were used to extract features, and the experiment was conducted on 43 participants. Then, by normalizing the signal and applying fuzzy clustering [9–23] in different frequency bands, quantitative results were obtained. In [24], feature extraction [25–29] for emotion recognition from 64-channel EEG signals was performed; the extracted features were compared with each other, and the most important ones for recognition were selected using machine learning techniques. Features in three domains (time, frequency, and time-frequency) were extracted from data recorded from 7 females and 9 males. The data covered 5 emotional states (joy, curiosity, anger, discomfort, and calmness), elicited using IAPS images presented in 8 segments of 30 seconds per emotional class and validated with the self-assessment manikin (SAM). In the signal processing stage, the data of 5 participants were excluded due to low signal quality; averaging over the remaining 11 individuals, 6 different feature extraction schemes combined with Quadratic Discriminant Analysis (QDA) as a classifier yielded reported emotion recognition accuracies between 34% and 36%.
In [30], deep learning networks (DLN) [31] were used for emotion recognition from EEG signals. Recordings were conducted on 32 participants with 32 channels, and Principal Component Analysis (PCA) was applied to extract important features. An interesting point in this study was the separate extraction and classification of features for the arousal and valence dimensions. The results were evaluated against SVM and Bayes classifiers, and according to the reports, the DLN performed better than SVM. In that study, the feature extraction part was kept fixed, and four different and combined algorithms were used in the classification part. To assess the performance of the proposed method, three common quantitative measures were calculated: sensitivity [32,33], accuracy, and specificity. Sensitivity is the proportion of positive cases correctly marked as positive, specificity is the proportion of negative cases correctly marked as negative, and accuracy is the number of correct predictions of the two classes relative to all predictions of the two classes. The classification results of that study reported accuracies of 53.42% with a standard deviation of 64% for the valence dimension and 52% with a standard deviation of 75% for the arousal dimension.
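For reference, these three measures follow the standard confusion-matrix definitions (a general formulation, not specific to [30]), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:
\[
\mathrm{Sensitivity}=\frac{TP}{TP+FN},\qquad
\mathrm{Specificity}=\frac{TN}{TN+FP},\qquad
\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}.
\]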
In [34], three categories of features for emotion recognition were compared based on video stimuli using a custom recorded dataset, and the resulting feature matrix was classified using SVM. The sampling frequency in that study was 100 Hz. Considering the diversity of the features used and the application of a feature normalization stage, a classification accuracy of 87% was reported. Yan and colleagues [35] used a correlation-based model for feature extraction from EEG signals to classify different emotions (calmness, joy, sadness, and grief) in 8 participants. They employed BP, SVM, LDA, and C4.5 classifiers and concluded that the C4.5 classifier performed better in emotion recognition than the others. Y. Hou and colleagues [36] used a parallel fuzzy cascade for predicting the emotional content of EEG signals. They applied musical stimuli to 15 participants and compared their proposed model with several common algorithms; the reported error of their model for the classification of 2 emotions was approximately 0.089. Panaio and colleagues [37] employed deep neural networks for detecting two types of emotions from EEG signals. Their experiment included 12 participants, and the architecture of their network consisted of 6 convolution layers. They compared their model with SVM and concluded that it performed better in emotion detection. Yang and colleagues [38] used a recurrent neural network for automatic identification of emotions from EEG signals, working with a dataset based on video stimuli. In their proposed method, one-dimensional EEG signals were transformed into two-dimensional frames for network training, and the reported accuracies for the valence and arousal classes were 90% and 91%, respectively. Chen and colleagues [39] used parallel recurrent neural networks for automatic two-class emotion classification from EEG signals; the final reported accuracies for valence and arousal were 93.64% and 93.26%, respectively. Wei and colleagues [40] used the dual-tree wavelet transform to extract features from EEG signals for emotion recognition and then trained a model based on recurrent units, achieving reported accuracies of 85%, 84%, and 87% for positive, negative, and neutral emotions, respectively.
The main challenge in emotion recognition from EEG signals is the selection of distinct features for different emotional states. In most previous studies, conventional statistical and signal processing methods were used to extract features, and feature selection was then performed using dimensionality reduction techniques. Manual feature extraction introduces computational complexity into the classification of different emotional states. Furthermore, features that are desirable and optimal [41–44] for one problem may not be optimal for another [45].
Therefore, a method is needed that can learn suitable features based on the type of problem and data, and this is a key point of the present research. In this article, a fully automatic classification algorithm (requiring no manual feature selection or extraction) is designed for the detection of 3 emotional states (positive, negative, and neutral) from EEG signals. The algorithm uses deep learning with a CNN to process the raw input signals, learning features and identifying the different emotional states automatically with high accuracy and prediction speed.
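As an illustration of this idea, the following is a minimal sketch (in Keras) of a CNN-LSTM classifier that maps raw EEG windows directly to three emotion classes. The layer counts mirror the architecture described later in the paper (seven convolutional, three LSTM, and two fully connected layers), but the window length, channel count, filter sizes, and other hyperparameters are illustrative assumptions rather than the values of the actual implementation (see Table 3).

```python
# Minimal sketch of a CNN-LSTM classifier over raw EEG windows.
# Layer counts follow the description in this paper (7 conv, 3 LSTM, 2 dense);
# kernel sizes, filter counts, window length, and channel count are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

N_SAMPLES = 1000   # samples per EEG window (assumed)
N_CHANNELS = 8     # number of EEG channels used (assumed)
N_CLASSES = 3      # positive, neutral, negative

def build_cnn_lstm():
    inp = layers.Input(shape=(N_SAMPLES, N_CHANNELS))
    x = inp
    # Seven 1-D convolutional layers learn local temporal features from the raw signal.
    for filters in (16, 16, 32, 32, 64, 64, 128):
        x = layers.Conv1D(filters, kernel_size=5, padding="same", activation="relu")(x)
        x = layers.MaxPooling1D(pool_size=2)(x)
    # Three stacked LSTM layers model longer-range temporal dependencies.
    x = layers.LSTM(64, return_sequences=True)(x)
    x = layers.LSTM(64, return_sequences=True)(x)
    x = layers.LSTM(32)(x)
    # Two fully connected layers produce the 3-class emotion prediction.
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(N_CLASSES, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_lstm()
model.summary()
```

Because the convolutional front end consumes the raw signal, no hand-crafted feature extraction step is required before training.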
The remainder of the paper is structured as follows. Section 2 describes the recorded experimental data obtained through musical stimulation and provides a detailed explanation of the Convolutional Neural Network (CNN) in conjunction with the Long Short-Term Memory (LSTM) network. Section 3 presents the architecture of our proposed method, which combines CNN and LSTM and is denoted CNN-LSTM. Section 4 presents and discusses the simulation results. Finally, Section 5 summarizes our findings and draws conclusions.
4. Results and Discussion
In this section, the results of the simulation of the proposed deep neural network for automatic emotion recognition from EEG signals are presented.
Figure 8 shows the learning curves for scenarios one and two. According to Figure 8, the network error for scenario one reaches a stable state at around iteration 130, and for scenario two it becomes almost constant at iteration 145. Figure 9 illustrates the accuracy of the proposed method for scenarios one and two over 400 iterations. It can be observed from Figure 9 that the performance of the proposed method in emotion classification improves significantly after 200 iterations, achieving accuracies of 97.42% and 95.23% for scenarios one and two, respectively.
Figure 10 presents the confusion matrices for scenarios one and two, demonstrating the promising performance of the proposed deep neural network. Furthermore, Figure 11 displays the bar chart (including precision, sensitivity, accuracy, and specificity) for scenarios one and two, indicating the effectiveness of the proposed network. Figure 12 depicts the t-SNE scatter plots for different convolutional layers for scenarios one and two; considering these plots, the proposed method demonstrates high efficiency in data classification. To further demonstrate the performance of the proposed algorithm, scenario two (positive, neutral, and negative) was compared against several existing methods, namely CNN, DBM, and MLP. For the CNN network, the architecture from Table 3 was used without the LSTM layers. For the DBM and MLP networks, three hidden layers and a learning rate of 0.001 were employed.
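For concreteness, the MLP baseline described above could be configured as in the following sketch; the three hidden layers and the 0.001 learning rate follow the description in the text, while the hidden-layer widths and the flattened input size are illustrative assumptions.

```python
# Sketch of the MLP baseline (three hidden layers, learning rate 0.001).
# Hidden-layer widths and input dimension are illustrative, not the actual values.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_mlp(input_dim=1000 * 8, n_classes=3):
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),        # flattened raw EEG window (assumed)
        layers.Dense(256, activation="relu"),     # hidden layer 1
        layers.Dense(128, activation="relu"),     # hidden layer 2
        layers.Dense(64, activation="relu"),      # hidden layer 3
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```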
Figure 13 illustrates the performance of the proposed CNN-LSTM method compared to the CNN, DBM, and MLP networks for scenario two. According to Figure 13, the accuracies achieved by the three compared classifiers were 90%, 79%, and 73%, respectively. The proposed architecture, based on CNN-LSTM networks, demonstrates efficient performance in classifying positive, neutral, and negative emotions, as shown in Figure 13. The computational complexity of the proposed CNN-LSTM algorithm, as well as that of the LSTM, CNN, DBM, and MLP networks, is presented in Table 4. According to the table, the proposed algorithm has higher computational complexity than the other methods but achieves the highest accuracy for both scenarios.
Table 5 presents Cohen's kappa values for the two scenarios, which validate the achieved accuracies.
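For reference, Cohen's kappa corrects the raw agreement between predicted and true labels for chance agreement; with \(p_o\) the observed agreement (the accuracy) and \(p_e\) the agreement expected by chance, it is defined as
\[
\kappa=\frac{p_o-p_e}{1-p_e}.
\]
Values close to 1 indicate classification performance well above chance level.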
Previous studies have utilized common methods such as the wavelet transform (WT), empirical mode decomposition (EMD), and other feature selection and extraction techniques to extract meaningful features from EEG signals. These methods require choices such as the mother wavelet type, the number of decomposition levels, and the feature extraction algorithm. In contrast, the proposed method eliminates the need for conventional feature selection and performs emotion detection directly from the EEG signals without any feature extraction algorithms. To evaluate the performance of the proposed algorithm in noisy environments, white Gaussian noise with signal-to-noise ratios (SNR) ranging from -4 to 20 dB was added to the recorded EEG signals, and the classification accuracy for scenario two was compared with the baseline methods at each SNR level. The results are presented in Figure 14. According to Figure 14, the classification performance of the proposed algorithm is markedly superior and more robust than that of the baseline methods across the whole SNR range, demonstrating strong resistance to the degrading effects of noise.
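The noise-injection step can be reproduced with a short routine that scales white Gaussian noise to a requested SNR before adding it to each recorded signal; the following is a generic sketch, with the array shape and random seed chosen purely for illustration.

```python
# Sketch of adding white Gaussian noise to an EEG signal at a target SNR (in dB),
# as used in the robustness experiment; shapes and the RNG seed are illustrative.
import numpy as np

def add_awgn(signal: np.ndarray, snr_db: float,
             rng=np.random.default_rng(0)) -> np.ndarray:
    """Return the signal plus white Gaussian noise scaled to the requested SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))  # SNR = P_signal / P_noise
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: sweep part of the SNR range used in the experiment (-4 dB to 20 dB).
eeg_window = np.random.randn(1000, 8)   # placeholder raw EEG window
for snr in range(-4, 21, 4):
    noisy = add_awgn(eeg_window, snr)
```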
5. Conclusion
This study addressed the challenging task of emotion recognition and the selection of discriminative features for classification. Traditional approaches typically combine feature selection with various classification methods, leading to diverse outcomes. The method proposed here instead bypasses the feature selection step and common feature extraction techniques altogether, working directly on the raw EEG signals and achieving accuracy rates above 90% in both scenarios.
The key innovation of the proposed model lies in its hierarchical structure comprising seven convolutional layers, three LSTM layers, and two fully connected layers. This architecture identifies and prioritizes the features with the highest discriminative power among different emotions. The experimental results demonstrate that the model extracts and uses these features effectively, achieving high accuracy in categorizing emotions.
The implications of such a high-performing algorithm are significant, particularly in the domain of Brain-Computer Interface (BCI) systems. BCI technology aims to establish direct communication channels between the human brain and external devices, enabling individuals to control machines using their thoughts. By incorporating the proposed model into BCI systems, it holds the potential to enhance the accuracy and efficiency of emotion recognition, consequently enabling more seamless and intuitive interactions between users and computer interfaces. This advancement opens up exciting avenues for applications in fields such as healthcare, virtual reality, gaming, and beyond. The prospect of a streamlined and robust emotion recognition algorithm paves the way for exciting advancements in human-machine interaction.
Figure 1.
The result of SAM validation regarding the level of effective elicitation of emotional stimuli (10 musical pieces) for the first subject
Figure 2.
The order of music playback for the participants
Figure 3.
Block diagram of the proposed algorithm
Figure 4.
Selected channels in the simulation
Figure 5.
Convolution operation (overlap) on the recorded signal
Figure 6.
Details of the proposed deep neural network (CNN-LSTM)
Figure 7.
Allocation of EEG signal data related to the first and second scenarios
Figure 8.
Error graph: (a) first scenario (positive and negative emotions); (b) second scenario (positive, neutral, and negative emotions)
Figure 9.
Accuracy graph: (a) first scenario (positive and negative emotions); (b) second scenario (positive, neutral, and negative emotions)
Figure 10.
Confusion matrix: (a) first scenario (positive and negative emotions); (b) second scenario (positive, neutral, and negative emotions).
Figure 11.
Bar chart (including accuracy, sensitivity, specificity, and precision) for the first scenario and the second scenario
Figure 12.
t-SNE plot for different convolutional layers: (a) first scenario; (b) second scenario
Figure 13.
Performance of the proposed deep neural network (CNN-LSTM) compared to CNN, DBM, and MLP networks for the second scenario
Figure 14.
Accuracy of the proposed deep network compared to other methods in a noisy environment
Table 1.
Validation of participants in the EEG signal recording process
Table 2.
Songs used for emotion induction
Table 3.
Details of the architecture and size of filters in the proposed network
Table 4.
Computational complexity of the proposed algorithm compared to 3 other methods
Table 5.
Cohen’s kappa values for the first and second scenarios.