Peng, Z.; He, W.; Li, Y.; Du, Y.; Dang, J. Multi-Level Attention-Based Categorical Emotion Recognition Using Modulation-Filtered Cochleagram. Appl. Sci. 2023, 13, 6749.
Abstract
Speech emotion recognition is a critical component of natural human-robot interaction. The modulation-filtered cochleagram is a feature based on auditory modulation perception that contains a multi-dimensional spectral-temporal modulation representation. In this study, we propose an emotion recognition framework that uses a multi-level attention network to recognize emotions from the modulation-filtered cochleagram. Channel-level and spatial-level attention modules capture emotional saliency maps of the channel and spatial feature representations, respectively, from the 3D convolution feature maps. A temporal-level attention module then captures significant emotional regions from the concatenated feature sequence of the emotional saliency maps. Our experiments on the IEMOCAP dataset demonstrate that the modulation-filtered cochleagram significantly improves categorical emotion prediction compared with the other evaluated features. Moreover, our framework achieves an unweighted accuracy of 71% in categorical emotion recognition, which is higher than that of several existing approaches compared in the experiments. In summary, our study demonstrates the effectiveness of the modulation-filtered cochleagram in speech emotion recognition, and the proposed multi-level attention framework provides a promising direction for future research in this field.
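To make the idea of temporal-level attention concrete, the following is a minimal sketch of generic soft attention pooling over a feature sequence. It is not the authors' implementation: the function names, the scoring vector `w`, and the toy input values are all hypothetical, and the abstract does not specify how the attention scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frames, w):
    """Pool T feature vectors into one vector with soft attention.

    frames: list of T feature vectors, each of length D.
    w: hypothetical scoring vector of length D (an assumption; in a
       trained network this would be learned).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per time step (dot product with w).
    scores = [sum(f * wi for f, wi in zip(frame, w)) for frame in frames]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time: frames with higher scores contribute more.
    pooled = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, weights


# Toy sequence of three 2-dimensional frames (made-up values).
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
w = [1.0, 1.0]
pooled, weights = temporal_attention_pool(frames, w)
```

In this toy example the middle frame has the largest score, so it receives the largest attention weight and dominates the pooled vector. This is the general sense in which an attention module can emphasize salient regions of a sequence.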
Keywords
Categorical emotion recognition; Auditory signal processing; Modulation-filtered cochleagram; Multi-level attention
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.