Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Multi-Level Attention-Based Categorical Emotion Recognition Using Modulation-Filtered Cochleagram
Version 1: Received: 30 April 2023 / Approved: 1 May 2023 / Online: 1 May 2023 (03:06:38 CEST)
A peer-reviewed article of this Preprint also exists.
Peng, Z.; He, W.; Li, Y.; Du, Y.; Dang, J. Multi-Level Attention-Based Categorical Emotion Recognition Using Modulation-Filtered Cochleagram. Appl. Sci. 2023, 13, 6749.
Abstract
Speech emotion recognition is a critical component of natural human-robot interaction. The modulation-filtered cochleagram is a feature based on auditory modulation perception that contains a multi-dimensional spectral-temporal modulation representation. In this study, we propose an emotion recognition framework that uses a multi-level attention network to recognize emotions from the modulation-filtered cochleagram. Channel-level and spatial-level attention modules capture emotional saliency maps of the channel and spatial feature representations, respectively, from the 3D convolution feature maps. A temporal-level attention module then captures significant emotional regions from the concatenated feature sequence of the emotional saliency maps. Our experiments on the IEMOCAP dataset demonstrate that the modulation-filtered cochleagram significantly improves categorical emotion prediction compared to the other evaluated features. Moreover, our framework achieves an unweighted accuracy of 71% in categorical emotion recognition, higher than the several existing approaches compared in the experiments. In summary, our study demonstrates the effectiveness of the modulation-filtered cochleagram in speech emotion recognition, and the proposed multi-level attention framework provides a promising direction for future research in this field.
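The abstract reports its headline result as unweighted accuracy. As that term is commonly used in speech emotion recognition, it is the mean of the per-class recalls, so each emotion category counts equally regardless of how many utterances it has. The sketch below illustrates that common definition alongside plain (weighted) accuracy. It assumes this is the definition used in the paper; the function names and toy labels are hypothetical and are not taken from the authors' code or from IEMOCAP.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: every class contributes equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (dominated by large classes)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced toy labels (not real data).
    y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] * 2
    y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["neutral", "neutral"]
    print("UA:", unweighted_accuracy(y_true, y_pred))
    print("WA:", weighted_accuracy(y_true, y_pred))
```

On imbalanced data the two metrics can diverge: a model that favors the majority class scores well on weighted accuracy but is penalized on unweighted accuracy, which is why the latter is often preferred for categorical emotion recognition.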
Keywords
Categorical emotion recognition; Auditory signal processing; Modulation-filtered cochleagram; Multi-level attention
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.