A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

Huiyun Zhang; Heming Huang; Henry Han

doi:10.20944/preprints202108.0433.v1

Submitted: 20 August 2021

Posted: 23 August 2021

Abstract

Speech emotion recognition remains a demanding task in natural language processing. It places strict requirements on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves 84.65%, 79.67%, and 56.50% unweighted average recall on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previously reported results, the proposed model consistently achieves better performance.
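The results above are reported as unweighted average recall (UAR), that is, the mean of the per-class recalls, so that each emotion class counts equally regardless of how many utterances it contains. The following is a minimal sketch of that metric in plain Python; the function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper, not the authors' code)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per class, then averaged without class-size weighting.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels for illustration only (an imbalanced two-class case).
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data the two numbers differ: in this toy case the minority class pulls UAR below plain accuracy, which is why UAR is a common choice for emotion corpora with uneven class sizes.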

Keywords: Speech emotion recognition; Feature extraction; Heterogeneous parallel network; Spectral features; Prosodic features; Multi-feature fusion

Subject: Computer Science and Mathematics - Computer Science

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
