Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Emotion Detection Based on Consecutive Facial Images by Combining CNN and LSTM Implemented on FPGA Chip

Version 1 : Received: 18 April 2024 / Approved: 19 April 2024 / Online: 22 April 2024 (12:21:52 CEST)

How to cite: Pan, S.; Wu, H. Emotion Detection Based on Consecutive Facial Images by Combining CNN and LSTM Implemented on FPGA Chip. Preprints 2024, 2024041307. https://doi.org/10.20944/preprints202404.1307.v1

Abstract

This paper proposes emotion recognition methods for consecutive facial images and implements the inference of the neural network model on a field-programmable gate array (FPGA). The proposed methods are based on a neural network architecture that combines convolutional neural networks (CNNs), long short-term memory (LSTM), and fully connected neural networks (FCNNs), called CLDNN or ConvLSTM-FCN. This type of neural network can analyze the local feature sequences obtained by convolving the data, making it suitable for processing time-series data such as consecutive facial images. In this paper, sequences of facial images are sampled from videos corresponding to various emotional states of the subjects. The sampled images are then pre-processed through facial detection, greyscale conversion, resizing, and data augmentation where necessary. The 2-D CNN in ConvLSTM-FCN is used to extract features from these pre-processed facial images. The resulting sequences of facial image features are time sequences with dependencies between their elements. The LSTM is then used to model these time sequences, followed by fully connected neural networks (FCNNs) for classification. The proposed consecutive facial emotion recognition method achieves average recognition rates of 99.51% on the RAVDESS dataset, 87.80% on the BAUM-1s dataset, and 96.82% on the eNTERFACE'05 dataset, using 10-fold cross-validation on a PC. Comparisons of recognition accuracies between the proposed method and other existing related works are conducted in this paper; according to these comparisons, the proposed emotion recognition methods outperform the existing related research. The proposed methods are then implemented on an FPGA chip using the neural network model inference algorithms presented in this paper, and the accuracies of the experiments conducted on the FPGA chip are identical to those obtained on the PC, verifying that the proposed neural network model implemented on the FPGA chip performs well.
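The pipeline described in the abstract (a per-frame 2-D CNN feature extractor, an LSTM over the frame sequence, and a fully connected classifier) can be illustrated with a minimal PyTorch sketch. The class name `ConvLSTMFCN`, the layer sizes, the 48x48 grayscale input resolution, the 16-frame clip length, and the seven-class output are assumptions made for illustration; they are not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMFCN(nn.Module):
    """Hypothetical sketch of a CLDNN-style (ConvLSTM-FCN) model:
    a 2-D CNN extracts per-frame features from grayscale face crops,
    an LSTM models the frame sequence, and fully connected layers
    classify the emotion. All sizes below are illustrative assumptions."""

    def __init__(self, num_emotions=7, img_size=48, lstm_hidden=128):
        super().__init__()
        # Per-frame 2-D CNN feature extractor (assumed layer sizes).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 64 * (img_size // 4) * (img_size // 4)
        # LSTM over the sequence of per-frame feature vectors.
        self.lstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True)
        # Fully connected classifier applied to the last LSTM output.
        self.fc = nn.Sequential(
            nn.Linear(lstm_hidden, 64), nn.ReLU(),
            nn.Linear(64, num_emotions),
        )

    def forward(self, x):
        # x: (batch, seq_len, 1, H, W) -- a clip of pre-processed face crops.
        b, t, c, h, w = x.shape
        feats = self.cnn(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])  # classify from the final time step

# Example: a batch of 4 clips, each with 16 pre-processed 48x48 frames.
model = ConvLSTMFCN()
logits = model(torch.randn(4, 16, 1, 48, 48))  # -> (4, 7) emotion scores
```

In such a design, the CNN is applied independently to every frame and only the LSTM carries temporal context, which matches the abstract's description of convolutional local features feeding a time-sequence model before the FCNN classifier.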

Keywords

Consecutive facial images; emotion recognition; convolutional neural networks (CNNs); long short-term memory (LSTM); field-programmable gate array (FPGA)

Subject

Computer Science and Mathematics, Signal Processing
