Preprint Article, Version 1 (Preserved in Portico). This version is not peer-reviewed.

Speech & Song Emotion Recognition Using Multilayer Perceptron and Support Vector Machine

Version 1 : Received: 18 May 2021 / Approved: 19 May 2021 / Online: 19 May 2021 (12:53:55 CEST)

How to cite: Javaheri, B. Speech & Song Emotion Recognition Using Multilayer Perceptron and Standard Vector Machine. Preprints 2021, 2021050441 (doi: 10.20944/preprints202105.0441.v1).

Abstract

Herein, we compare the performance of SVM (support vector machine) and MLP (multilayer perceptron) classifiers for emotion recognition using the speech and song channels of the RAVDESS dataset. We extract various audio features and identify the optimal scaling strategy and hyperparameters for each model. To increase the sample size we perform audio data augmentation, and we address class imbalance using SMOTE. The optimised SVM outperforms the MLP, with an accuracy of 82% compared to 75%. After data augmentation, the performance of both algorithms is near-identical at ~79%; however, the SVM shows evidence of overfitting. A final comparison indicates similar behaviour for both classifiers: each achieves lower accuracy on the speech channel than on the song channel. These findings suggest that both SVM and MLP are powerful classifiers for emotion recognition, with performance that depends on the vocal channel.
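The pipeline summarised above (feature extraction, scaling, then an SVM-vs-MLP comparison) can be sketched in scikit-learn. This is a minimal illustration, not the paper's actual configuration: synthetic data stands in for the RAVDESS audio features, and all hyperparameters, sample counts, and the 8-class label space are illustrative assumptions. The study's augmentation and SMOTE steps are omitted here for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for extracted audio features; 8 classes mirror the
# RAVDESS emotion labels. Shapes and parameters are illustrative only.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=20,
                           n_classes=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Feature scaling matters for both SVM and MLP, as the abstract notes.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Fit both classifiers and compare held-out accuracy.
results = {}
for name, clf in [("SVM", SVC(C=10, gamma="scale")),
                  ("MLP", MLPClassifier(hidden_layer_sizes=(128,),
                                        max_iter=500, random_state=0))]:
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name} accuracy: {results[name]:.3f}")
```

In the actual study the features would come from the audio files themselves (e.g. via a library such as librosa) and SMOTE would be applied to the training split before fitting; the comparison loop itself would be unchanged.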

Subject Areas

emotion recognition; MLP; SVM; RAVDESS



×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.