Preprint Article, Version 1 (not peer-reviewed); preserved in Portico.

Exploring Prosodic Features Modelling for Secondary Emotions Needed for Empathetic Speech Synthesis

Version 1 : Received: 16 December 2022 / Approved: 3 January 2023 / Online: 3 January 2023 (07:29:37 CET)

A peer-reviewed article of this Preprint also exists.

James, J.; B.T., B.; Watson, C.; Mixdorff, H. Exploring Prosodic Features Modelling for Secondary Emotions Needed for Empathetic Speech Synthesis. Sensors 2023, 23, 2999.

Abstract

A low-resource emotional speech synthesis system for empathetic speech, based on modelling prosodic features, is presented here. Secondary emotions, identified as necessary for empathetic speech, are modelled and synthesised. Because secondary emotions are subtle, they are more difficult to model than primary emotions; they are also less explored, and this is one of the few studies to model them in speech. Current speech synthesis research uses large databases and deep learning techniques to develop emotion models. Since there are many secondary emotions, developing a large database for each of them is expensive. This research presents a proof of concept that uses hand-crafted feature extraction and models these features with a low resource-intensive machine learning approach, thus creating synthetic speech with secondary emotions. A quantitative model-based transformation shapes the fundamental frequency contour of the emotional speech, while speech rate and mean intensity are modelled via rule-based approaches. Using these models, an emotional text-to-speech synthesis system is developed to synthesise five secondary emotions: anxious, apologetic, confident, enthusiastic, and worried. A perception test to evaluate the synthesised emotional speech is also conducted.
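The quantitative model referenced in the abstract and keywords is the Fujisaki model, which superimposes phrase and accent command responses on a speaker baseline in the log-F0 domain. The sketch below illustrates that superposition only; all command timings, amplitudes, and time constants are illustrative assumptions, not the paper's fitted values, and the "enthusiastic" variant simply scales command amplitudes as a hypothetical emotion transformation.

```python
import numpy as np

def phrase_component(t, alpha=2.0):
    # Phrase control mechanism: impulse response of a critically
    # damped second-order system; zero before the command onset.
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * t), 0.0)

def accent_component(t, beta=20.0, gamma=0.9):
    # Accent control mechanism: step response, clipped at ceiling gamma.
    g = np.where(t >= 0, 1.0 - (1.0 + beta * t) * np.exp(-beta * t), 0.0)
    return np.minimum(g, gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + sum of phrase components + sum of accent components.

    phrase_cmds: list of (onset_time, amplitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    ln_f0 = np.full_like(t, np.log(fb))
    for t0, ap in phrase_cmds:
        ln_f0 += ap * phrase_component(t - t0)
    for t1, t2, aa in accent_cmds:
        ln_f0 += aa * (accent_component(t - t1) - accent_component(t - t2))
    return np.exp(ln_f0)

# Hypothetical example: a neutral contour, and an "enthusiastic" variant
# obtained by scaling the same commands' amplitudes upward.
t = np.linspace(0.0, 2.0, 400)
neutral = fujisaki_f0(t, fb=110.0,
                      phrase_cmds=[(0.0, 0.4)],
                      accent_cmds=[(0.3, 0.7, 0.3)])
enthusiastic = fujisaki_f0(t, fb=110.0,
                           phrase_cmds=[(0.0, 0.6)],
                           accent_cmds=[(0.3, 0.7, 0.5)])
```

Because both components are zero before their onsets, the contour starts at the baseline Fb and rises as commands take effect; scaling command amplitudes raises peak F0 without altering timing, which is one plausible way to realise an emotion-dependent contour transformation.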

Keywords

Secondary emotions; emotional speech synthesis; fundamental frequency contour; Fujisaki model; low-resource; empathetic speech

Subject

Engineering, Electrical and Electronic Engineering
