Synthesis of Emotional Speech by Prosody Modification of Vowel Segments of Neutral Speech

Md Shah Fahad; Shreya Singh; Shruti Gupta; Akshay Deepak; Abhinav

doi:10.2174/2213275912666191112144014

Abstract

Background: Emotional speech synthesis is the process of synthesising emotions in neutral speech, potentially generated by a text-to-speech system, to make artificial human-machine interaction more human-like. It typically involves the analysis and modification of speech parameters. Existing work on emotional speech synthesis modifies prosody parameters at the sentence, word, and syllable levels. However, finer-grained modification at the vowel level has not yet been explored, which motivates our work.

Objective: To explore the modification of prosody parameters at the vowel level for emotion synthesis.

Methods: Our work modifies prosody features (duration, pitch, and intensity) for emotion synthesis. Specifically, it modifies the duration of vowel-like and pause regions, and the pitch and intensity of vowel-like regions only. The modification is gender-specific, is driven by emotional speech templates stored in a database, and is performed using the Pitch Synchronous Overlap and Add (PSOLA) method.
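The region-dependent scheme in the Methods paragraph (duration for vowel-like and pause regions, pitch and intensity for vowel-like regions only, gender-specific templates, PSOLA) can be illustrated with a minimal pure-Python sketch. It assumes time-domain PSOLA (TD-PSOLA), pitch marks and region boundaries (for example, from vowel onset and offset points) that are already available, and one scalar factor per region. The names `TEMPLATES`, `td_psola`, and `modify_utterance`, and every factor value, are hypothetical illustrations rather than the authors' implementation or data; intensity is approximated by a plain amplitude gain.

```python
import math

# HYPOTHETICAL templates keyed by (gender, emotion). Each region maps to
# (pitch factor, duration factor, intensity gain). Placeholder numbers only;
# these are NOT the values stored in the paper's template database.
TEMPLATES = {
    ("male", "angry"): {"vowel": (1.3, 0.9, 1.5), "pause": (1.0, 0.8, 1.0)},
    ("female", "happy"): {"vowel": (1.2, 0.95, 1.3), "pause": (1.0, 0.9, 1.0)},
}


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(x, marks, pitch_factor, time_factor, gain=1.0):
    """Minimal TD-PSOLA on one voiced segment.

    x: samples of the segment
    marks: strictly increasing pitch-mark indices into x (assumed given)
    pitch_factor: > 1 raises F0 by packing synthesis marks closer together
    time_factor: > 1 lengthens the segment
    gain: amplitude scaling, a crude stand-in for intensity modification
    """
    if len(marks) < 2 or pitch_factor <= 0 or time_factor <= 0:
        raise ValueError("need >= 2 pitch marks and positive factors")
    periods = [marks[i + 1] - marks[i] for i in range(len(marks) - 1)]
    if any(p <= 0 for p in periods):
        raise ValueError("pitch marks must be strictly increasing")
    periods.append(periods[-1])  # reuse the last period for the final mark
    n_out = round(len(x) * time_factor)
    y = [0.0] * n_out
    t = marks[0] * time_factor  # position of the first synthesis mark
    while t < n_out:
        # Analysis mark whose time-scaled position is nearest to t.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] * time_factor - t))
        p = periods[k]
        win = hann(2 * p + 1)  # two-period window centred on the mark
        centre = round(t)
        for j in range(-p, p + 1):
            src, dst = marks[k] + j, centre + j
            if 0 <= src < len(x) and 0 <= dst < n_out:
                y[dst] += win[j + p] * x[src]
        t += p / pitch_factor  # shorter spacing between marks -> higher pitch
    return [gain * v for v in y]


def modify_utterance(x, segments, gender, emotion):
    """Region-wise prosody modification (hypothetical helper).

    segments: list of (label, start, end, marks); label is "vowel",
    "pause" or "consonant"; marks are absolute pitch-mark indices and
    are only used for vowel-like regions.
    """
    template = TEMPLATES[(gender, emotion)]
    out = []
    for label, start, end, marks in segments:
        seg = x[start:end]
        if label == "vowel":
            pf, tf, g = template["vowel"]
            local = [m - start for m in marks]
            out.extend(td_psola(seg, local, pf, tf, g))
        elif label == "pause":
            _, tf, _ = template["pause"]
            # Duration only: nearest-sample stretch or shrink of the pause.
            n = round(len(seg) * tf)
            out.extend(seg[min(int(i / tf), len(seg) - 1)] for i in range(n))
        else:
            out.extend(seg)  # consonant regions are passed through unchanged
    return out
```

For example, `modify_utterance(x, segments, "male", "angry")` returns a new sample list in which only vowel-like regions are pitch-shifted and amplified, pauses are only resized, and consonant samples are copied as-is. A single scalar factor per region keeps the sketch short; a template database could instead store pitch and intensity contours, which this sketch does not attempt to model.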

Results: The proposed method was compared with existing work on prosody modification at the sentence, word, and syllable levels on the IITKGP-SEHSC database. Improvements in relative mean opinion score of 8.14%, 13.56%, and 2.80% were obtained for the emotions angry, happy, and fear, respectively. These improvements are attributed to (1) vowel-level prosody modification being more fine-grained than sentence-, word-, or syllable-level modification, and (2) prosody patterns not being generated for consonant regions, because the vocal cords do not vibrate during consonant production.

Conclusion: Our proposed work shows that emotional speech generated using prosody modification at the vowel level is more convincing than emotional speech generated using prosody modification at the sentence, word, and syllable levels.

Keywords: Duration, emotional speech, intensity, pitch, PSOLA, prosody modification, vowel onset-offset points.

DOI: https://dx.doi.org/10.2174/2213275912666191112144014
Print ISSN: 2666-2558
Online ISSN: 2666-2566
Publisher: Bentham Science Publisher

Recent Advances in Computer Science and Communications
