Spontaneous Emotion Recognition From Audio-Visual Signals

Abstract

This chapter introduces an emotion recognition system based on audio and video cues. For audio-based emotion recognition, we explore various aspects of feature extraction and classification strategy and find that wavelet analysis is a sound choice. We present comparative results on the discriminating capability of various feature combinations using Fisher Discriminant Analysis (FDA). Finally, we combine the audio and video features using a feature-level fusion approach. All experiments are performed on the eNTERFACE and RML databases. Although we applied multiple classifiers, SVM shows significantly improved performance both with a single modality and with fusion. The results obtained using fusion outperform those based on a single modality, whether audio or video. We conclude that fusion approaches perform best because they use complementary information from multiple modalities.

Cite as

Multimodal Affective Computing: Affective Information Representation, Modelling, and Analysis
