Abstract
Objective: Humans, with their developed senses, can easily ascertain a person's gender from just a few uttered words, and it takes no additional conscious effort to do so; a machine, however, cannot do the same unless trained. This research proposes a real-time system to identify a person's gender from their voice.
Method: Features are extracted from the dataset and checked for outliers. A baseline classifier is then constructed to benchmark the performance of the different models. Next, the dataset is prepared for training and five machine learning models are applied: Decision Tree, Random Forest, K-Nearest Neighbours, Support Vector Machine, and Gaussian Naive Bayes. Finally, real-time prediction is performed by taking speech input and analysing it against the trained model; after the speech is input, the predicted gender along with the prediction accuracy is displayed within 1.37 s.
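The training step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature matrix here is synthetic random data standing in for the extracted acoustic features, and all hyperparameters are scikit-learn defaults.

```python
# Hedged sketch: fit the five classifiers named in the Method on a
# synthetic feature matrix (a placeholder for the real acoustic features).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))               # 200 samples x 20 placeholder features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # placeholder binary gender label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Gaussian NB": GaussianNB(),
}
# Fit each model and record its held-out accuracy.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

On the paper's real feature set, the SVM variant of this loop is the one reported to reach the best score.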
Results: A maximum accuracy score of 88.19% is obtained using SVM. In addition, the feature importance graph highlights the two features that contribute most to this classification. Combinations of these features are then studied to design a less complex system, and it is observed that using just MFCCs and the Chroma Vector yields a near-optimal accuracy score of 87.78%.
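The feature-importance analysis behind this result can be sketched with a Random Forest's built-in importances. Everything here is illustrative: the data is synthetic, the feature names are stand-ins, and the first two columns are deliberately constructed to be informative so that they play the role MFCCs and the Chroma Vector play in the paper.

```python
# Hedged sketch: rank candidate feature groups by Random Forest importance
# and keep the top two, mirroring the abstract's MFCC + chroma finding.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
# Columns 0 and 1 carry the signal (stand-ins for MFCC and chroma);
# columns 2 and 3 are pure noise.
y = (X[:, 0] + 0.8 * X[:, 1] + 0.1 * rng.normal(size=300) > 0).astype(int)

names = ["mfcc", "chroma", "spectral_centroid", "zero_crossing_rate"]
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Sort feature names by descending importance and keep the best two.
ranked = sorted(zip(names, forest.feature_importances_), key=lambda t: -t[1])
top_two = [n for n, _ in ranked[:2]]
print(top_two)
```

Retraining on only the top-ranked features is what lets the paper trade a small amount of accuracy (88.19% down to 87.78%) for a much smaller input representation.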
Conclusion: Identifying gender before applying speech recognition and emotion recognition algorithms can help reduce the search space. Further, using only MFCCs and the Chroma Vector makes the system memory-efficient while still providing near-optimal accuracy. The system can be used as an authentication mechanism and can be installed in public places.
Keywords: Acoustic, classifier, mel-frequency cepstral coefficient, gender, speech recognition, emotion recognition.
Graphical Abstract