
Recent Advances in Computer Science and Communications

ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

General Research Article

Real Time Acoustic Feature Analysis for Gender Identification

Author(s): Vanita Jain*, Paras Chhabra and Mansi Bhardwaj

Volume 13, Issue 2, 2020

Pages: 275-282 (8 pages)

DOI: 10.2174/2213275912666190228112838

Abstract

Objective: Humans can readily tell a speaker's gender after hearing only a few uttered words, without any conscious effort; a machine, however, cannot do the same unless it is trained. This research proposes a real-time system that identifies a person's gender from their voice.

Method: Acoustic features are extracted from the dataset and checked for outliers. A baseline classifier is then constructed as a reference point for measuring the performance of the different models. Next, the dataset is prepared for training, and five machine learning models are applied: Decision Tree Classifier, Random Forest Classifier, K Nearest Neighbours, Support Vector Machine and Gaussian Naive Bayes Classifier. Finally, real-time prediction is performed by taking speech input and analysing it against the trained model; after the speech is input, the predicted gender and the accuracy of the prediction are displayed within 1.37 s.
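The training and prediction steps in the Method paragraph can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the feature matrix comes from a hypothetical `make_synthetic_data` helper, outliers are dropped with the common 1.5 × IQR rule, the baseline is assumed to be a majority-class `DummyClassifier`, the label encoding (0 = female, 1 = male) is assumed, and all hyperparameters are library defaults. The timing covers only model inference, not audio capture or feature extraction, so it is not comparable to the reported 1.37 s, and the printed scores reflect the synthetic data only.

```python
"""Sketch of the Method pipeline: outlier check, baseline, five classifiers,
and a timed single prediction.

Assumptions (hypothetical, not from the paper): synthetic features, labels
0 = female / 1 = male, majority-class baseline, 1.5 * IQR outlier rule.
"""
import time

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def iqr_inlier_mask(X, k=1.5):
    """True for rows whose every feature lies in [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


def make_synthetic_data(n=600, d=34, seed=0):
    """Hypothetical stand-in for the real extracted feature matrix."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, :5] += y[:, None] * 1.0  # give a few columns some class signal
    return X, y


def compare_models(X, y, seed=0):
    """Drop IQR outliers, then fit the baseline and the five classifiers."""
    mask = iqr_inlier_mask(X)
    X, y = X[mask], y[mask]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "Baseline (majority class)": DummyClassifier(strategy="most_frequent"),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "K Nearest Neighbours": make_pipeline(StandardScaler(),
                                              KNeighborsClassifier()),
        "SVM": make_pipeline(StandardScaler(),
                             SVC(probability=True, random_state=seed)),
        "Gaussian Naive Bayes": GaussianNB(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = model.score(X_te, y_te)  # test-set accuracy
    return scores, models["SVM"], X_te


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores, svm, X_te = compare_models(X, y)
    for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:28s} {acc:.3f}")
    # Time one prediction on a single feature vector (inference only).
    start = time.perf_counter()
    proba = svm.predict_proba(X_te[:1])[0]
    elapsed = time.perf_counter() - start
    print(f"class {proba.argmax()} (p={proba.max():.2f}) in {elapsed * 1000:.1f} ms")
```

Scaling is applied before K Nearest Neighbours and the SVM because both depend on distances between feature vectors; the tree-based models and Gaussian Naive Bayes do not need it.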

Results: A maximum accuracy score of 88.19% is obtained using the Support Vector Machine (SVM). Additionally, the feature importance graph highlights the two features that contribute most to this classification. A combination of these two features is then studied to design a less complex system, and using only the Mel-Frequency Cepstral Coefficients (MFCCs) and the Chroma Vector, a near-optimal accuracy score of 87.78% is obtained.
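The feature-importance analysis and the reduced MFCC + Chroma Vector model can be sketched as follows. The column layout is an assumption modelled on the 34 short-term features produced by the pyAudioAnalysis library (8 time- and spectral-domain features, 13 MFCCs, 12 chroma coefficients and the chroma deviation); the `FEATURE_GROUPS` mapping, the helper names and the synthetic data are hypothetical. Random Forest impurity importances are summed per feature group, and an SVM is then cross-validated on the full and the reduced feature sets. The numbers it prints say nothing about the paper's 88.19% and 87.78% results.

```python
"""Sketch: rank feature groups by Random Forest importance, then evaluate an
SVM on the MFCC + chroma subset only.

Assumptions (hypothetical): pyAudioAnalysis-style 34-column layout, synthetic data.
"""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column layout of one 34-dimensional feature vector.
FEATURE_GROUPS = {
    "zcr": [0], "energy": [1], "energy_entropy": [2],
    "spectral_centroid": [3], "spectral_spread": [4],
    "spectral_entropy": [5], "spectral_flux": [6], "spectral_rolloff": [7],
    "mfcc": list(range(8, 21)),            # 13 MFCCs
    "chroma_vector": list(range(21, 33)),  # 12 chroma bins
    "chroma_deviation": [33],
}


def group_importances(X, y, seed=0):
    """Sum per-column Random Forest importances within each feature group."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    imp = rf.feature_importances_
    return {g: float(imp[cols].sum()) for g, cols in FEATURE_GROUPS.items()}


def subset_accuracy(X, y, groups, seed=0):
    """Mean 5-fold CV accuracy of a scaled RBF SVM on the chosen groups only."""
    cols = [c for g in groups for c in FEATURE_GROUPS[g]]
    model = make_pipeline(StandardScaler(), SVC(random_state=seed))
    acc = cross_val_score(model, X[:, cols], y, cv=5).mean()
    return float(acc), len(cols)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=600)
    X = rng.normal(size=(600, 34))
    X[:, 8:21] += 0.6 * y[:, None]   # synthetic signal in the "MFCC" block
    X[:, 21:33] += 0.3 * y[:, None]  # weaker signal in the "chroma" block
    ranked = sorted(group_importances(X, y).items(), key=lambda kv: -kv[1])
    print("top two groups:", [g for g, _ in ranked[:2]])
    for groups in (list(FEATURE_GROUPS), ["mfcc", "chroma_vector"]):
        acc, dim = subset_accuracy(X, y, groups)
        print(f"{dim:2d} features -> CV accuracy {acc:.3f}")
```

Summing importances per group favours groups with many columns (13 MFCCs versus a single zero-crossing rate), so the per-column mean is a reasonable alternative when comparing groups of different sizes. Under this assumed layout, keeping only MFCCs and the Chroma Vector cuts each stored feature vector from 34 to 25 values, about a quarter smaller.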

Conclusion: Identifying the speaker's gender before applying speech recognition and emotion recognition algorithms can reduce the search space. Further, using only the MFCCs and the Chroma Vector can make the system memory-efficient while still providing near-optimal accuracy. The system could be used as an authentication mechanism and installed in public places.

Keywords: Acoustic, classifier, mel frequency cepstral coefficient, gender, speech recognition, emotion recognition.

© 2025 Bentham Science Publishers