Abstract
Crying is the universal language through which infants communicate with others. Infant cry classification is a form of speech recognition that must be approached carefully. The field has gained momentum in recent years, and researching it in depth would benefit caretakers and the community at large.
Objective: This study aims to develop a predictive model for infant cry classification by converting audio signals into spectrogram images and classifying them with a deep convolutional neural network. The network learns end to end, reducing the complexity involved in audio signal analysis, and its performance is improved with an optimisation technique.
Method: A time-frequency analysis, the Short-Time Fourier Transform (STFT), is applied to generate the spectrograms; 256 Discrete Fourier Transform (DFT) points are used to compute each transform. A deep convolutional neural network, AlexNet, with a few enhancements, is used in this work to classify the recorded infant cries. To improve the effectiveness of this network, it is trained with Stochastic Gradient Descent with Momentum (SGDM).
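As a rough sketch of the spectrogram-generation step, the Python snippet below computes an STFT with 256 DFT points, matching the configuration above; the input file name, window overlap, and image settings are illustrative assumptions, not details taken from this work.

```python
# A minimal sketch of spectrogram generation, assuming scipy and matplotlib.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft
import matplotlib.pyplot as plt

fs, audio = wavfile.read("infant_cry.wav")   # hypothetical input file
if audio.ndim > 1:                           # mix stereo down to mono
    audio = audio.mean(axis=1)

# 256 DFT points per frame, as in the paper; the 50% overlap is an assumption.
f, t, Z = stft(audio, fs=fs, nperseg=256, nfft=256, noverlap=128)

# Save a log-magnitude spectrogram image as input for the CNN.
plt.pcolormesh(t, f, 20 * np.log10(np.abs(Z) + 1e-10), shading="gouraud")
plt.axis("off")
plt.savefig("spectrogram.png", bbox_inches="tight", pad_inches=0)
```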
Results: The deep neural network-based infant cry classification system achieves a maximum accuracy of 95% in the classification of sleepy cries. The results show that the convolutional neural network with SGDM optimisation attains higher prediction accuracy.
Conclusion: The proposed approach has been compared with a convolutional neural network trained with plain SGD and with a Naïve Bayes classifier; the results indicate that the convolutional neural network with SGDM outperforms the other techniques.
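To illustrate the optimiser comparison, the following PyTorch sketch fine-tunes a pretrained AlexNet with SGDM; the learning rate, momentum value, and class count are assumptions, since the abstract does not report the actual hyperparameters.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained AlexNet with its final layer replaced for the cry classes;
# the number of classes (5) is an assumption for illustration.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(4096, 5)

# SGDM: stochastic gradient descent plus a momentum term. Setting
# momentum=0.0 here would give the plain-SGD baseline of the comparison.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One SGDM update on a batch of spectrogram images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```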
Keywords: Infant cry classification, Spectrogram, STFT, SGDM, AlexNet, Deep Convolutional Neural Network.