Abstract
Crying is the universal language through which infants communicate with others. Infant cry classification is a form of speech recognition that must be approached carefully. The field has gained momentum in recent years, and researching it in depth would benefit caretakers and the community at large.
Objective: This study aims to develop a predictive model for infant cry classification by converting audio signals into spectrogram images and classifying them with a deep convolutional neural network. The network learns end to end, reducing the complexity involved in audio signal analysis, and its performance is improved with an optimisation technique.
Method: A time-frequency analysis, the Short-Time Fourier Transform (STFT), is applied to generate the spectrograms; 256 Discrete Fourier Transform (DFT) points are used to compute each transform. A deep convolutional neural network, AlexNet, with a few enhancements, is used in this work to classify the recorded infant cries. To improve the effectiveness of this network, it is trained with Stochastic Gradient Descent with Momentum (SGDM).
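As a rough sketch of the spectrogram-generation step, the Python snippet below computes an STFT with 256 DFT points, matching the configuration above; the input file name, window overlap, and image settings are illustrative assumptions, not details taken from this work.

```python
# A minimal sketch of spectrogram generation, assuming scipy and matplotlib.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft
import matplotlib.pyplot as plt

fs, audio = wavfile.read("infant_cry.wav")   # hypothetical input file
if audio.ndim > 1:                           # mix stereo down to mono
    audio = audio.mean(axis=1)

# 256 DFT points per frame, as in the paper; the 50% overlap is an assumption.
f, t, Z = stft(audio, fs=fs, nperseg=256, nfft=256, noverlap=128)

# Save a log-magnitude spectrogram image as input for the CNN.
plt.pcolormesh(t, f, 20 * np.log10(np.abs(Z) + 1e-10), shading="gouraud")
plt.axis("off")
plt.savefig("spectrogram.png", bbox_inches="tight", pad_inches=0)
```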
Results: The deep neural network-based infant cry classification system achieves a maximum accuracy of 95% in the classification of sleepy cries. The results show that the convolutional neural network with SGDM optimisation attains higher prediction accuracy.
Conclusion: The proposed approach has been compared with a convolutional neural network trained with plain SGD and with a Naïve Bayes classifier; the results indicate that the convolutional neural network with SGDM outperforms the other techniques.
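To illustrate the optimiser comparison, the following PyTorch sketch fine-tunes a pretrained AlexNet with SGDM; the learning rate, momentum value, and class count are assumptions, since the abstract does not report the actual hyperparameters.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained AlexNet with its final layer replaced for the cry classes;
# the number of classes (5) is an assumption for illustration.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(4096, 5)

# SGDM: stochastic gradient descent plus a momentum term. Setting
# momentum=0.0 here would give the plain-SGD baseline of the comparison.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One SGDM update on a batch of spectrogram images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```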
Keywords: Infant cry classification, Spectrogram, STFT, SGDM, AlexNet, Deep Convolutional Neural Network.