
Recent Patents on Engineering


ISSN (Print): 1872-2121
ISSN (Online): 2212-4047

Research Article

Innovative Video Classification Method based on Deep Learning Approach

Author(s): V. Hemamalini*, D. Jayasutha, V. R. Vinothini, R. Manjula Devi, A. Kumar and E. Anitha

Volume 19, Issue 2, 2025

Published on: 27 October, 2023

Article ID: e271023222880

Pages: 13

DOI: 10.2174/0118722121248139231023111754


Abstract

Background: This work addresses the automated classification of videos with artificial neural networks. To explore the concepts and measure the results, the UCF101 dataset is used, which consists of YouTube video clips labeled with human actions. The study is carried out with the authors' own resources in order to assess the feasibility of independent research in the area. A sketch of the kind of preprocessing this implies follows.
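As an illustration only: the following is a minimal sketch of sampling a fixed number of frames from a single clip with OpenCV. The frame count, resolution, and the use of OpenCV itself are assumptions for illustration; the abstract does not specify the authors' decoding pipeline.

```python
# Hypothetical preprocessing sketch: sample evenly spaced frames from
# one UCF101 clip. Frame count and target size are assumed values.
import cv2
import numpy as np

def sample_frames(video_path, num_frames=16, size=(112, 112)):
    """Return `num_frames` evenly spaced RGB frames, scaled to [0, 1]."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames).astype("float32") / 255.0  # (T, H, W, 3)
```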

Methods: This work was developed in the Python programming language using the Keras library with TensorFlow as the back-end. The objective is to develop a network whose performance in classifying videos according to the actions performed is compatible with the state of the art.
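To make the setup concrete, here is a minimal sketch of a Keras/TensorFlow video classifier of the general kind described: a per-frame 2D convolutional encoder wrapped in TimeDistributed, followed by an LSTM over the frame sequence. All layer sizes, the frame count, and the architecture itself are illustrative assumptions, not the authors' reported network.

```python
# Illustrative Keras video classifier: per-frame CNN + temporal LSTM.
# Shapes and layer widths are assumed, not taken from the paper.
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 112, 112, 3
NUM_CLASSES = 101  # UCF101 action classes

def build_model():
    inputs = layers.Input(shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS))
    # Per-frame convolutional encoder, applied identically to each frame.
    x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.MaxPooling2D())(x)
    x = layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu"))(x)
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    # Temporal aggregation over the frame sequence.
    x = layers.LSTM(128)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```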

Results: Given the hardware limitations, there is a considerable gap between what could be implemented in this work and what is known as the state of the art.

Conclusion: Throughout the work, some aspects in which this limitation influenced the development are presented, but it is shown that such a study is feasible and that expressive results can be obtained: 98.6% accuracy is achieved on the UCF101 dataset, compared with the 98 percentage points of the best previously reported result, while using considerably fewer resources. In addition, the importance of transfer learning in achieving expressive results, as well as the differing performance of each architecture, is reviewed. Thus, this work may open doors to patent-based outcomes.
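Since the conclusion highlights transfer learning, the sketch below shows one common way to apply it to this task: an ImageNet-pretrained backbone, frozen as a per-frame feature extractor, with only a small classification head trained on UCF101. The choice of MobileNetV2 and all hyperparameters are assumptions; the abstract does not name the backbone the authors used.

```python
# Illustrative transfer-learning sketch: a frozen ImageNet backbone
# (assumed to be MobileNetV2 here) extracts one feature vector per
# frame; only the classification head is trained.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

NUM_FRAMES, HEIGHT, WIDTH = 16, 224, 224
NUM_CLASSES = 101

backbone = MobileNetV2(include_top=False, pooling="avg",
                       input_shape=(HEIGHT, WIDTH, 3),
                       weights="imagenet")
backbone.trainable = False  # reuse pretrained features as-is

inputs = layers.Input(shape=(NUM_FRAMES, HEIGHT, WIDTH, 3))
x = layers.TimeDistributed(backbone)(inputs)  # (batch, frames, 1280)
x = layers.GlobalAveragePooling1D()(x)        # average over frames
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone means only the final dense layer's weights are updated, which is what makes this approach tractable on modest hardware of the kind the abstract describes.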


Rights & Permissions Print Cite
© 2025 Bentham Science Publishers | Privacy Policy