Recent Patents on Mechanical Engineering

ISSN (Print): 2212-7976
ISSN (Online): 1874-477X

Review Article

Research Progress on the Human Behavior Recognition Based on Machine Learning Methods

Author(s): Wenjin Yu, Peijian Zhou*, Lingfeng Shu, Shang Wei, Chenglong Jiang and Haisheng Zheng

Volume 16, Issue 1, 2023

Published on: 10 October, 2022

Page: [32 - 44] Pages: 13

DOI: 10.2174/2212797615666220827164210

Abstract

Background: Machine vision has long been used in industrial automation systems, and it also plays a significant role in human behavior recognition. Behavior recognition based on machine vision, with applications such as object tracking, motion detection and crime recognition, greatly broadens the application field of artificial intelligence and has good application prospects.

Objective: We summarize the latest applications of various machine learning algorithms in human behavior recognition and analyze the accuracy of these algorithms on different data sets, so as to provide a reference for researchers in related fields.

Methods: By sorting out typical research results, we briefly describe the application of machine learning to behavior recognition in recent years. This review focuses on the Two-Stream Network structure, the Temporal Segment Network (TSN) structure, the LSTM network and the C3D network.

Results: This paper analyzes the principles, advantages and disadvantages of various human behavior recognition methods and briefly discusses future development directions.

Conclusion: Its wide application prospects make behavior recognition and detection a hot research direction in computer vision, and combining it with deep learning has greatly improved the accuracy of complex human motion recognition. However, the field still faces many difficulties, such as insufficient discrimination of violence attributes, the difficulty of collecting and verifying data on special actions, and insufficient hardware computing resources.

Keywords: Deep learning, Human behavior recognition, Convolutional Neural Network, Human behavior data set

