Abstract
Background: Identification of human actions from video has gathered much attention in past few years. Most of the computer vision tasks such as Health Care Activity Detection, Suspicious Activity detection, Human Computer Interactions etc. are based on the principle of activity detection. Automatic labelling of activity from videos frames is known as activity detection.
Objective: Motivation of this work is to use most out of the data generated from sensors and use them for recognition of classes. Recognition of actions from videos sequences is a growing field with the upcoming trends of deep neural networks.
Methods: Automatic learning capability of Convolutional Neural Network (CNN) make them good choice as compared to traditional handcrafted based approaches. With the increasing demand of RGB-D sensors combination of RGB and depth data is in great demand.
Results: This work comprises of the use of dynamic images generated from RGB combined with depth map for action recognition purpose. We have experimented our approach on pre trained VGG-F model using MSR Daily activity dataset and UTD MHAD Dataset. We achieve state of the art results. To support our research, we have calculated different parameters apart from accuracy such as precision, F score, recall.
Conclusion: Accordingly, the investigation confirms improvement in term of accuracy, precision, F-Score and Recall. The proposed model is 4 Stream model is prone to occlusion, used in real time and also the data from the RGB-D sensor is fully utilized.
Keywords: Convolution, deep learning, RGB-D, CNN, VGG, stream model.
Graphical Abstract