Abstract
Background: Deep neural network based methods have obtained great progress in a variety of computer vision tasks, as described in various patents. But, so far, it is still a challenge task to model temporal dependencies in the tasks of recognizing object movement from videos.
Method: In this paper, we propose a multi-timescale gated neural network for encoding the temporal dependencies from videos. The developed model stacks multiple gated layers in a recurrent pyramid, which makes it possible to hierarchically model not just pairs but long-term dependencies from video frames. Additionally, the model combines the Convolutional Neural Networks into its structure that exploits the pictorial nature of the frames and reduces the number of model parameters. Result: We evaluated the proposed model on the datasets of synthetic bouncing-MNIST, standard actions benchmark of UCF101 and facial expressions benchmark of CK+. The experiment results reveal that on all tasks, the proposed model outperforms the existing approach to build deep stacked gated model and achieves superior performance compared to several recent state-of-the-art techniques. Conclusion: From the experimental results, we can make the conclusion that our proposed model is able to adapt its structure based on different time scales and can be applied in motion estimation, action recognition and tracking, etc.Keywords: Deep learning, neural networks, multiplicative interactions, optimization, model temporal dependencies.
8
1
1