Recent Advances in Computer Science and Communications

ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

A Framework for Multimedia Event Classification With Convoluted Texture Feature

Author(s): Kaavya Kanagaraj and Lakshmi Priya G.G.*

Volume 13, Issue 4, 2020

Pages: 706-717 (12 pages)

DOI: 10.2174/2213275912666190618125748

Abstract

Background: The proposed work builds on two approaches: (i) the Local Binary Pattern (LBP) approach and (ii) the Kirsch compass mask.

Texture classification plays a vital role in discriminating objects within an image, and LBP is widely used for classifying texture. Although filtering-based methods, co-occurrence matrix methods, and others have been applied, LBP is generally preferred for its computational efficiency and invariance to monotonic grey-level changes. Second, because edges are important for visually discriminating objects, the Kirsch compass mask is applied to obtain the maximum edge strength in eight compass directions; unlike other compass masks, it also allows the mask to be adapted to the user's own requirements.
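
To make the directional filtering concrete, the following Python sketch (an illustration only, not the authors' implementation; helper names are hypothetical) builds the eight 3x3 Kirsch compass masks by rotating the outer ring of the north mask in 45-degree steps and keeps the maximum absolute response per pixel.

```python
# Illustrative Kirsch compass filtering: convolve a greyscale image with
# the eight directional masks and keep the strongest response per pixel.
import numpy as np
from scipy.signal import convolve2d

# North-facing Kirsch mask; the other seven directions are obtained by
# rotating its outer ring of coefficients in 45-degree steps.
KIRSCH_N = np.array([[ 5,  5,  5],
                     [-3,  0, -3],
                     [-3, -3, -3]])

def kirsch_masks():
    """Return the eight Kirsch compass masks."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [KIRSCH_N[r, c] for r, c in ring]
    masks = []
    for shift in range(8):
        m = np.zeros((3, 3), dtype=int)           # centre coefficient stays 0
        for (r, c), v in zip(ring, np.roll(vals, shift)):
            m[r, c] = v
        masks.append(m)
    return masks

def kirsch_edge_strength(gray):
    """Maximum absolute response over the eight compass directions."""
    gray = gray.astype(np.float64)
    responses = [np.abs(convolve2d(gray, m, mode="same", boundary="symm"))
                 for m in kirsch_masks()]
    return np.max(responses, axis=0)
```

Because the mask set is closed under 180-degree rotation, taking the maximum absolute response gives the same result whether convolution or correlation is used.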

Objective: The objective of this work is to extract better features and to model a classifier for the Multimedia Event Detection task.

Methods: The proposed work consists of two steps for feature extraction. First, an LBP-based approach is used for object discrimination; the image is then convolved with the Kirsch compass mask to determine the object's edge magnitude. Eigenvalue decomposition is adopted for feature representation. Finally, a classifier with a chi-square kernel is modelled for the event classification task.
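
The paper does not name a specific SVM implementation; as one possible illustration of the final classification stage, the sketch below assumes scikit-learn and trains an SVM on a precomputed chi-square kernel over non-negative feature vectors (e.g. LBP or edge-magnitude histograms). Function names are illustrative.

```python
# Illustrative event classifier with a chi-square kernel (scikit-learn assumed).
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def train_chi2_svm(X_train, y_train, gamma=1.0):
    """Fit an SVM on a precomputed chi-square kernel matrix.

    X_train: non-negative feature vectors, one row per keyframe or video.
    y_train: event labels.
    """
    K_train = chi2_kernel(X_train, gamma=gamma)          # n_train x n_train
    clf = SVC(kernel="precomputed")
    clf.fit(K_train, y_train)
    return clf

def predict_chi2_svm(clf, X_train, X_test, gamma=1.0):
    """Score test samples against the stored training vectors."""
    K_test = chi2_kernel(X_test, X_train, gamma=gamma)   # n_test x n_train
    return clf.predict(K_test)
```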

Results: The proposed event detection work is evaluated on the Columbia Consumer Video (CCV) dataset, which contains videos of 20 event categories. The work is compared with other existing methods using mean Average Precision (mAP). Several experiments were carried out: the LBP vs. non-LBP approach, the Kirsch vs. Robinson compass mask, an angle-wise analysis of the Kirsch masks, and a comparison of these approaches within the modelled classifier. Two settings are used to compare the proposed work with existing methods: (i) non-clustered events, where events are considered individually and a one-versus-one strategy is followed, and (ii) clustered events, where some events are clustered and a one-versus-all strategy is followed while the remaining events are kept non-clustered.
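
For reference, mAP over the event categories can be computed as in the generic sketch below, which assumes scikit-learn's average_precision_score rather than the paper's exact evaluation code.

```python
# Illustrative mAP computation over event categories (scikit-learn assumed).
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """y_true: (n_videos, n_events) binary relevance labels;
    y_score: (n_videos, n_events) classifier confidence scores."""
    aps = [average_precision_score(y_true[:, e], y_score[:, e])
           for e in range(y_true.shape[1])]
    return float(np.mean(aps))
```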

Conclusion: The proposed work describes a method for event detection. Feature extraction is performed using an LBP-based approach together with convolution by the Kirsch compass mask. For event detection, a classifier model is built using the chi-square kernel. The accuracy of event classification is further improved by the clustered-events approach. The proposed work is compared with various state-of-the-art methods and is shown to achieve outstanding performance.

Keywords: Multimedia event detection, convolution, Kirsch compass mask, local binary pattern, keyframe extraction, feature extraction, Columbia Consumer Video.
