
International Journal of Sensors, Wireless Communications and Control


ISSN (Print): 2210-3279
ISSN (Online): 2210-3287

Research Article

Malicious apps Identification in Android Devices Using Machine Learning Algorithms

Author(s): Ravinder Ahuja*, Vineet Maheshwari, Siddhant Manglik, Abiha Kazmi, Rishika Arora and Anuradha Gupta

Volume 10, Issue 4, 2020

Pages: 559-569 (11 pages)

DOI: 10.2174/2210327909666191204125100


Abstract

Background & Objective: In this paper, a malicious app detection system is implemented using machine learning algorithms. For this, 330 permission-based features of 558 Android applications are taken into consideration.
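The abstract describes each app by 330 permission-based features but does not state how they are encoded. A minimal sketch of one common encoding follows, assuming each feature is a 0/1 flag for whether the app requests a given Android permission. The five-permission vocabulary and the helper names `encode_permissions` and `build_matrix` are hypothetical illustrations, not the paper's feature list or code.

```python
# Minimal sketch: encode an app's requested Android permissions as a binary
# feature vector. ASSUMPTION: each permission-based feature is a 0/1 indicator
# of whether the app requests that permission. The vocabulary below is a
# hypothetical 5-permission subset, not the paper's actual 330 features.
from typing import Iterable, List, Sequence

PERMISSION_VOCABULARY: List[str] = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.RECEIVE_BOOT_COMPLETED",
    "android.permission.ACCESS_FINE_LOCATION",
]


def encode_permissions(requested: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector: position i is 1 if vocabulary[i] was requested."""
    requested_set = set(requested)
    return [1 if perm in requested_set else 0 for perm in vocabulary]


def build_matrix(apps: Iterable[Iterable[str]], vocabulary: Sequence[str]) -> List[List[int]]:
    """Stack one binary row per app (rows = apps, columns = permissions)."""
    return [encode_permissions(app, vocabulary) for app in apps]


if __name__ == "__main__":
    # Two hypothetical apps, e.g. with permissions read from their AndroidManifest.xml.
    apps = [
        ["android.permission.INTERNET", "android.permission.SEND_SMS"],
        ["android.permission.INTERNET"],
    ]
    for row in build_matrix(apps, PERMISSION_VOCABULARY):
        print(row)
```

Permissions outside the vocabulary are ignored, so every vector has the same length as the vocabulary (330 in the paper's setting), which is what a tabular classifier expects.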

Methods: The main aim of this work is to develop a model that can effectively distinguish malicious apps from benign ones. Six feature selection techniques are used to extract the most important features from the 330 permission-based features of the 558 apps, and fourteen classification algorithms are then applied using the Python language.
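The abstract does not name the six feature selection techniques or the fourteen classifiers. As a hedged illustration of the select-then-classify pipeline it describes, the sketch below scores each binary permission feature with the chi-square statistic of its 2×2 contingency table against the malicious/benign label, keeps the top k features, and trains a Bernoulli naive Bayes classifier on them. Both techniques are stand-ins chosen for illustration; the function and class names, the toy data, k, and the label convention (1 = malicious) are hypothetical.

```python
# Minimal sketch of a select-then-classify pipeline on binary permission features.
# ASSUMPTIONS: chi-square selection and Bernoulli naive Bayes are illustrative
# stand-ins; the abstract does not name the paper's six selectors or fourteen
# classifiers. Label convention (hypothetical): 1 = malicious, 0 = benign.
import math
from typing import Dict, List, Sequence


def chi2_binary(column: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-square statistic of the 2x2 table (permission present/absent vs. label)."""
    pairs = list(zip(column, labels))
    a = pairs.count((1, 1))  # requested, malicious
    b = pairs.count((1, 0))  # requested, benign
    c = pairs.count((0, 1))  # not requested, malicious
    d = pairs.count((0, 0))  # not requested, benign
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant feature (or label) carries no information
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(X: List[List[int]], y: Sequence[int], k: int) -> List[int]:
    """Return the column indices of the k highest-scoring features, in ascending order."""
    n_features = len(X[0])
    scores = [chi2_binary([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X: List[List[int]], columns: Sequence[int]) -> List[List[int]]:
    """Keep only the selected feature columns."""
    return [[row[j] for j in columns] for row in X]


class SimpleBernoulliNB:
    """Bernoulli naive Bayes with Laplace (add-alpha) smoothing for 0/1 features."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.log_prior: Dict[int, float] = {}
        self.p_one: Dict[int, List[float]] = {}  # P(feature = 1 | class)

    def fit(self, X: List[List[int]], y: Sequence[int]) -> "SimpleBernoulliNB":
        for cls in sorted(set(y)):
            rows = [row for row, label in zip(X, y) if label == cls]
            self.log_prior[cls] = math.log(len(rows) / len(y))
            self.p_one[cls] = [
                (sum(r[j] for r in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(len(X[0]))
            ]
        return self

    def _log_posterior(self, row: Sequence[int], cls: int) -> float:
        # Unnormalised log P(class) + sum_j log P(x_j | class).
        return self.log_prior[cls] + sum(
            math.log(p if x == 1 else 1.0 - p) for x, p in zip(row, self.p_one[cls])
        )

    def predict(self, X: List[List[int]]) -> List[int]:
        # Pick the class with the highest unnormalised log posterior for each row.
        return [max(self.p_one, key=lambda cls: self._log_posterior(row, cls)) for row in X]


if __name__ == "__main__":
    # Hypothetical toy data: 6 apps x 4 binary permission features.
    X = [[1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 0, 1],
         [0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1]]
    y = [1, 1, 1, 0, 0, 0]
    keep = select_top_k(X, y, k=2)
    model = SimpleBernoulliNB().fit(project(X, keep), y)
    print(keep, model.predict(project(X, keep)))
```

In a Python study of this kind, scikit-learn's `SelectKBest` with `chi2` and its `BernoulliNB` would be the idiomatic equivalents. Note that scikit-learn's `chi2` computes its score from per-class feature totals, so its values need not match this 2×2 version exactly, although both rank features by dependence on the label.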

Results: An efficient model for detecting malicious apps is proposed.

Conclusion: The proposed model detects malicious apps approximately 3% better than the existing system.

Keywords: Classification techniques, feature selection, malware identification, ensemble algorithms, static analysis, Python language.



© 2024 Bentham Science Publishers