Abstract
Background: Communication worldwide is shifting towards Email because it is fast and secure. Millions of users, in organizations, governments, and elsewhere, rely on Email services. This growing user base is increasingly exposed to phishing, and detecting phishing Email is a challenging task, especially for non-IT users. Automatic detection of phishing Email should therefore be deployed along with Email software. Various authors have worked on phishing Email classification, using different feature selection and optimization techniques to improve performance.
Objectives: This paper builds a model for detecting phishing Email using data mining techniques. Its main contribution is the development and application of a Feature Selection Technique (FST) that reduces the number of features in the phishing Email benchmark data set.
Methods: The proposed Pruning Based Feature Selection Technique (PBFST) ranks each feature according to the level of the decision tree at which the feature appears. The algorithm is integrated with the previously developed Bucket Based Feature Selection Technique (BBFST), which is used internally to rank the features that fall within the same level of the tree.
Results: Experiments were carried out in the open-source WEKA data mining software using 10-fold cross-validation. The proposed FST was compared with other ranking-based FSTs by measuring the performance of the C4.5 classifier on the phishing Email data set.
Conclusion: The proposed FST removes 33 of the 47 features in the phishing Email data set, and the C4.5 algorithm achieves an accuracy of 99.06% with only 11 features, outperforming the other FSTs compared.
Keywords: Phishing e-mail detection, Pruning Based Feature Selection Technique (PBFST), classification, Decision Tree (DT), gain ratio, data mining.
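The level-based ranking idea described under Methods can be illustrated with a minimal sketch. It is not the authors' PBFST or BBFST implementation: the tree structure, the feature names, and the within-level scores (standing in for whatever ordering BBFST would produce) are all hypothetical assumptions made for illustration. A feature's rank is determined first by the shallowest tree level at which it appears, and ties within a level are broken by the supplied score.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A decision-tree node; feature is None for a leaf (hypothetical structure)."""
    feature: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def feature_levels(root: Node) -> Dict[str, int]:
    """Map each feature to the shallowest tree level at which it is tested."""
    levels: Dict[str, int] = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if node.feature is None:
            continue  # leaves test no feature
        # Breadth-first order guarantees the first visit is the shallowest.
        levels.setdefault(node.feature, depth)
        for child in node.children:
            queue.append((child, depth + 1))
    return levels


def rank_features(root: Node, within_level_score: Dict[str, float]) -> List[str]:
    """Rank by tree level first, then by descending within-level score.

    The score dictionary is an assumed stand-in for the within-level
    ordering that BBFST would provide.
    """
    levels = feature_levels(root)
    return sorted(
        levels,
        key=lambda f: (levels[f], -within_level_score.get(f, 0.0), f),
    )


def select_top(root: Node, within_level_score: Dict[str, float], k: int) -> List[str]:
    """Keep only the k highest-ranked features."""
    return rank_features(root, within_level_score)[:k]


# Hypothetical tree and scores, for illustration only.
example_tree = Node("has_link", [
    Node("num_dots", [Node("html_form", [Node(), Node()]), Node()]),
    Node("sender_mismatch", [Node(), Node("num_dots", [Node(), Node()])]),
])
example_scores = {"num_dots": 0.30, "sender_mismatch": 0.45, "html_form": 0.10}

ranking = rank_features(example_tree, example_scores)
top_two = select_top(example_tree, example_scores, 2)
```

Features that never appear in the tree receive no rank at all in this sketch, which is one simple way a tree-based ranking can discard features.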