Abstract
Objective: Dementia is a progressive neurodegenerative brain disease that results in the death of nerve cells and is emerging as a global health problem in adults aged 65 years or above. Eliminating redundant and irrelevant features from datasets is necessary for accurate detection and, in turn, timely treatment of dementia.
Methods: To this end, this study proposes an ensemble of univariate and multivariate feature selection methods. Four univariate feature selection techniques (t-Test, Wilcoxon, Entropy, and ROC) and six multivariate feature selection approaches (ReliefF, Bhattacharyya, CFSSubsetEval, ClassifierAttributeEval, CorrelationAttributeEval, and OneRAttributeEval) are compared. The best univariate and the best multivariate filter algorithms are then combined into an ensemble that yields a feature subset containing only relevant and non-redundant features. Classification is performed using the Naïve Bayes, k-NN, and Random Forest algorithms.
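The ensemble idea described in the Methods can be sketched in Python as follows. This is only an illustration under stated assumptions, not the study's implementation: the abstract does not say how the two rankings are combined, so averaging the ranks is an assumption; `relief_scores` is a simplified Relief (one nearest hit and one nearest miss per sample) standing in for full ReliefF; and the function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata, ttest_ind


def ttest_scores(X, y):
    """Univariate score: absolute Welch t-statistic per feature (labels 0/1)."""
    t, _ = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    return np.abs(t)


def relief_scores(X, y):
    """Multivariate score: simplified Relief weights (not full ReliefF)."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xs = (X - X.min(axis=0)) / span            # min-max scale each feature
    w = np.zeros(X.shape[1])
    for i in range(len(Xs)):
        d = np.abs(Xs - Xs[i]).sum(axis=1)     # L1 distance to every sample
        d[i] = np.inf                          # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, d, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, d, np.inf))  # nearest other-class sample
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(Xs)


def ensemble_select(X, y, k=10):
    """Assumed ensemble rule: average the two rankings and keep the top k."""
    r_uni = rankdata(-ttest_scores(X, y))      # rank 1 = highest score
    r_multi = rankdata(-relief_scores(X, y))
    return np.argsort((r_uni + r_multi) / 2, kind="stable")[:k]


# Synthetic data (hypothetical): 15 features, only the first 5 carry class signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 15))
X[:, :5] += 3.0 * y[:, None]

selected = ensemble_select(X, y, k=5)
print(sorted(selected.tolist()))
```

The selected column indices can then be passed to any classifier, for example by training k-NN on `X[:, selected]` instead of the full feature matrix.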
Results: Experimental results show that the ensemble of t-Test and ReliefF feature selection selects 10 relevant features that give the same accuracy as the full feature set. In addition, k-NN combined with the ensemble approach achieves an accuracy of 99.96%. The statistical significance of the method has been established using Friedman's test.
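A Friedman test of the kind mentioned in the Results can be run with SciPy's `friedmanchisquare`. The sketch below is a hedged illustration only: the accuracy values are invented placeholders, not results from the study, and the layout (one list per classifier, one entry per feature selection setting) is an assumption about how the comparison might be organized.

```python
from scipy.stats import friedmanchisquare

# Placeholder accuracies (%), invented for illustration; NOT values from the study.
# One list per classifier, one entry per (hypothetical) feature selection setting.
naive_bayes = [91.2, 93.5, 92.8, 94.1, 93.0, 95.4]
knn = [94.0, 96.3, 95.1, 97.2, 96.0, 98.5]
random_forest = [95.1, 96.0, 95.8, 96.9, 96.2, 97.0]

# The test ranks the classifiers within each setting and checks whether
# the mean ranks differ more than chance would explain.
stat, p_value = friedmanchisquare(naive_bayes, knn, random_forest)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")
```

A small p-value would indicate that at least one classifier is ranked consistently differently from the others across the settings; it does not by itself say which one, so a post hoc test is usually needed.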
Conclusion: The new ranking criteria computed by the ensemble method efficiently eliminate insignificant features and reduce the computational cost of the algorithm. The ensemble method has been compared with other approaches to establish the superiority of the proposed model.
Discussion: Across the three classifiers (Naïve Bayes, k-NN, and Random Forest), the percentage gain in accuracy after feature selection is most notable for Naïve Bayes and k-NN. Among the univariate filter methods, the t-test outperforms the others while selecting a subset of only 10 features.
Keywords: Dementia, machine learning, feature selection, univariate filters, multivariate filters, classification accuracy.