Abstract
Aims: To propose an imputation measure for filling in missing values so that incomplete medical datasets become complete datasets, and to use the resulting imputed datasets to improve classifier accuracy.
Objective: The present study proposes a measure of proximity between medical records and, based on it, an approach for imputing missing values in medical datasets so as to improve the accuracy of existing classifiers.
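The abstract does not define the proposed proximity measure, so the sketch below is only a generic stand-in to show how proximity-based imputation works in principle. It assumes a Euclidean distance computed over the attributes observed in both records (rescaled by the fraction of shared attributes) and fills each missing value with the mean of that attribute over the k nearest donor records. The names `proximity` and `impute` and the parameter `k` are hypothetical and are not the authors' method.

```python
import math
from typing import List, Optional

Record = List[Optional[float]]  # None marks a missing value


def proximity(a: Record, b: Record) -> float:
    """Hypothetical stand-in measure: Euclidean distance over attributes
    observed in both records, rescaled by the fraction of shared attributes."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attributes: treat as maximally distant
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))


def impute(records: List[Record], k: int = 3) -> List[List[float]]:
    """Fill each missing value with the mean of that attribute over the k
    nearest other records (by `proximity`) that observe the attribute."""
    completed = []
    for i, rec in enumerate(records):
        filled = []
        for j, value in enumerate(rec):
            if value is not None:
                filled.append(value)
                continue
            # Candidate donors: every other record with attribute j observed.
            donors = [
                (proximity(rec, other), other[j])
                for t, other in enumerate(records)
                if t != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if not nearest:
                raise ValueError(f"attribute {j} is missing in every record")
            filled.append(sum(v for _, v in nearest) / len(nearest))
        completed.append(filled)
    return completed
```

For example, with the illustrative records `[[1.0, 2.0, 3.0], [1.1, None, 3.1], [5.0, 6.0, 7.0], [0.9, 2.2, 2.9]]` and `k=2`, the missing value is filled from the first and last records (the two closest), giving about 2.1. Imputing from donors rather than from a global column mean is what lets the filled value reflect records that resemble the incomplete one.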
Methods: The proposed approach is compared with existing approaches in terms of classifier accuracy and by means of a non-parametric test, the Wilcoxon test.
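As an illustration of the kind of statistical comparison mentioned above, the sketch below computes a Wilcoxon statistic for paired accuracy values (one pair per dataset or classifier setting). It assumes the paired signed-rank form of the test, which is the usual choice when two methods are evaluated on the same datasets; the abstract does not say which variant was used. Zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value uses the normal approximation without tie correction, so it is only indicative for small samples. The function name `wilcoxon_signed_rank` is hypothetical.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(a: List[float], b: List[float]) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, two-sided p) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("samples must be paired")
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    # Rank |differences| in ascending order, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation of the null distribution (no tie correction).
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p
```

With made-up accuracies `a = [80.0, 75.0, 90.0, 70.0]` and `b = [78.0, 76.0, 85.0, 66.0]`, the differences are 2, -1, 5 and 4, giving W+ = 9 and W- = 1. For real comparisons an exact or tie-corrected p-value from a statistics library would be preferable.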
Results & Conclusion: Experiments are conducted on three benchmark datasets (CLEVELAND, PIMA and ECOLI): the proposed imputation technique is applied, and the accuracies of the KNN, J48 and SMO classifiers are determined on the imputed data. These results are then compared with those of thirteen existing benchmark imputation techniques available in the KEEL repository. The experimental results demonstrate the benefit of the proposed imputation technique.
Keywords: Imputation, missing value, classification, accuracy, cross-fold validation, RBFN.
Graphical Abstract