Missing Value Imputation and Estimation Methods for Arrhythmia Feature Selection Classification Using Machine Learning Algorithms

Abstract

Electrocardiogram signal analysis is very difficult to classify cardiac arrhythmia using machine learning methods. The ECG datasets normally come with multiple missing values. The reason for the missing values is the faults or distortion. When performing data mining, missing value imputation is the biggest task for data preprocessing. This problem could arise due to incomplete medical datasets if the incomplete missing values and cases were removed from the original database. To produce a good quality dataset for better analyzing the clinical trials, the suitable missing value imputation method is used. In this paper, we explore the different machine-learning techniques for the computed missing value in the electrocardiogram dataset. To estimate the missing imputation values, the collected data contains feature dimensions with their attributes. The experiments to compute the missing values in the dataset are carried out by using the four feature selection methods and imputation methods. The implemented results are shown by combined features using IG (information gain), GA (genetic algorithm) and the different machine learning classifiers such as NB (naïve bayes), KNN (K-nearest neighbor), MLP (Multilayer perception), and RF (Random forest). The GA (genetic algorithm) and IG (information gain) are the best suitable methods for obtaining the results on lower dimensional datasets with RMSE (Root mean square error. It efficiently calculates the best results for missing values. These four classifiers are used to analyze the impact of imputation methods. The best results for missing rate 10% to 40% are obtained by NB that is 0.657, 0.6541, 0.66, 0.657, and 0.657, as computed by RMSE (Root mean Square error). It means that error will efficiently reduced by naïve bayes classifier.

Cite as