Abstract
Background: Gene expression matrices produced by DNA microarray technology inevitably contain missing entries due to experimental problems. Predicting the missing values in a gene expression matrix is essential because algorithms that analyze gene expression typically require a matrix without missing values.
Objective: This paper presents a novel bicluster-based sequential interpolation imputation method for predicting missing values in gene expression data.
Method: For each missing entry, the method first generates a bicluster by selecting a number of genes and samples correlated with that missing position, and then applies an interpolation-based approximation technique to that bicluster. Imputation starts from the gene with the fewest missing values and proceeds sequentially, reusing the values already imputed.
Results: The proposed method is compared with seven well-known existing estimation techniques over nine different data sets. The metrics used to compare performance are the normalized root mean square error (NRMSE) and the average distance between partition errors (ADBPE).
Conclusion: The proposed method is observed to perform better than the well-known methods across a variety of data sets. The novelty of this approach lies in applying an interpolation technique to the identified local structure (bicluster) to predict missing values.
Keywords: Biclustering, DNA microarray, gene expression data, missing value estimation.
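The procedure outlined in the Method paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`impute_sequential`, `_impute_entry`, `nrmse`) and the parameters `k_genes` and `k_samples` are hypothetical, gene selection by absolute Pearson correlation and sample selection by Euclidean distance are assumptions, and the interpolation step is a simple piecewise-linear stand-in (`np.interp`) for the interpolation-based approximation the abstract does not specify. The NRMSE shown is one common definition (RMSE of the imputed entries divided by the standard deviation of the true entries) and may differ from the one used in the paper.

```python
import numpy as np


def nrmse(true_vals, imputed_vals):
    """RMSE of the imputed entries divided by the std of the true entries (assumed definition)."""
    t = np.asarray(true_vals, dtype=float)
    p = np.asarray(imputed_vals, dtype=float)
    return np.sqrt(np.mean((p - t) ** 2)) / np.std(t)


def _impute_entry(X, g, s, k_genes, k_samples):
    """Estimate X[g, s] from a small bicluster of correlated genes and similar samples."""
    row_ok = ~np.isnan(X[g])
    fallback = np.nanmean(X[g]) if row_ok.any() else np.nanmean(X)

    # Rank candidate genes (observed at sample s) by |Pearson r| with gene g.
    scored = []
    for h in range(X.shape[0]):
        if h == g or np.isnan(X[h, s]):
            continue
        common = row_ok & ~np.isnan(X[h])
        if common.sum() < 3 or np.std(X[g, common]) == 0 or np.std(X[h, common]) == 0:
            continue
        r = np.corrcoef(X[g, common], X[h, common])[0, 1]
        scored.append((abs(r), h))
    if not scored:
        return fallback
    scored.sort(reverse=True)
    weights, genes = zip(*scored[:k_genes])
    genes = list(genes)

    # Samples usable for the bicluster: observed for gene g and every selected gene.
    usable = row_ok & ~np.isnan(X[genes]).any(axis=0)
    cols = np.flatnonzero(usable)
    if cols.size < 2:
        return fallback
    # Keep the samples whose selected-gene profile is closest to sample s.
    dist = np.linalg.norm(X[np.ix_(genes, cols)] - X[genes, s][:, None], axis=0)
    cols = cols[np.argsort(dist, kind="stable")[:k_samples]]

    # Interpolate gene g as a piecewise-linear function of each selected gene
    # (np.interp clamps to the end values outside the observed range).
    estimates = []
    for h in genes:
        order = np.argsort(X[h, cols])
        estimates.append(np.interp(X[h, s], X[h, cols][order], X[g, cols][order]))
    if sum(weights) == 0:
        return float(np.mean(estimates))
    return float(np.average(estimates, weights=weights))


def impute_sequential(X, k_genes=5, k_samples=4):
    """Fill NaNs gene by gene, fewest missing values first, reusing imputed values."""
    X = np.array(X, dtype=float)  # work on a copy; rows = genes, columns = samples
    missing = np.isnan(X)
    for g in np.argsort(missing.sum(axis=1), kind="stable"):
        for s in np.flatnonzero(missing[g]):
            # X is updated in place, so later estimates see earlier imputations.
            X[g, s] = _impute_entry(X, g, s, k_genes, k_samples)
    return X
```

Calling `impute_sequential(matrix)` on a genes-by-samples array with `NaN` marking missing entries returns a filled copy; when the true values of artificially removed entries are known, `nrmse(true_values, imputed_values)` scores the result.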