Abstract
Background: DNA microarray technology allows researchers to measure the expression levels of thousands of genes simultaneously. The main objective of microarray gene expression (GE) data analysis is to detect biomarker genes that are Differentially Expressed (DE) between two or more experimental groups/conditions.
Objective: Several popular statistical methods exist for selecting biomarker genes, but most of them often produce misleading results in the presence of outliers. In this study, we therefore introduce a robust approach that overcomes this limitation of the classical methods.
Methods: Our robust procedure is based on the median and the median absolute deviation (MAD). A gene is considered an outlying gene if at least one of its expression values falls outside the interval defined by the proposed outlier detection rule; otherwise, it is considered a non-outlying gene.
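A median/MAD rule of this kind can be illustrated with a minimal sketch. The abstract does not state the exact interval, so the form median ± k·MAD and the cutoff k = 3 used here are assumptions, as are the function names and the toy expression values; this is not necessarily the rule adopted in the study.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = median(values)
    return median([abs(x - center) for x in values])


def is_outlying_gene(expressions, k=3.0):
    """Flag a gene if any expression value lies outside median +/- k * MAD.

    k = 3.0 is an assumed cutoff chosen for illustration only.
    If the MAD is zero, any value that differs from the median is flagged.
    """
    center = median(expressions)
    spread = mad(expressions)
    lower, upper = center - k * spread, center + k * spread
    return any(x < lower or x > upper for x in expressions)


# Hypothetical expression values for two genes across six samples.
genes = {
    "gene_a": [5.1, 4.9, 5.0, 5.2, 4.8, 5.0],   # no extreme value
    "gene_b": [5.1, 4.9, 5.0, 5.2, 4.8, 12.0],  # one extreme value
}

flags = {name: is_outlying_gene(values) for name, values in genes.items()}
print(flags)
```

In this toy example only gene_b is flagged, because its single extreme value (12.0) falls far outside the interval built from the median and the MAD, while neither statistic is itself pulled noticeably toward that value.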
Results: We compare the performance of the proposed method with that of the traditional methods using both simulated and real gene expression data. In a real colon cancer gene expression dataset, the proposed method detected fourteen (14) additional DE genes that were not detected by the traditional methods. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis showed that these 14 additional DE genes are involved in three important metabolic pathways of cancer. In a head-and-neck cancer gene expression dataset, the proposed method detected nine (9) additional DE genes, which are involved in the top ten metabolic pathways obtained from the KEGG pathway database.
Conclusion: Results on both simulated and real cancer gene expression datasets show that the proposed procedure performs better than the traditional methods. The additional genes detected by the proposed procedure therefore warrant further wet-lab validation.
Keywords: Gene expression data, outlier detection and modification, DE gene, MAD and robustness, KEGG.
Graphical Abstract