Abstract
This article reviews and compares existing statistical tests used in the significance analysis of gene expression from DNA microarray data. Microarray gene expression studies facilitate the investigation of genes that are differentially expressed across different classes of samples. Many gene expression studies also aim to find groups of genes that act together, or to identify molecular similarities among a group of samples. In all these cases, it is important to know whether there is a difference between two sets of results, and whether that difference could plausibly arise from random variation in the data. Hence, rather than simple data analysis, statistical tests are the appropriate way to assess statistical relevance. A statistical significance test is essential for providing the investigator with a list of relevant genes ordered by differential expression. This article surveys several important statistical tests, both parametric and nonparametric, and compares them with respect to different performance metrics. The statistical tests have been applied to several artificial and real-life microarray gene expression datasets. Real-life datasets usually contain noise, and noisy data can have an undesirable effect on the results of any data mining analysis. Therefore, the performance of the different statistical tests on noisy datasets has also been analyzed to support the comparative study. Furthermore, a correlation analysis of the genes selected by the different statistical methods has been carried out.
Keywords: Analysis, correlation analysis, eigenvector, nonparametric tests, P-value, parametric tests, principal component, statistical tests.