Abstract
In this article, we discuss the basic challenges of clustering gene expression data. In particular, we divide clustering methods into eight categories and present the specific characteristics of each category. We compare the results of 27 clustering/biclustering algorithms on various gene expression datasets using different cluster validation indices. The comparison is made in terms of the P-values of the best cluster and of the three best clusters obtained by each algorithm, along with overall results using the z-score. Biclustering algorithms are also compared in terms of their capacity to handle overlapping biclusters. Finally, we provide some guidelines for the development of new clustering algorithms for gene expression data analysis. Availability of the software: The software for most of the existing clustering algorithms has been developed in C and Visual Basic and runs on Microsoft Windows platforms. It may be downloaded as a zip file from http://www.isical.ac.in/rajat and must then be installed. Two Word files included in the zip file should be consulted before installing and running the software.
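The abstract does not state how a cluster's P-value is computed. A common choice in functional enrichment analysis is the hypergeometric test. The sketch below assumes that test. The function name and its parameters (`N`, `K`, `n`, `k`) are hypothetical illustrations, not taken from the article's software. It returns the probability of observing at least `k` annotated genes in a cluster of size `n` by chance.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """Hypothetical helper: upper-tail hypergeometric P-value, P(X >= k).

    N: number of genes in the background set
    K: number of background genes annotated to the functional category
    n: number of genes in the cluster
    k: number of annotated genes observed in the cluster
    """
    # Largest possible overlap between the cluster and the category.
    upper = min(K, n)
    # Sum the probability mass of every overlap at least as large as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Divide by the total number of ways to draw a cluster of size n.
    return tail / comb(N, n)


# Example: 10 of 50 clustered genes share a category that covers 120 of 6000 genes.
p = enrichment_p_value(6000, 120, 50, 10)
```

A smaller P-value indicates that the cluster is more strongly enriched for the category than chance would predict. Python's arbitrary-precision integers keep the binomial coefficients exact even for genome-scale `N`.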
Keywords: Density-based clustering, functional enrichment, grid-based clustering, hierarchical clustering, partitional clustering, P-value, z-score, gene expression data, traditional hierarchical, transcription factors