Abstract
Gene set enrichment analysis (GSEA) is a statistical method for determining whether predefined sets of genes are differentially expressed between phenotypes. A predefined gene set may consist of genes in a known metabolic pathway, genes located in the same cytogenetic band, genes sharing the same Gene Ontology category, or any user-defined set. In microarray experiments where no single gene shows statistically significant differential expression between phenotypes, GSEA has identified significantly differentially expressed gene sets, even when the average difference in expression between the two phenotypes is only 20% for genes in the set. The gene set identified in the first application of GSEA (oxidative phosphorylation genes differentially expressed in diabetic versus non-diabetic patients) was subsequently confirmed by independent laboratory studies published in the New England Journal of Medicine. Since the first GSEA paper was published, many extensions and alternative methods have been described in the literature. In this paper, we describe the original GSEA algorithm, its subsequent extensions and alternatives, results from some of its applications, limitations of the methods and caveats for users, and possible directions for future research. GSEA and related methods complement conventional single-gene methods. Single-gene methods work best when individual genes have large effects and the variance within each phenotype is small. GSEA is likely to be more powerful than conventional single-gene methods for studying the many common diseases in which numerous genes each make a subtle contribution. It is a tool that deserves a place in the toolbox of bioinformatics practitioners.
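To make the core idea concrete, here is a minimal Python sketch of a running-sum enrichment score for a single gene set. It assumes a weighted Kolmogorov-Smirnov-like running sum, one common form of the GSEA statistic, and that genes have already been ranked by some correlation-like metric between expression and phenotype. The function name `enrichment_score` and the weight exponent `p` are illustrative choices, not taken from any particular package. The sketch computes only the score; the permutation test used to assess significance and any normalization across gene sets are omitted.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     scores: Sequence[float],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Signed maximum deviation of a running sum over a ranked gene list.

    Assumptions (illustrative sketch): `ranked_genes` is sorted from the gene
    most associated with phenotype A to the gene most associated with
    phenotype B, and `scores` holds the matching ranking metric values.
    """
    n = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the list without covering all of it")

    # Hits are weighted by |score|**p; p = 0 gives equal weights, i.e. the
    # unweighted Kolmogorov-Smirnov-like statistic.
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all genes in the set have zero weight")
    miss_step = 1.0 / (n - n_hit)

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up on a gene in the set, step down on a gene outside it.
        running += (abs(s) ** p) / hit_norm if h else -miss_step
        # Keep the value furthest from zero, preserving its sign.
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5"]
    metric = [2.0, 1.5, 0.5, -1.0, -2.0]
    print(enrichment_score(genes, metric, {"g1", "g2"}))
    print(enrichment_score(genes, metric, {"g4", "g5"}))
```

With genes g1 to g5 ranked by the scores 2.0, 1.5, 0.5, -1.0 and -2.0, a set containing g1 and g2 reaches a running sum of about 1.0 at the top of the list, while a set containing g4 and g5 gives about -1.0. Returning the signed extreme value, rather than its magnitude, lets the sign indicate which end of the ranking the set is concentrated at.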
Keywords: Microarray, gene expression, pathway, gene set, profile