Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Chen-An Tsai; James J. Chen

doi:10.2174/1574893615999200629124444

Abstract

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identifying differentially expressed gene sets using prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms focus only on identifying gene sets that are differentially expressed in a given phenotype.

Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly correlated pathways.

Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods.
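The CIA objective described above can be illustrated with a minimal Python sketch. It assumes two expression matrices with the same samples as rows and the genes of each set as columns, uniform sample weights, and column centering only; under those assumptions, the axes that maximize the squared covariance between projections are the singular vectors of the cross-covariance matrix. The function name `co_inertia`, the simulated data, and the use of the RV coefficient as an overall co-structure summary are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def co_inertia(X, Y, n_axes=2):
    """Sketch of co-inertia analysis for two gene sets (hypothetical helper).

    X: (n_samples, p) expression matrix for gene set A.
    Y: (n_samples, q) expression matrix for gene set B (same samples, same order).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must contain the same samples (rows)")
    n = X.shape[0]

    # Center each gene (column) so that cross-products are covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # p x q cross-covariance matrix between the two gene sets.
    C = Xc.T @ Yc / n

    # The SVD gives paired axes (u_i, v_i); the covariance between the
    # projections Xc @ u_i and Yc @ v_i equals the i-th singular value.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(n_axes, s.size)
    scores_x = Xc @ U[:, :k]
    scores_y = Yc @ Vt[:k].T

    # RV coefficient: total co-inertia scaled to [0, 1] (assumed summary).
    Sxx = Xc.T @ Xc / n
    Syy = Yc.T @ Yc / n
    rv = np.sum(C ** 2) / np.sqrt(np.sum(Sxx ** 2) * np.sum(Syy ** 2))

    return {
        "sq_cov": s[:k] ** 2,   # squared covariance on each successive axis
        "scores_x": scores_x,   # sample ordination from gene set A
        "scores_y": scores_y,   # sample ordination from gene set B
        "rv": rv,
    }


# Simulated example: sets A and B share a latent factor, set C is pure noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
set_a = latent @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(40, 5))
set_b = latent @ rng.normal(size=(1, 8)) + 0.5 * rng.normal(size=(40, 8))
set_c = rng.normal(size=(40, 8))

res_ab = co_inertia(set_a, set_b)
res_ac = co_inertia(set_a, set_c)
print("RV(A, B):", round(float(res_ab["rv"]), 3))
print("RV(A, C):", round(float(res_ac["rv"]), 3))
```

Plotting `scores_x` against `scores_y` for the first two axes gives the paired ordination of the samples. The paper's actual weighting, scaling, and significance assessment may differ from this sketch.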

Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets, together with their interactions and networks. We then demonstrate the integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for the identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.

Keywords: Gene set enrichment analyses, gene set correlation analysis, co-inertia analysis, covariance, p53 gene expression data, gene set analysis.




DOI: https://dx.doi.org/10.2174/1574893615999200629124444
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
