Abstract
Background: Popular gene set enrichment analysis approaches assume that all genes in a gene set contribute equally to the test statistic. However, genes in transcription factor (TF)-derived gene sets, that is, gene sets constructed from TF targets identified by ChIP-Seq experiments, carry a rank attribute: each gene is assigned a p-value that reflects the confidence that the gene truly belongs to the gene set.
Objectives: Ignoring this rank information during enrichment analysis leads to improper statistical inference. We address this issue by developing a new method to test the significance of ranked gene sets in genome-wide transcriptome profiling data.
Methods: A method was proposed that first creates ranked gene sets and ranked gene lists and then applies a weighted Kendall's tau rank correlation statistic to test them. Top-down weights were introduced for the genes in the gene set, and a new software tool called "Flaver" was developed.
Results: Theoretical properties of the proposed method were established, and its differences from the GSEA approach were demonstrated by analyzing transcriptome profiling data across 55 human tissues and 176 human cell lines. The results indicated that the TFs identified by our method have a higher tendency to be differentially expressed across the tissues analyzed than those identified by its competitors. The method significantly outperformed the well-known gene set enrichment analysis tools GOStats (9%) and GSEA (17%) in analyzing well-documented human RNA transcriptome datasets.
Conclusions: The method excels at detecting gene sets whose gene ranks are correlated with the expression levels of the genes in the transcriptome data.
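To illustrate the idea behind the Methods, the following is a minimal sketch of a weighted Kendall's tau with top-down weights. It is not the Flaver implementation: the function names, the hyperbolic weights (1, 1/2, 1/3, ...), and the additive combination of the two weights in each gene pair are all assumptions chosen for illustration. The sketch shows only that a rank disagreement among top-ranked genes lowers the statistic more than the same disagreement among bottom-ranked genes.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def top_down_weights(n):
    """Hypothetical top-down weights: rank 1 gets 1, rank 2 gets 1/2, and so on."""
    return [1.0 / (r + 1) for r in range(n)]


def weighted_kendall_tau(x, y, weights):
    """Weighted Kendall's tau between two rankings of the same genes.

    Each pair (i, j) contributes sign(x_i - x_j) * sign(y_i - y_j),
    weighted by weights[i] + weights[j] (an assumed additive scheme),
    and the sum is normalised by the total pair weight.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(x)), 2):
        w = weights[i] + weights[j]
        num += w * sign(x[i] - x[j]) * sign(y[i] - y[j])
        den += w
    if den == 0:
        raise ValueError("total pair weight is zero")
    return num / den


# Toy data (made up): rank of five genes in a TF gene set, e.g. by ChIP-Seq p-value.
set_rank = [1, 2, 3, 4, 5]
# Expression ranks that disagree only at the bottom or only at the top.
expr_bottom_swap = [1, 2, 3, 5, 4]
expr_top_swap = [2, 1, 3, 4, 5]

w = top_down_weights(len(set_rank))
tau_bottom = weighted_kendall_tau(set_rank, expr_bottom_swap, w)
tau_top = weighted_kendall_tau(set_rank, expr_top_swap, w)
print(tau_bottom, tau_top)
```

With equal weights the two toy comparisons yield the same ordinary Kendall's tau, since each contains exactly one discordant pair; the top-down weights are what separate them.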