An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers

Tianjiao Zhang; Rongjie Wang; Qinghua Jiang; Yadong Wang

doi:10.2174/1574893614666191120141032

Abstract

Background: Enhancers are cis-regulatory DNA sequences that enhance gene expression. Because most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers contain a variety of features that can help in enhancer recognition.

Objective: The classification power of features differs significantly, and the performance of existing methods that use one or a few features to identify enhancers varies greatly. Therefore, evaluating the classification power of each feature can help improve the performance of enhancer prediction.

Methods: We present an evaluation method based on Information Gain (IG) that captures the entropy change of enhancer recognition according to features. To validate the performance of our method, experiments using the Single Feature Prediction Accuracy (SFPA) were conducted on each feature.
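As an illustration of the quantity named above, the following is a minimal sketch of information gain for a single discretized feature, computed as IG(Y; X) = H(Y) - H(Y | X), where Y is the enhancer/non-enhancer label and X is the feature. The abstract does not state how features are discretized or how the entropy is estimated, so the binning into "high"/"low", the function names, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and labels Y."""
    total = len(labels)
    # Group the labels by the feature value they co-occur with.
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): entropy within each group, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy, made-up data (assumption): 1 = enhancer, 0 = non-enhancer.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = ["high", "high", "high", "low", "low", "low", "low", "high"]
uninformative = ["high", "low", "high", "low", "high", "low", "high", "low"]

print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

A feature whose values are distributed identically in both classes leaves the label entropy unchanged and scores zero, while a feature that separates the classes perfectly scores the full label entropy.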

Results: The average IG values of the sequence, transcriptional, and epigenetic features are 0.068, 0.213, and 0.299, respectively. Through SFPA, the average AUC values of the sequence, transcriptional, and epigenetic features are 0.534, 0.605, and 0.647, respectively. Both measures rank the three feature classes in the same order, so the verification results are consistent with our evaluation results.
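To make the AUC figures concrete, here is a small sketch of how an AUC can be computed for one feature at a time. It assumes the raw feature value is used directly as the ranking score; the abstract does not say which classifier SFPA relies on, so this is an illustrative stand-in rather than the authors' procedure. The function name and the toy signal values are hypothetical.

```python
def single_feature_auc(scores, labels):
    """AUC of one feature used directly as a ranking score.

    Equals the probability that a randomly chosen positive has a higher
    score than a randomly chosen negative, counting ties as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up signal values for one hypothetical feature (assumption).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_signal = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(single_feature_auc(toy_signal, toy_labels))
```

An AUC of 0.5 corresponds to a feature that ranks enhancers no better than chance, which is why the reported average of 0.534 for sequence features indicates weak discriminative power on its own.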

Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.

Keywords: Enhancer, gene expression regulation, sequence features, transcriptional features, epigenetic features, information gain.

DOI: https://dx.doi.org/10.2174/1574893614666191120141032
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher: Bentham Science Publisher

Current Bioinformatics
