Abstract
Background: Enhancers are cis-regulatory DNA elements that increase gene expression. Because most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers carry a variety of features that can help in enhancer recognition.
Objective: Because the classification power of features differs significantly, the performance of existing methods that use one or a few features to identify enhancers varies greatly. Evaluating the classification power of each feature can therefore help improve enhancer prediction.
Methods: We present an evaluation method based on Information Gain (IG), which measures how much the entropy of the enhancer/non-enhancer classification decreases when a feature is known. To validate the method, we measured the Single Feature Prediction Accuracy (SFPA) of each feature.
Results: The average IG values of the sequence, transcriptional, and epigenetic features are 0.068, 0.213, and 0.299, respectively. Under SFPA, the corresponding average AUC values are 0.534, 0.605, and 0.647. The SFPA ranking of the three feature types is consistent with the IG ranking.
Conclusion: The IG-based method can effectively evaluate the classification power of features for identifying enhancers. Epigenetic features are more effective than sequence features for recognizing enhancers.
Keywords: Enhancer, gene expression regulation, sequence features, transcriptional features, epigenetic features, information gain.
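The two quantities named in the Methods and Results, information gain and a single-feature AUC, can be illustrated with a minimal sketch. This is not the authors' implementation: the discretization of a feature into "high"/"low" bins, the use of the raw feature value as the ranking score for AUC, the function names, and the toy data are all assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_bins, labels):
    """IG = H(labels) - H(labels | feature) for an already discretized feature."""
    total = len(labels)
    groups = {}
    for bin_value, label in zip(feature_bins, labels):
        groups.setdefault(bin_value, []).append(label)
    # Conditional entropy: entropy within each bin, weighted by bin size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def single_feature_auc(scores, labels):
    """AUC when samples are ranked by one feature alone.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = enhancer, 0 = non-enhancer.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = ["high", "high", "high", "low", "low", "low", "low", "high"]
uninformative = ["high", "low", "high", "low", "high", "low", "high", "low"]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.1, 0.6]

print("IG (informative feature):  ", information_gain(informative, labels))
print("IG (uninformative feature):", information_gain(uninformative, labels))
print("Single-feature AUC:        ", single_feature_auc(scores, labels))
```

In this sketch, a feature whose bins separate the two classes well yields a larger IG than one whose bins are split evenly across classes, and an AUC near 0.5 indicates a feature that ranks enhancers no better than chance.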