AthEDL: Identifying Enhancers in Arabidopsis thaliana Using an
Attention-based Deep Learning Method

Yiqiong Chen; Yujia Gao; Hejie Zhou; Yanming Zuo; Youhua Zhang; Zhenyu Yue

doi:10.2174/1574893616666211123094301

Abstract

Background: Enhancers are key cis-acting DNA elements that play a crucial role in gene regulation and in promoter function in eukaryotic cells. Accurate identification of enhancers would facilitate the understanding of DNA functions and their physiological roles. Previous studies have shown that computational methods are effective for identifying enhancers in other organisms. However, a large number of enhancers remain unknown, especially in plant species.

Objective: This study aims to build an efficient attention-based neural network model for identifying Arabidopsis thaliana enhancers.

Methods: A sequence-based model using convolutional and recurrent neural networks was proposed for the identification of enhancers. Input DNA sequences are represented as feature vectors using 4-mers. The model uses a convolutional neural network (CNN) and a bidirectional recurrent neural network (Bi-RNN) as sequence feature extractors, and an attention mechanism is added to improve prediction performance.
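As an illustration of the 4-mer representation mentioned in the Methods, the following minimal sketch splits a DNA sequence into overlapping 4-mers and maps each one to an integer index. The abstract does not specify the exact encoding, so the stride of 1, the vocabulary ordering, the use of index 0 for 4-mers containing ambiguous bases, and the names `KMER_INDEX`, `kmer_tokens`, and `encode` are all assumptions made for this example and not the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGT"
K = 4

# Assumed vocabulary: all 4**4 = 256 possible 4-mers, indexed from 1.
# Index 0 is reserved (by assumption) for 4-mers with non-ACGT characters.
KMER_INDEX = {"".join(p): i + 1 for i, p in enumerate(product(ALPHABET, repeat=K))}


def kmer_tokens(seq, k=K, stride=1):
    """Split a DNA sequence into overlapping k-mers (sliding window)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def encode(seq):
    """Map each 4-mer of the sequence to its integer index (0 if unknown)."""
    return [KMER_INDEX.get(token, 0) for token in kmer_tokens(seq)]
```

An index sequence of this kind could then be fed to an embedding layer ahead of the CNN and Bi-RNN feature extractors; a frequency vector over the 256 possible 4-mers would be an equally plausible reading of "feature vectors by 4-mer".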

Results: We performed an ablation study on the validation set to select the proposed model and evaluate its effectiveness. On the test set, our model achieved an MCC of 0.955, an AUPRC of 0.638, and an AUROC of 0.837, which are significantly higher than those of state-of-the-art methods.
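For readers unfamiliar with the first metric reported above, the Matthews correlation coefficient (MCC) can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the function name `mcc` and the convention of returning 0.0 when the denominator is zero are choices made for this example, and nothing here reproduces the authors' evaluation code or data.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives, respectively.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: an undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), which makes it a common choice when positive and negative classes are imbalanced.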

Conclusion: The proposed computational framework is intended to address similar problems in non-coding genomic regions, thereby providing valuable insights into the prediction of plant enhancers.

Keywords: Enhancer, Arabidopsis thaliana, DNA sequence, deep learning, attention mechanism, transcriptional regulation.



Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893616666211123094301
Print ISSN: 1574-8936
Online ISSN: 2212-392X