Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

iPSI(2L)-EDL: a Two-layer Predictor for Identifying Promoters and their Types based on Ensemble Deep Learning

Author(s): Xuan Xiao*, Zaihao Hu, ZhenTao Luo and Zhaochun Xu

Volume 19, Issue 4, 2024

Published on: 02 October, 2023

Page: [327 - 340] Pages: 14

DOI: 10.2174/0115748936264316230926073231

Abstract

Promoters are DNA fragments located near the transcription initiation site. They can be divided into strong and weak promoter types according to transcriptional activation and expression level. Identifying promoters and their strengths in DNA sequences is essential for understanding gene expression regulation. It is therefore crucial to further improve the predictive quality of predictors to meet the requirements of real-world applications. Here, we constructed the latest training dataset based on the RegulonDB website; all the promoters in this dataset have been experimentally validated, and their sequence similarity is less than 85%. We used one-hot encoding and nucleotide chemical property and density (NCPD) to represent DNA sequence samples. Additionally, we proposed an ensemble deep learning framework containing a multi-head attention module, a long short-term memory (LSTM) module, and a convolutional neural network (CNN) module.
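The two sequence representations named above can be illustrated with a minimal sketch. It assumes the NCPD convention commonly used in the literature (three binary chemical-property flags per nucleotide for ring structure, functional group, and hydrogen bonding, plus the cumulative density of that nucleotide up to the current position); the exact property coding and feature ordering used by iPSI(2L)-EDL are not stated in the abstract, and the function name `encode_sequence` is hypothetical.

```python
# Sketch only: one-hot + NCPD encoding under an assumed, commonly used convention.
# Not the authors' implementation; the property table below is an assumption.

ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Assumed chemical-property flags: (purine ring, amino group, weak hydrogen bonding)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_sequence(seq):
    """Return one 8-dimensional row per nucleotide:
    4 one-hot values, 3 chemical-property flags, 1 density value."""
    seq = seq.upper()
    counts = {}
    rows = []
    for position, base in enumerate(seq, start=1):
        if base not in ONE_HOT:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        counts[base] = counts.get(base, 0) + 1
        # Density: occurrences of this base so far divided by the current position
        density = counts[base] / position
        rows.append(list(ONE_HOT[base]) + list(NCP[base]) + [density])
    return rows


if __name__ == "__main__":
    for row in encode_sequence("ACGA"):
        print(row)
```

Each sequence of length L becomes an L × 8 matrix, which is a shape that attention, LSTM, and CNN modules can all consume directly.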

The results showed that iPSI(2L)-EDL outperformed existing methods for both promoter prediction and the identification of strong and weak promoter types. On independent testing data, the AUC and MCC of iPSI(2L)-EDL in identifying promoters were 2.23% and 2.96% higher, respectively, than those of PseDNC-DL, while in predicting promoter strength type the AUC and MCC of iPSI(2L)-EDL increased by 3.74% and 5.86%, respectively. The ablation experiments indicate that the CNN plays a crucial role in recognizing promoters, and that both the importance of different input positions and the long-range dependency relationships among features are helpful for recognizing promoters.
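For readers unfamiliar with the MCC (Matthews correlation coefficient) reported above, the following minimal sketch computes it from a binary confusion matrix using the standard definition. The function name `matthews_corrcoef_from_counts` and the counts in the usage lines are hypothetical illustrations, not values from the paper.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Standard MCC for a binary classifier, from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(matthews_corrcoef_from_counts(tp=50, tn=50, fp=0, fn=0))    # perfect prediction
    print(matthews_corrcoef_from_counts(tp=25, tn=25, fp=25, fn=25))  # chance-level prediction
```

MCC ranges from -1 to 1 and, unlike accuracy, remains informative when the positive and negative classes are imbalanced, which is why it is often reported alongside AUC.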

Furthermore, to make it easier for most experimental scientists to obtain the results they need, a user-friendly web server has been established and can be accessed at http://47.94.248.117/IPSW(2L)-EDL.

© 2024 Bentham Science Publishers | Privacy Policy