
Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

iProm70: A Convolutional Neural Network-based Tool for σ70 Promoter Classification

Author(s): Muhammad Shujaat, Hilal Tayara* and Kil To Chong*

Volume 17, Issue 7, 2022

Published on: 26 August, 2022

Page: [615 - 623] Pages: 9

DOI: 10.2174/1574893617666220405133520


Abstract

Background: A promoter is a DNA regulatory region, typically found upstream of a gene, that plays a significant role in regulating gene transcription. Because they function in transcription initiation, sigma (σ) promoter sequences in bacterial genomes are important. σ70 is among the most notable sigma factors. The precise recognition of σ70 promoters is therefore essential in bioinformatics.

Objective: Several methods for predicting σ70 promoters have been developed, but their performance needs to be improved. This study proposes iProm70, a convolutional neural network (CNN)-based model, to predict σ70 promoter sequences in a bacterial genome.

Methods: This CNN-based method uses a one-hot encoding scheme to represent sequences for promoter identification. The CNN model comprises three convolution layers, followed by max-pooling and a dropout layer. The model was trained and tested on a benchmark dataset and an independent dataset. We used four assessment measures to determine the prediction performance.
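The one-hot encoding and the convolution and max-pooling steps named in the Methods can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the channel order (A, C, G, T), the treatment of unknown bases, the single hand-built filter, and the helper names `one_hot`, `conv1d_relu`, and `max_pool` are all assumptions made for illustration, and no trained weights or dropout are included.

```python
# Illustrative sketch only; not the iProm70 implementation.
ALPHABET = "ACGT"  # assumed channel order


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    rows = []
    for base in seq.upper():
        # Assumption: symbols outside ACGT (e.g. N) become an all-zero row.
        rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
    return rows


def conv1d_relu(x, kernel, bias=0.0):
    """Slide one filter (width k, 4 channels) over x with stride 1, then apply ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        total = bias
        for offset in range(k):
            for channel in range(4):
                total += x[start + offset][channel] * kernel[offset][channel]
        out.append(max(0.0, total))
    return out


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical filter built from the -10 box consensus "TATAAT".
# A trained CNN would learn its filter weights from data instead.
motif_filter = one_hot("TATAAT")
encoded = one_hot("GGTATAATGC")
activations = conv1d_relu(encoded, motif_filter)  # matches per window
pooled = max_pool(activations, size=2)
print(activations)  # [2.0, 1.0, 6.0, 1.0, 2.0]
print(pooled)       # [2.0, 6.0]
```

With a one-hot input and a one-hot filter, each activation simply counts the positions at which the window agrees with the motif, which is why the window containing TATAAT scores 6.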

Results: iProm70 achieved 96.10% accuracy and an area under the receiver operating characteristic curve of 0.99.
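For readers unfamiliar with the two reported quantities, the sketch below shows how accuracy and the area under the ROC curve can be computed from labels and classifier scores. The toy labels, scores, and the 0.5 decision threshold are made up for illustration and are unrelated to the iProm70 datasets; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = promoter, 0 = non-promoter (values are invented).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(accuracy(labels, predicted), 3))  # 0.667
print(round(roc_auc(labels, scores), 3))      # 0.889
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.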

Conclusion: According to the comparative results, iProm70 outperforms current approaches for identifying σ70 promoters. A publicly accessible web server is available at http://nsclbio.jbnu.ac.kr/tools/Prom70-CNN/.

Keywords: Bioinformatics, Computational Biology, Convolution Neural Network (CNN), Promoters, iProm70, σ70 promoter

