iProm70: A Convolutional Neural Network-based Tool for σ70 Promoter Classification

Muhammad Shujaat; Hilal Tayara; Kil To Chong

doi:10.2174/1574893617666220405133520

Abstract

Background: A promoter is a DNA regulatory region, typically found upstream of a gene, that plays a significant role in regulating gene transcription. Because of their function in transcription initiation, sigma (σ) promoter sequences in bacterial genomes are important, and σ70 is among the most notable sigma factors. Precise recognition of σ70 promoters is therefore essential in bioinformatics.

Objective: Several methods for predicting σ70 promoters have been developed, but their performance needs to be improved. This study proposes a convolutional neural network (CNN)-based model, iProm70, to predict σ70 promoter sequences from a bacterial genome.

Methods: This CNN-based method uses a one-hot encoding scheme to represent the input sequences. The CNN model comprises three convolution layers, followed by max-pooling and a dropout layer. The model was trained and tested on a benchmark dataset and an independent dataset. We used four assessment measures to determine the prediction performance.
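
The one-hot encoding step mentioned in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `one_hot_encode`, the A, C, G, T channel order, and the all-zero row for unrecognized symbols are assumptions, since the abstract does not state them.

```python
# Hypothetical helper: the abstract does not specify the channel order or
# how ambiguous bases are handled; both choices below are assumptions.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide, in A, C, G, T order."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = ALPHABET.find(base)
        if index >= 0:  # unrecognized symbols (e.g. N) stay all-zero
            row[index] = 1
        rows.append(row)
    return rows


# Toy input chosen for illustration only.
example = "TATAAT"
for base, row in zip(example, one_hot_encode(example)):
    print(base, row)
```

A sequence of length n becomes an n × 4 matrix, which is the kind of input a one-dimensional convolution layer can scan along the sequence axis.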

Results: iProm70 achieved 96.10% accuracy and an area under the receiver operating characteristic curve (AUC) of 0.99.
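
The two measures reported in the Results, accuracy and the area under the ROC curve, can be computed as in the sketch below. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration; they are not data or code from the paper, and the abstract does not name the other two assessment measures.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this pairwise form equals the area under
    the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (1 = promoter, 0 = non-promoter).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print("accuracy:", accuracy(labels, scores))
print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two numbers are usually reported together.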

Conclusion: According to the comparative results, iProm70 outperforms current approaches for identifying σ70 promoters. A publicly accessible web server is available at http://nsclbio.jbnu.ac.kr/tools/Prom70-CNN/.

Keywords: Bioinformatics, Computational Biology, Convolution Neural Network (CNN), Promoters, iProm70, σ70 promoter

DOI: https://dx.doi.org/10.2174/1574893617666220405133520
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher: Bentham Science Publisher
Journal: Current Bioinformatics
