Abstract
Background: A promoter is a DNA regulatory region, typically located upstream of a gene, that plays a significant role in regulating gene transcription. Because of their function in transcription initiation, sigma (σ) promoter sequences in bacterial genomes are important to identify. σ70 is among the most notable sigma factors, so precise recognition of σ70 promoters is an essential task in bioinformatics.
Objective: Several methods for predicting σ70 promoters have been developed, but their performance leaves room for improvement. This study proposes iProm70, a convolutional neural network (CNN) based model that predicts σ70 promoter sequences in a bacterial genome.
Methods: iProm70 represents each input sequence with a one-hot encoding scheme. The CNN comprises three convolution layers, followed by max-pooling and a dropout layer. The model was trained and tested on a benchmark dataset and an independent dataset. Four assessment measures were used to evaluate prediction performance.
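The one-hot encoding step named in the Methods can be illustrated with a minimal sketch. The channel order A, C, G, T, the all-zero handling of ambiguous bases, and the function name `one_hot_encode` are assumptions made for illustration; the abstract does not specify how iProm70 implements this step.

```python
from typing import List

# Assumed channel order; the abstract does not state the one used by iProm70.
ALPHABET = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Map a DNA string to a len(sequence) x 4 binary matrix."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        # Ambiguous symbols such as N are left as all-zero rows (an assumption).
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Hypothetical usage: encode a short example hexamer.
matrix = one_hot_encode("TATAAT")
```

Each row of `matrix` holds a single 1 marking the nucleotide at that position, which gives a convolution layer a fixed-width numeric input.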
Results: iProm70 achieved an accuracy of 96.10% and an area under the receiver operating characteristic curve (AUC) of 0.99.
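For readers unfamiliar with the two reported measures, the sketch below shows one common way to compute accuracy and the area under the ROC curve from predicted scores. The 0.5 decision threshold, the function names, and the toy labels and scores are assumptions for illustration only; they are not the study's data or evaluation code.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison: the probability
    that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = promoter, 0 = non-promoter.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]
toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why the two are often reported together.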
Conclusion: According to the comparative results, iProm70 outperforms current approaches for identifying σ70 promoters. A publicly accessible web server is available at http://nsclbio.jbnu.ac.kr/tools/Prom70-CNN/.
Keywords: Bioinformatics, Computational Biology, Convolutional Neural Network (CNN), Promoters, iProm70, σ70 promoter