Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

iAnt: Combination of Convolutional Neural Network and Random Forest Models Using PSSM and BERT Features to Identify Antioxidant Proteins

Author(s): Hoang V. Tran and Quang H. Nguyen*

Volume 17, Issue 2, 2022

Published on: 07 December, 2021

Pages: 184-195 (12 pages)

DOI: 10.2174/1574893616666210820095144

Abstract

Background: Reactive oxygen species (ROS) play many roles in the body, including cell signaling, homeostasis, and protection from harmful bacteria. However, excess ROS damages lipids, proteins, and DNA, and many studies have shown that various environmental factors increase the amount of ROS produced in the body. Antioxidant proteins are responsible for neutralizing these ROS and other free radicals. Although the amount of protein sequence data has increased over the last two decades, bioinformatics tools that accurately identify antioxidant protein sequences are still lacking. Furthermore, biochemical methods for identifying antioxidant proteins are expensive and time-consuming. A machine learning approach is therefore needed to speed up identification.

Methods: In this study, we propose a new method that combines a convolutional neural network and a Random Forest using two feature sets: the normalized position-specific scoring matrix (PSSM) and the best features selected from the ProtBert output.
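
As a rough illustration of two ingredients named in the Methods, the sketch below rescales raw PSSM scores into (0, 1) with a logistic function and averages the positive-class probabilities of two classifiers. Both choices are assumptions made for illustration only: the abstract does not state which normalization or which combination rule iAnt uses, and the function names `normalize_pssm` and `combine_probabilities` are hypothetical.

```python
import math


def normalize_pssm(pssm):
    """Map each raw PSSM score to (0, 1) with the logistic function.

    Assumption: logistic scaling is one common choice; the abstract does
    not say which normalization iAnt applies.
    """
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


def combine_probabilities(p_cnn, p_rf, weight=0.5):
    """Weighted average of two positive-class probabilities.

    Assumption: a hypothetical combination rule, not the paper's stated one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * p_cnn + (1.0 - weight) * p_rf


# Toy input: 2 residues x 3 columns (a real PSSM has 20 columns per residue).
toy_pssm = [[-3, 0, 5], [2, -1, 0]]
normalized = normalize_pssm(toy_pssm)

# Toy probabilities standing in for the CNN and Random Forest outputs.
score = combine_probabilities(0.9, 0.7)
```

With equal weights the combined score is simply the mean of the two model outputs; a different weight, or a different rule altogether, may suit the actual models better.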

Results: On the independent test dataset, our model achieved 97.3% sensitivity and 95.9% specificity, outperforming current state-of-the-art models. iAnt is also deployed as a web server with a user-friendly interface at http://antixiodant.nguyenhongquang.edu.vn.
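
For readers less familiar with the two metrics reported above, the following minimal sketch computes sensitivity and specificity from confusion-matrix counts using their standard definitions. The counts are invented toy values, not the paper's data, and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of actual negatives predicted negative."""
    return tn / (tn + fp)


# Toy counts chosen only to make the arithmetic easy to follow.
sens = sensitivity(tp=90, fn=10)  # 90 of 100 positives recovered
spec = specificity(tn=80, fp=20)  # 80 of 100 negatives recovered
```

Reporting both values matters for a task like this one, where one class may be much rarer than the other and accuracy alone could hide poor recall on the minority class.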

Conclusion: iAnt has been developed to accurately identify antioxidant proteins. It outperforms existing state-of-the-art methods and is available online.

Keywords: Antioxidant protein, convolutional neural network, random forest, feature selection, position-specific scoring matrix, ProtBert.
