iAnt: Combination of Convolutional Neural Network and Random Forest Models Using PSSM and BERT Features to Identify Antioxidant Proteins

Hoang V. Tran; Quang H. Nguyen

doi:10.2174/1574893616666210820095144

Abstract

Background: Reactive oxygen species (ROS) play many roles in the body, such as cell signaling, homeostasis, and protection against harmful bacteria. However, an excess of ROS damages lipids, proteins, and DNA. Many studies have shown that various environmental factors increase the amount of ROS produced in the body. Antioxidant proteins are responsible for neutralizing these ROS or free radicals. Although the amount of protein sequence data has increased over the last two decades, we still lack bioinformatics tools that can accurately identify antioxidant protein sequences. Furthermore, biochemical methods for determining antioxidant proteins are very expensive and time-consuming. A machine learning approach is therefore needed to speed up identification.

Methods: In this study, we propose a new method that combines a convolutional neural network and a Random Forest using two feature sets: the normalized position-specific scoring matrix (PSSM) and the best-selected features of the ProtBert output.
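
As a rough illustration of the two ingredients named in the Methods, the sketch below rescales raw PSSM scores and merges the outputs of two classifiers. The abstract does not state the normalization formula or the combination rule, so the logistic scaling, the simple probability averaging, the 0.5 threshold, and the function names `normalize_pssm` and `combine_probabilities` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def normalize_pssm(pssm):
    """Map each raw PSSM score into (0, 1) with a logistic function.

    Assumption: the abstract only says "normalized PSSM"; logistic
    scaling is one common choice, used here purely as an example.
    """
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]

def combine_probabilities(p_cnn, p_rf, threshold=0.5):
    """Average two model probabilities and threshold the result.

    Assumption: averaging is a stand-in for whatever combination
    rule the paper actually uses.
    """
    p = (p_cnn + p_rf) / 2.0
    return p, p >= threshold

# Hypothetical 2-residue x 3-column excerpt; a real PSSM has 20 columns per residue.
raw = [[-3, 0, 5], [2, -1, 0]]
normalized = normalize_pssm(raw)

# Hypothetical per-model probabilities for a single sequence.
score, is_antioxidant = combine_probabilities(0.9, 0.7)
```

Averaging keeps the example short; a weighted vote or a stacked classifier would slot into `combine_probabilities` without changing the rest of the sketch.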

Results: Our model achieved 97.3% sensitivity and 95.9% specificity on the independent test dataset. Comparison with current state-of-the-art models shows that our model is superior. We have also deployed iAnt as a web server with a user-friendly interface, available at http://antixiodant.nguyenhongquang.edu.vn.
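
For readers unfamiliar with the two metrics reported in the Results, the following minimal sketch shows how sensitivity and specificity are derived from confusion-matrix counts. The counts are hypothetical and chosen only to keep the arithmetic simple; they are not taken from the paper's test set.

```python
def sensitivity(tp, fn):
    """Fraction of true antioxidant proteins that are predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-antioxidant proteins that are predicted negative."""
    return tn / (tn + fp)

# Hypothetical counts for illustration only (not the paper's data).
tp, fn = 8, 2   # positives: correctly found / missed
tn, fp = 9, 1   # negatives: correctly rejected / falsely flagged

sens = sensitivity(tp, fn)  # 8 / 10
spec = specificity(tn, fp)  # 9 / 10
```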

Conclusion: iAnt was developed to accurately identify antioxidant proteins. It outperforms existing state-of-the-art methods and is available online.

Keywords: Antioxidant protein, convolutional neural network, random forest, feature selection, position-specific scoring matrix, ProtBert.

Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
Print ISSN: 1574-8936
Online ISSN: 2212-392X
