Abstract
Background: Reactive Oxygen Species (ROS) play many roles in the body, such as cell signaling, homeostasis, and protection against harmful bacteria. However, excess ROS damage lipids, proteins, and DNA. Many studies have shown that various environmental factors increase the amount of ROS produced in the body. Antioxidant proteins are responsible for neutralizing these ROS and free radicals. Although the amount of protein sequence data has grown over the last two decades, we still lack bioinformatics tools that can accurately identify antioxidant protein sequences. Furthermore, biochemical methods for identifying antioxidant proteins are expensive and time-consuming. Computational approaches based on machine learning are therefore needed to speed up their identification.
Methods: In this study, we propose a new method that combines a convolutional neural network (CNN) and a Random Forest (RF) using two feature sets: the normalized position-specific scoring matrix (PSSM) and the best features selected from the ProtBert output.
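The Methods sentence leaves the preprocessing and model-combination details implicit. The plain-Python sketch below illustrates one plausible reading of it. The logistic-sigmoid PSSM normalization, the zero-padding to a fixed length for the CNN input, the Fisher-score ranking used to pick ProtBert features, and the equal-weight averaging of CNN and RF probabilities are all assumptions rather than confirmed iAnt settings, and every function name is hypothetical. The CNN and Random Forest models themselves are not implemented here.

```python
import math
from typing import List, Sequence

# Hypothetical helpers; sigmoid normalization, zero-padding, Fisher-score
# selection and equal-weight soft voting are assumptions, not iAnt's
# confirmed settings.


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalize_pssm(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map raw PSSM scores (L rows x 20 amino-acid columns) into (0, 1)."""
    return [[_sigmoid(score) for score in row] for row in pssm]


def pad_or_truncate(matrix: List[List[float]], length: int, width: int = 20) -> List[List[float]]:
    """Cut or zero-pad rows so every protein has the same CNN input shape."""
    rows = [list(r) for r in matrix[:length]]
    return rows + [[0.0] * width for _ in range(length - len(rows))]


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Per-feature score (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == 0]
        m_pos, m_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        v_pos = sum((v - m_pos) ** 2 for v in pos) / len(pos)
        v_neg = sum((v - m_neg) ** 2 for v in neg) / len(neg)
        scores.append((m_pos - m_neg) ** 2 / (v_pos + v_neg + 1e-12))
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (e.g. ProtBert embedding dims)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def soft_vote(p_cnn: float, p_rf: float, weight: float = 0.5, threshold: float = 0.5) -> int:
    """Average two positive-class probabilities; 1 means 'antioxidant'."""
    p = weight * p_cnn + (1.0 - weight) * p_rf
    return int(p >= threshold)
```

Soft voting is used here only because it is the simplest way to merge two probabilistic classifiers; the paper's actual combination scheme may differ.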
Results: Our model, iAnt, achieved 97.3% sensitivity and 95.9% specificity on the independent test dataset. Comparison with current state-of-the-art models shows that it outperforms them. We have also deployed iAnt as a web server with a user-friendly interface at http://antixiodant.nguyenhongquang.edu.vn.
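Sensitivity and specificity are the standard confusion-matrix rates. A minimal Python sketch of how they are computed from true and predicted labels follows, assuming label 1 marks an antioxidant protein and label 0 a non-antioxidant one; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) with 1 = antioxidant, 0 = non-antioxidant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)
```

Read this way, a sensitivity of 97.3% means that 97.3% of the true antioxidant proteins in the test set were predicted as antioxidant, and a specificity of 95.9% means that 95.9% of the non-antioxidant proteins were correctly rejected.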
Conclusion: iAnt has been developed to accurately identify antioxidant proteins. It outperforms existing state-of-the-art methods and is available online.
Keywords: Antioxidant protein, convolutional neural network, random forest, feature selection, position-specific scoring matrix, ProtBert.