Abstract
Background: Plasmodium falciparum causes many malarial infections. The proteins secreted by the malaria parasite are essential for the development of anti-malarial drugs, so they must be classified accurately.
Objective: This study aimed to accurately classify the proteins secreted by the malaria parasite.
Methods: To improve the accuracy of predicting Plasmodium secreted proteins, we established a classification model, MGAP-SGD. MonoDiKGap features (k=7) were extracted from the secreted proteins, and the optimal features were then selected by the AdaBoost method. Finally, based on the optimal feature set, the Stochastic Gradient Descent (SGD) algorithm was used to predict the secreted proteins.
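The feature-extraction step can be illustrated with a minimal sketch. It assumes a PyFeat-style reading of MonoDiKGap: for each gap length from 1 to k, count every pattern made of one residue, then that many skipped positions, then a dipeptide, and normalize by the number of valid windows. The function name `mono_di_kgap`, the 20-letter alphabet, the gap range, and the normalization are assumptions made for illustration and are not taken from the study.

```python
from itertools import product

# Assumption: the 20 standard amino acids form the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mono_di_kgap(sequence, k=7):
    """Return normalized frequencies of residue + gap + dipeptide patterns.

    Keys are (gap, residue, dipeptide) tuples for gap = 1..k.
    This is a hypothetical helper, not the study's implementation.
    """
    seq = sequence.upper()
    features = {}
    for gap in range(1, k + 1):
        # Start all 20 * 400 = 8000 patterns for this gap at zero.
        counts = {
            (gap, a, b + c): 0
            for a, b, c in product(AMINO_ACIDS, repeat=3)
        }
        windows = 0
        # A window covers 1 residue, `gap` skipped positions, and 2 residues.
        for i in range(len(seq) - gap - 2):
            key = (gap, seq[i], seq[i + gap + 1 : i + gap + 3])
            if key in counts:  # ignore windows with non-standard residues
                counts[key] += 1
                windows += 1
        for key, value in counts.items():
            features[key] = value / windows if windows else 0.0
    return features


# Toy sequence, used only to show the call.
example = mono_di_kgap("MKAAL", k=2)
```

Under this assumed definition, k=7 yields 7 × 8000 = 56,000 dimensions per protein, which is why a feature-selection step before classification is useful.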
Results: We validated the SGD classifier with 10-fold cross-validation and an independent test set, obtaining accuracies of 98.5859% and 97.973%, respectively.
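The validation protocol described above (AdaBoost-based feature selection, an SGD classifier, 10-fold cross-validation, and an independent test set) might be sketched with scikit-learn as follows. The data are synthetic stand-ins generated with `make_classification`, and the hyperparameters, the hinge loss, the 80/20 split, and the default mean-importance selection threshold are all assumptions. The study's actual settings are not stated here, and this sketch does not reproduce the reported accuracies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a protein feature matrix (NOT the study's data).
X, y = make_classification(
    n_samples=400, n_features=200, n_informative=20, random_state=0
)

# Hold out an independent test set (assumed 80/20 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Selection sits inside the pipeline so each CV fold selects features
# from its own training portion only, which avoids information leakage.
model = make_pipeline(
    SelectFromModel(AdaBoostClassifier(n_estimators=50, random_state=0)),
    StandardScaler(),
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)

# 10-fold stratified cross-validation on the training portion.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Refit on all training data, then score on the independent test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"10-fold CV accuracy: {cv_scores.mean():.4f}")
print(f"Independent test accuracy: {test_accuracy:.4f}")
```

Placing the selector inside the pipeline is a design choice of this sketch. Selecting features on the full dataset before cross-validation would let test-fold information influence the selected set and inflate the estimate.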
Conclusion: These results indicate that the MGAP-SGD model is effective and robust, and that it can meet the prediction requirements for Plasmodium secreted proteins.
Keywords: Plasmodium, Top-n-gram, MonoDiKGap, dimensionality reduction, cross-validation, features.