A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite

Ting Liu; Hua Tang
doi:10.2174/1381612826666200310122324
Abstract

The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism, so developing effective drugs and vaccines against infection requires their accurate identification. Although biochemical experiments can provide precise details about these proteins, they are expensive and time-consuming. In this review, we summarize the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the construction strategies of these computational methods. Finally, we discuss the future development of algorithms for mitochondrial protein recognition.
Keywords: Mitochondria proteins, malaria parasite, machine learning, database, feature, infection.
Journal: Current Pharmaceutical Design
Publisher: Bentham Science Publishers
Print ISSN: 1381-6128
Online ISSN: 1873-4286
