A Survey for Predicting ATP Binding Residues of Proteins Using
Machine Learning Methods

doi:10.2174/0929867328666210910125802
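Abstract

Protein-ligand interactions are essential for most protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand; as a coenzyme, it plays a vital role in supplying energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP-binding residues of a protein helps annotate protein function and supports drug design. However, because of the flood of protein sequences entering databases in the post-genomic era, identifying ATP-binding residues experimentally is costly and time-consuming. To address this problem, computational methods have been developed to predict ATP-binding residues. In this review, we briefly summarize the application of machine learning methods to detecting the ATP-binding residues of proteins. We hope this review will be helpful for further research.
Keywords: Adenosine-5'-triphosphate (ATP), binding residues, prediction, machine learning, feature extraction, protein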
[1] 
Bergamini, C.M.; Dondi, A.; Lanzara, V.; Squerzanti, M.; Cervellati, C.; Montin, K.; Mischiati, C.; Tasco, G.; Collighan, R.; Griffin, M.; Casadio, R. Thermodynamics of binding of regulatory ligands to tissue transglutaminase. Amino Acids,  2010, 39(1), 297-304.
[http://dx.doi.org/10.1007/s00726-009-0442-5] [PMID:  20033238] 
[2] 
Talavera, D.; Robertson, D.L.; Lovell, S.C. Characterization of protein-protein interaction interfaces from a single species. PLoS One,  2011, 6(6) ,e21053
[http://dx.doi.org/10.1371/journal.pone.0021053] [PMID:  21738603] 
[3] 
Bartoli, L.; Martelli, P.L.; Rossi, I.; Fariselli, P.; Casadio, R. The prediction of protein-protein interacting sites in genome-wide protein interaction networks: the test case of the human cell cycle. Curr. Protein Pept. Sci.,  2010, 11(7), 601-608.
[http://dx.doi.org/10.2174/138920310794109157] [PMID:  20887257] 
[4] 
Jakhar, R.; Dangi, M.; Khichi, A.; Chhillar, A.K. Relevance of molecular docking studies in drug designing. Curr. Bioinform.,  2020, 15(4), 270-278.
[http://dx.doi.org/10.2174/1574893615666191219094216] 
[5] 
Liu, B.; Gao, X.; Zhang, H. BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res.,  2019, 47(20) ,e127
[http://dx.doi.org/10.1093/nar/gkz740] [PMID:  31504851] 
[6] 
Zhao, X.; Wang, H.; Li, H.; Wu, Y.; Wang, G. Identifying plant pentatricopeptide repeat proteins using a variable selection method. Front. Plant Sci.,  2021, 12 ,506681
[http://dx.doi.org/10.3389/fpls.2021.506681] [PMID:  33732270] 
[7] 
Maruyama, K. The discovery of adenosine-triphosphate and the establishment of its structure. J. Hist. Biol.,  1991, 24(1), 145-154.
[http://dx.doi.org/10.1007/BF00130477] 
[8] 
Bunney, T.D.; van Walraven, H.S.; de Boer, A.H. 14-3-3 protein is a regulator of the mitochondrial and chloroplast ATP synthase. Proc. Natl. Acad. Sci. USA,  2001, 98(7), 4249-4254.
[http://dx.doi.org/10.1073/pnas.061437498] [PMID:  11274449] 
[9] 
Maruyama, K. The discovery of adenosine triphosphate and the establishment of its structure. J. Hist. Biol.,  1991, 24, 145-154.
[http://dx.doi.org/10.1007/BF00130477] 
[10] 
Maxwell, A.; Lawson, D.M. The ATP-binding site of type II topoisomerases as a target for antibacterial drugs. Curr. Top. Med. Chem.,  2003, 3(3), 283-303.
[http://dx.doi.org/10.2174/1568026033452500] [PMID:  12570764] 
[11] 
Rock, F.L.; Mao, W.; Yaremchuk, A.; Tukalo, M.; Crépin, T.; Zhou, H.; Zhang, Y.K.; Hernandez, V.; Akama, T.; Baker, S.J.; Plattner, J.J.; Shapiro, L.; Martinis, S.A.; Benkovic, S.J.; Cusack, S.; Alley, M.R. An antifungal agent inhibits an aminoacyl-tRNA synthetase by trapping tRNA in the editing site. Science,  2007, 316(5832), 1759-1761.
[http://dx.doi.org/10.1126/science.1142189] [PMID:  17588934] 
[12] 
Yu, L.; Wang, M.; Yang, Y.; Xu, F.; Zhang, X.; Xie, F.; Gao, L.; Li, X. Predicting therapeutic drugs for hepatocellular carcinoma based on tissue-specific pathways. PLOS Comput. Biol.,  2021, 17(2) ,e1008696
[http://dx.doi.org/10.1371/journal.pcbi.1008696] [PMID:  33561121] 
[13] 
Guo, T.; Shi, Y.; Sun, Z. A novel statistical ligand-binding site predictor: application to ATP-binding sites. Protein Eng. Des. Sel.,  2005, 18(2), 65-70.
[http://dx.doi.org/10.1093/protein/gzi006] [PMID:  15799998] 
[14] 
Saito, M.; Go, M.; Shirai, T. An empirical approach for detecting nucleotide-binding sites on proteins. Protein Eng. Des. Sel.,  2006, 19(2), 67-75.
[http://dx.doi.org/10.1093/protein/gzj002] [PMID:  16403825] 
[15] 
Jiménez, J.; Škalič, M.; Martínez-Rosell, G.; De Fabritiis, G. KDEEP: Protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model.,  2018, 58(2), 287-296.
[http://dx.doi.org/10.1021/acs.jcim.7b00650] [PMID:  29309725] 
[16] 
Qazi, S.R. HSEAT: A tool for plant heat shock element analysis, motif identification and analysis. Curr. Bioinform.,  2020, 15(3), 196-203.
[http://dx.doi.org/10.2174/1574893614666190102151956] 
[17] 
Tang, Y-J.; Pang, Y-H.; Liu, B. IDP-Seq2Seq: Identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformaitcs,  2020, 36(21), 5177-5186.
[http://dx.doi.org/10.1093/bioinformatics/btaa667] [PMID:  32702119] 
[18] 
Chauhan, J.S.; Mishra, N.K.; Raghava, G.P.S. Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics,  2009, 10, 434.
[http://dx.doi.org/10.1186/1471-2105-10-434] [PMID:  20021687] 
[19] 
Chen, K.; Mizianty, M.J.; Kurgan, L. ATPsite: sequence-based prediction of ATP-binding residues. Proteome Sci.,  2011, 9(Suppl. 1), S4.
[http://dx.doi.org/10.1186/1477-5956-9-S1-S4] [PMID:  22165846] 
[20] 
Chen, K.; Mizianty, M.J.; Kurgan, L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics,  2012, 28(3), 331-341.
[http://dx.doi.org/10.1093/bioinformatics/btr657] [PMID:  22130595] 
[21] 
Firoz, A.; Malik, A.; Joplin, K.H.; Ahmad, Z.; Jha, V.; Ahmad, S. Residue propensities, discrimination and binding site prediction of adenine and guanine phosphates. BMC Biochem.,  2011, 12, 20.
[http://dx.doi.org/10.1186/1471-2091-12-20] [PMID:  21569447] 
[22] 
Zhang, Y.N.; Yu, D.J.; Li, S.S.; Fan, Y.X.; Huang, Y.; Shen, H.B. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features. BMC Bioinformatics,  2012, 13, 118.
[http://dx.doi.org/10.1186/1471-2105-13-118] [PMID:  22651691] 
[23] 
Yu, D.J.; Hu, J.; Huang, Y.; Shen, H.B.; Qi, Y.; Tang, Z.M.; Yang, J.Y. TargetATPsite: a template-free method for ATP-binding sites prediction with residue evolution image sparse representation and classifier ensemble. J. Comput. Chem.,  2013, 34(11), 974-985.
[http://dx.doi.org/10.1002/jcc.23219] [PMID:  23288787] 
[24] 
Yu, D.J. Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing,  2013, 104, 180-190.
[http://dx.doi.org/10.1016/j.neucom.2012.10.012] 
[25] 
Ma, X.; Sun, X. Sequence-based predictor of ATP-binding residues using random forest and mRMR-IFS feature selection. J. Theor. Biol.,  2014, 360, 59-66.
[http://dx.doi.org/10.1016/j.jtbi.2014.06.037] [PMID:  25014477] 
[26] 
Fang, C.; Noguchi, T.; Yamana, H. Simplified sequence-based method for ATP-binding prediction using contextual local evolutionary conservation. Algorithms Mol. Biol.,  2014, 9(1), 7.
[http://dx.doi.org/10.1186/1748-7188-9-7] [PMID:  24618258] 
[27] 
Andrews, B.J.; Hu, J. TSC_ATP: A two-stage classifier for predicting protein-ATP binding sites from protein sequence. 2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (Cibcb),  2015, pp. 153-157.
[http://dx.doi.org/10.1109/CIBCB.2015.7300330] 
[28] 
Chen, W.; Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chou, K.C. iRNA-3typeA: identifying three types of modification at RNA’s adenosine sites. Mol. Ther. Nucleic Acids,  2018, 11, 468-474.
[http://dx.doi.org/10.1016/j.omtn.2018.03.012] [PMID:  29858081] 
[29] 
Nguyen, T.T.D.; Le, N.Q.; Kusuma, R.M.I.; Ou, Y.Y. Prediction of ATP-binding sites in membrane proteins using a two-dimensional convolutional neural network. J. Mol. Graph. Model.,  2019, 92, 86-93.
[http://dx.doi.org/10.1016/j.jmgm.2019.07.003] [PMID:  31344547] 
[30] 
Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet.,  2015, 16(6), 321-332.
[http://dx.doi.org/10.1038/nrg3920] [PMID:  25948244] 
[31] 
Schrider, D.R.; Kern, A.D. Supervised machine learning for population genetics: a new paradigm. Trends Genet.,  2018, 34(4), 301-312.
[http://dx.doi.org/10.1016/j.tig.2017.12.005] [PMID:  29331490] 
[32] 
Feng, P.; Ding, H.; Lin, H.; Chen, W. AOD: the antioxidant protein database. Sci. Rep.,  2017, 7(1), 7449.
[http://dx.doi.org/10.1038/s41598-017-08115-6] [PMID:  28784999] 
[33] 
Liang, Z.Y.; Lai, H.Y.; Yang, H.; Zhang, C.J.; Yang, H.; Wei, H.H.; Chen, X.X.; Zhao, Y.W.; Su, Z.D.; Li, W.C.; Deng, E.Z.; Tang, H.; Chen, W.; Lin, H. Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics,  2017, 33(3), 467-469.
[PMID:  28171531] 
[34] 
Ning, L.; Cui, T.; Zheng, B.; Wang, N.; Luo, J.; Yang, B.; Du, M.; Cheng, J.; Dou, Y.; Wang, D. MNDR v3.0: mammal ncRNA-disease repository with increased coverage and annotation. Nucleic Acids Res.,  2020, 49(D1), D160-D164.
[PMID:  32833025] 
[35] 
He, B.; Chai, G.; Duan, Y.; Yan, Z.; Qiu, L.; Zhang, H.; Liu, Z.; He, Q.; Han, K.; Ru, B.; Guo, F.B.; Ding, H.; Lin, H.; Wang, X.; Rao, N.; Zhou, P.; Huang, J. BDB: biopanning data bank. Nucleic Acids Res.,  2016, 44(D1), D1127-D1132.
[http://dx.doi.org/10.1093/nar/gkv1100] [PMID:  26503249] 
[36] 
Hasan, M.A.M. Citrullination site prediction by incorporating sequence coupled effects into PseAAC and resolving data imbalance issue. Curr. Bioinform.,  2020, 15(3), 235-245.
[http://dx.doi.org/10.2174/1574893614666191202152328] 
[37] 
Zhao, T.; Hu, Y.; Peng, J.; Cheng, L. DeepLGP: a novel deep learning method for prioritizing lncRNA target genes. Bioinformatics,  2020, 36(16), 4466-4472.
[http://dx.doi.org/10.1093/bioinformatics/btaa428] [PMID:  32467970] 
[38] 
Zhao, T.; Hu, Y.; Cheng, L. Deep-DRM: a computational method for identifying disease-related metabolites based on graph deep learning approaches. Brief. Bioinform.,  2021, 22(4), 10.
[http://dx.doi.org/10.1093/bib/bbaa212] [PMID:  33048110] 
[39] 
Jin, Q. DUNet: A deformable network for retinal vessel segmentation. Knowl. Base. Syst.,  2019, 178, 149-162.
[http://dx.doi.org/10.1016/j.knosys.2019.04.025] 
[40] 
Su, R.; Wu, H.; Xu, B.; Liu, X.; Wei, L. Developing a multi-dose computational model for drug-induced hepatotoxicity prediction based on toxicogenomics data. IEEE/ACM Trans. Comput. Biol. Bioinformatics,  2019, 16(4), 1231-1239.
[http://dx.doi.org/10.1109/TCBB.2018.2858756] [PMID:  30040651] 
[41] 
Wei, L. Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework.Brief. Bioinform.,  2021, 22(4), bbaa275.
[PMID:  33152766] 
[42] 
Wu, X.; Yu, L. EPSOL: Sequence-based protein solubility
 prediction using multidimensional embedding. Bioinformatics
 (Oxford, England), 2021, btab463., 
[43] 
Huang, Y.; Niu, B.; Gao, Y.; Fu, L.; Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics,  2010, 26(5), 680-682.
[http://dx.doi.org/10.1093/bioinformatics/btq003] [PMID:  20053844] 
[44] 
Wang, G.; Dunbrack, R.L. Jr PISCES: a protein sequence culling server. Bioinformatics,  2003, 19(12), 1589-1591.
[http://dx.doi.org/10.1093/bioinformatics/btg224] [PMID:  12912846] 
[45] 
Luscombe, N.M.; Laskowski, R.A.; Thornton, J.M. Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res.,  2001, 29(13), 2860-2874.
[http://dx.doi.org/10.1093/nar/29.13.2860] [PMID:  11433033] 
[46] 
Sobolev, V.; Sorokine, A.; Prilusky, J.; Abola, E.E.; Edelman, M. Automated analysis of interatomic contacts in proteins. Bioinformatics,  1999, 15(4), 327-332.
[http://dx.doi.org/10.1093/bioinformatics/15.4.327] [PMID:  10320401] 
[47] 
Bauer, R.A.; Günther, S.; Jansen, D.; Heeger, C.; Thaben, P.F.; Preissner, R. SuperSite: dictionary of metabolite and drug binding sites in proteins. Nucleic Acids Res.,  2009, 37(Database issue), D195-D200.
[http://dx.doi.org/10.1093/nar/gkn618] [PMID:  18842629] 
[48] 
Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res.,  2000, 28(1), 235-242.
[http://dx.doi.org/10.1093/nar/28.1.235] [PMID:  10592235] 
[49] 
Wu, C.H.; Apweiler, R.; Bairoch, A.; Natale, D.A.; Barker, W.C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M.J.; Mazumder, R.; O’Donovan, C.; Redaschi, N.; Suzek, B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res.,  2006, 34(Database issue), D187-D191.
[http://dx.doi.org/10.1093/nar/gkj161] [PMID:  16381842] 
[50] 
Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res.,  1997, 25(17), 3389-3402.
[http://dx.doi.org/10.1093/nar/25.17.3389] [PMID:  9254694] 
[51] 
He, H.B.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng.,  2009, 21(9), 1263-1284.
[http://dx.doi.org/10.1109/TKDE.2008.239] 
[52] 
Zhang, J.; Liu, B. A review on the recent developments of sequence-based protein feature extraction methods. Curr. Bioinform.,  2019, 14(3), 190-199.
[http://dx.doi.org/10.2174/1574893614666181212102749] 
[53] 
Cheng, L.; Zhao, H.; Wang, P.; Zhou, W.; Luo, M.; Li, T.; Han, J.; Liu, S.; Jiang, Q. Computational methods for identifying similar diseases. Mol. Ther. Nucleic Acids,  2019, 18, 590-604.
[http://dx.doi.org/10.1016/j.omtn.2019.09.019] [PMID:  31678735] 
[54] 
Cheng, L. Computational and biological methods for gene therapy. Curr. Gene Ther.,  2019, 19(4), 210-210.
[http://dx.doi.org/10.2174/156652321904191022113307] [PMID:  31762421] 
[55] 
Zuo, Y.; Li, Y.; Chen, Y.; Li, G.; Yan, Z.; Yang, L. PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition. Bioinformatics,  2017, 33(1), 122-124.
[http://dx.doi.org/10.1093/bioinformatics/btw564] [PMID:  27565583] 
[56] 
Win, T.S.; Malik, A.A.; Prachayasittikul, V.; Wikberg, S. J.E.; Nantasenamat, C.; Shoombuatong, W. HemoPred: a web server for predicting the hemolytic activity of peptides. Future Med. Chem.,  2017, 9(3), 275-291.
[http://dx.doi.org/10.4155/fmc-2016-0188] [PMID:  28211294] 
[57] 
Shoombuatong, W.; Hongjaisee, S.; Barin, F.; Chaijaruwanich, J.; Samleerat, T. HIV-1 CRF01_AE coreceptor usage prediction using kernel methods based logistic model trees. Comput. Biol. Med.,  2012, 42(9), 885-889.
[http://dx.doi.org/10.1016/j.compbiomed.2012.06.011] [PMID:  22824642] 
[58] 
Hasan, M.M.; Schaduangrat, N.; Basith, S.; Lee, G.; Shoombuatong, W.; Manavalan, B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics,  2020, 36(11), 3350-3356.
[http://dx.doi.org/10.1093/bioinformatics/btaa160] [PMID:  32145017] 
[59] 
Charoenkwan, P.; Shoombuatong, W.; Lee, H.C.; Chaijaruwanich, J.; Huang, H.L.; Ho, S.Y. SCMCRYS: predicting protein crystallization using an ensemble scoring card method with estimating propensity scores of P-collocated amino acid pairs. PLoS One,  2013, 8(9) ,e72368
[http://dx.doi.org/10.1371/journal.pone.0072368] [PMID:  24019868] 
[60] 
Shoombuatong, W.; Schaduangrat, N.; Nantasenamat, C. Unraveling the bioactivity of anticancer peptides as deduced from machine learning. EXCLI J.,  2018, 17, 734-752.
[PMID:  30190664] 
[61] 
Shao, J.; Yan, K.; Liu, B. FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein
 similarity network. Brief. Bioinform.,  2021, 22(3), bbaa144.
[http://dx.doi.org/10.1093/bib/bbaa144] [PMID:  32685972] 
[62] 
Shang, Y. Prediction of drug-target interactions based on multi-layer network representation learning. Neurocomputing,  2021, 434, 80-89.
[http://dx.doi.org/10.1016/j.neucom.2020.12.068] 
[63] 
Shen, J.; Zhang, J.; Luo, X.; Zhu, W.; Yu, K.; Chen, K.; Li, Y.; Jiang, H. Predicting protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA,  2007, 104(11), 4337-4341.
[http://dx.doi.org/10.1073/pnas.0607879104] [PMID:  17360525] 
[64] 
Zuo, Y.C.; Peng, Y.; Liu, L.; Chen, W.; Yang, L.; Fan, G.L. Predicting peroxidase subcellular location by hybridizing different descriptors of Chou’ pseudo amino acid patterns. Anal. Biochem.,  2014, 458, 14-19.
[http://dx.doi.org/10.1016/j.ab.2014.04.032] [PMID:  24802134] 
[65] 
Liu, D.; Li, G.; Zuo, Y. Function determinants of TET proteins: the arrangements of sequence motifs with specific codes. Brief. Bioinform.,  2019, 20(5), 1826-1835.
[http://dx.doi.org/10.1093/bib/bby053] [PMID:  29947743] 
[66] 
Chen, K.; Kurgan, L.A.; Ruan, J. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs. BMC Struct. Biol.,  2007, 7, 25.
[http://dx.doi.org/10.1186/1472-6807-7-25] [PMID:  17437643] 
[67] 
Chen, K.; Jiang, Y.; Du, L.; Kurgan, L. Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs. J. Comput. Chem.,  2009, 30(1), 163-172.
[http://dx.doi.org/10.1002/jcc.21053] [PMID:  18567007] 
[68] 
Senes, A.; Gerstein, M.; Engelman, D.M. Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J. Mol. Biol.,  2000, 296(3), 921-936.
[http://dx.doi.org/10.1006/jmbi.1999.3488] [PMID:  10677292] 
[69] 
Chen, W.; Feng, P.; Nie, F. iATP: A sequence based method for identifying anti-tubercular peptides. Med. Chem.,  2019, 16(5), 620-625.
[http://dx.doi.org/10.2174/1573406415666191002152441] [PMID:  31339073] 
[70] 
Chen, Z.; Zhou, Y.; Song, J.; Zhang, Z. hCKSAAP_UbSite: improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties. Biochim. Biophys. Acta,  2013, 1834(8), 1461-1467.
[http://dx.doi.org/10.1016/j.bbapap.2013.04.006] [PMID:  23603789] 
[71] 
Yang, J.; Roy, A.; Zhang, Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics,  2013, 29(20), 2588-2595.
[http://dx.doi.org/10.1093/bioinformatics/btt447] [PMID:  23975762] 
[72] 
Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics,  2008, 9, 40.
[http://dx.doi.org/10.1186/1471-2105-9-40] [PMID:  18215316] 
[73] 
Ma, L. Co-Clustering Analysis of Protein Secondary Structures. Curr. Bioinform.,  2017, 12(3), 213-224.
[http://dx.doi.org/10.2174/1574893612666170111145319] 
[74] 
McGuffin, L.J.; Bryson, K.; Jones, D.T. The PSIPRED protein structure prediction server. Bioinformatics,  2000, 16(4), 404-405.
[http://dx.doi.org/10.1093/bioinformatics/16.4.404] [PMID:  10869041] 
[75] 
Zheng, L.; Huang, S.; Mu, N.; Zhang, H.; Zhang, J.; Chang, Y.; Yang, L.; Zuo, Y. RAACBook: A web server of reduced amino acid alphabet for sequence-dependent inference by using Chou’s five-step rule. Database (Oxford),  2019, 2019 ,baz131
[http://dx.doi.org/10.1093/database/baz131] [PMID:  31802128] 
[76] 
Zheng, L. RaacLogo: a new sequence logo generator by using reduced amino acid clusters. Brief. Bioinform., 2020.
[PMID:  32524143] 
[77] 
Kawashima, S.; Pokarowski, P.; Pokarowska, M.; Kolinski, A.; Katayama, T.; Kanehisa, M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res.,  2008, 36(Database issue), D202-D205.
[PMID:  17998252] 
[78] 
Fauchere, J.L.P.V.E. Hydrophobic parameters II of amino acid side-chains from the partitioning of N-acetyl-amino acid amides. Eur. J. Med. Chem.,  1983, 18, 369-375.
[79] 
Grantham, R. Amino acid difference formula to help explain protein evolution. Science,  1974, 185(4154), 862-864.
[http://dx.doi.org/10.1126/science.185.4154.862] [PMID:  4843792] 
[80] 
Jones, D.T.; Taylor, W.R.; Thornton, J.M. A new approach to protein fold recognition. Nature,  1992, 358(6381), 86-89.
[http://dx.doi.org/10.1038/358086a0] [PMID:  1614539] 
[81] 
Jones, S.; Thornton, J.M. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA,  1996, 93(1), 13-20.
[http://dx.doi.org/10.1073/pnas.93.1.13] [PMID:  8552589] 
[82] 
Klein, P.; Kanehisa, M.; DeLisi, C. Prediction of protein function from sequence properties. Discriminant analysis of a data base. Biochim. Biophys. Acta,  1984, 787(3), 221-226.
[http://dx.doi.org/10.1016/0167-4838(84)90312-1] [PMID:  6547351] 
[83] 
Janin, J.; Wodak, S. Conformation of amino acid side-chains in proteins. J. Mol. Biol.,  1978, 125(3), 357-386.
[http://dx.doi.org/10.1016/0022-2836(78)90408-4] [PMID:  731698] 
[84] 
Shao, J.; Xu, D.; Tsai, S.N.; Wang, Y.; Ngai, S.M. Computational identification of protein methylation sites through bi-profile Bayes feature extraction. PLoS One,  2009, 4(3) ,e4920
[http://dx.doi.org/10.1371/journal.pone.0004920] [PMID:  19290060] 
[85] 
Song, J.; Tan, H.; Shen, H.; Mahmood, K.; Boyd, S.E.; Webb, G.I.; Akutsu, T.; Whisstock, J.C. Cascleave: towards more accurate prediction of caspase substrate cleavage sites. Bioinformatics,  2010, 26(6), 752-760.
[http://dx.doi.org/10.1093/bioinformatics/btq043] [PMID:  20130033] 
[86] 
Jia, C.; He, W.; Zou, Q. DephosSitePred: A High Accuracy Predictor for Protein Dephosphorylation Sites. Comb. Chem. High Throughput Screen.,  2017, 20(2), 153-157.
[http://dx.doi.org/10.2174/1386207319666161228155636] [PMID:  28031011] 
[87] 
Ju, Z.; Wang, S.Y. Predicting lysine lipoylation sites using bi-profile bayes feature extraction and fuzzy support vector machine algorithm. Anal. Biochem.,  2018, 561-562, 11-17.
[http://dx.doi.org/10.1016/j.ab.2018.09.007] [PMID:  30218638] 
[88] 
Ju, Z.; Sun, J.; Li, Y.; Wang, L. Predicting lysine glycation sites using bi-profile bayes feature extraction. Comput. Biol. Chem.,  2017, 71, 98-103.
[http://dx.doi.org/10.1016/j.compbiolchem.2017.10.004] [PMID:  29040908] 
[89] 
Jia, C.Z.; He, W.Y.; Yao, Y.H. OH-PRED: prediction of protein hydroxylation sites by incorporating adapted normal distribution bi-profile Bayes feature extraction and physicochemical properties of amino acids. J. Biomol. Struct. Dyn.,  2017, 35(4), 829-835.
[http://dx.doi.org/10.1080/07391102.2016.1163294] [PMID:  26957000] 
[90] 
Ao, C.; Zou, Q.; Yu, L. RFhy-m2G: Identification of RNA
 N2-methylguanosine modification sites based on random forest and hybrid features. 2021.S1046-2023(21)00142-0. 
[91] 
Bairoch, A.; Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res.,  2000, 28(1), 45-48.
[http://dx.doi.org/10.1093/nar/28.1.45] [PMID:  10592178] 
[92] 
Cheng, C.W.; Su, E.C.; Hwang, J.K.; Sung, T.Y.; Hsu, W.L. Predicting RNA-binding sites of proteins using support vector machines and evolutionary information. BMC Bioinformatics,  2008, 9(12), S6.
[http://dx.doi.org/10.1186/1471-2105-9-S12-S6] [PMID:  19091029] 
[93] 
Wang, K.; Samudrala, R. Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics,  2006, 7, 385.
[http://dx.doi.org/10.1186/1471-2105-7-385] [PMID:  16916457] 
[94] 
Ma, X.; Guo, J.; Liu, H.D.; Xie, J.M.; Sun, X. Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information. IEEE/ACM Trans. Comput. Biol. Bioinformatics,  2012, 9(6), 1766-1775.
[http://dx.doi.org/10.1109/TCBB.2012.106] [PMID:  22868682] 
[95] 
Zhao, X.; Jiao, Q.; Li, H.; Wu, Y.; Wang, H.; Huang, S.; Wang, G. ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles. BMC Bioinformatics,  2020, 21(1), 43.
[http://dx.doi.org/10.1186/s12859-020-3388-y] [PMID:  32024464] 
[96] 
Zhu, X.J. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl. Base. Syst.,  2019, 163, 787-793.
[http://dx.doi.org/10.1016/j.knosys.2018.10.007] 
[97] 
Yang, H.; Yang, W.; Dao, F.Y.; Lv, H.; Ding, H.; Chen, W.; Lin, H. A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae. Brief. Bioinform.,  2019, 21(5), 1568-1580.
[http://dx.doi.org/10.1093/bib/bbz123] [PMID:  31633777] 
[98] 
Liu, K.; Chen, W. iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications. Bioinformatics,  2020, 36(11), 3336-3342.
[http://dx.doi.org/10.1093/bioinformatics/btaa155] [PMID:  32134472] 
[99] 
Berrhail, F.; Belhadef, H. Genetic algorithm-based feature selection approach for enhancing the effectiveness of similarity searching in ligand-based virtual screening. Curr. Bioinform.,  2020, 15(5), 431-444.
[http://dx.doi.org/10.2174/1574893614666191119123935] 
[100] 
Schaduangrat, N.; Nantasenamat, C.; Prachayasittikul, V.; Shoombuatong, W. ACPred: a computational tool for the prediction and analysis of anticancer peptides. Molecules,  2019, 24(10), 1973.
[http://dx.doi.org/10.3390/molecules24101973] [PMID:  31121946] 
[101] 
Simeon, S.; Shoombuatong, W.; Anuwongcharoen, N.; Preeyanon, L.; Prachayasittikul, V.; Wikberg, J.E.; Nantasenamat, C. osFP: a web server for predicting the oligomeric states of fluorescent proteins. J. Cheminform.,  2016, 8(1), 72.
[http://dx.doi.org/10.1186/s13321-016-0185-8] [PMID:  28053671] 
[102] 
Win, T.S.; Schaduangrat, N.; Prachayasittikul, V.; Nantasenamat, C.; Shoombuatong, W. PAAP: a web server for predicting antihypertensive activity of peptides. Future Med. Chem.,  2018, 10(15), 1749-1767.
[http://dx.doi.org/10.4155/fmc-2017-0300] [PMID:  30039980] 
[103] 
Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell.,  2005, 27(8), 1226-1238.
[http://dx.doi.org/10.1109/TPAMI.2005.159] [PMID:  16119262] 
[104] 
Hasan, M.M.; Manavalan, B.; Shoombuatong, W.; Khatun, M.S.; Kurata, H. i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation. Plant Mol. Biol.,  2020, 103(1-2), 225-234.
[http://dx.doi.org/10.1007/s11103-020-00988-y] [PMID:  32140819] 
[105] 
Hasan, M.M.; Manavalan, B.; Shoombuatong, W.; Khatun, M.S.; Kurata, H. i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes. Comput. Struct. Biotechnol. J.,  2020, 18, 906-912.
[http://dx.doi.org/10.1016/j.csbj.2020.04.001] [PMID:  32322372] 
[106] 
Hasan, M.M.; Manavalan, B.; Khatun, M.S.; Kurata, H. i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome. Int. J. Biol. Macromol.,  2020, 157, 752-758.
[http://dx.doi.org/10.1016/j.ijbiomac.2019.12.009] [PMID:  31805335] 
[107] 
Du, X. Identification and analysis of cancer diagnosis using probabilistic classification vector machines with feature selection. Curr. Bioinform.,  2018, 13(6), 625-632.
[http://dx.doi.org/10.2174/1574893612666170405125637] 
[108] 
Xu, Z.C.; Feng, P.M.; Yang, H.; Qiu, W.R.; Chen, W.; Lin, H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics,  2019, 35(23), 4922-4929.
[http://dx.doi.org/10.1093/bioinformatics/btz358] [PMID:  31077296] 
[109] 
Lin, H. Identifying Sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans Comput Biol Bioinform,  2019, 16(4), 1316-1321.
[http://dx.doi.org/10.1109/TCBB.2017.2666141] 
[110] 
Zhang, Z.Y.; Yang, Y.H.; Ding, H.; Wang, D.; Chen, W.; Lin, H. Design powerful predictor for mRNA subcellular location prediction in Homo sapiens. Brief. Bioinform.,  2020, 22(1), 526-535.
[http://dx.doi.org/10.1093/bib/bbz177] [PMID:  31994694] 
[111] 
Tahir, M.; Idris, A. MD-LBP: An efficient computational model for protein subcellular localization from HeLa cell lines using SVM. Curr. Bioinform.,  2020, 15(3), 204-211.
[http://dx.doi.org/10.2174/1574893614666190723120716] 
[112] 
Jiang, Q.; Wang, G.; Jin, S.; Li, Y.; Wang, Y. Predicting human microRNA-disease associations based on support vector machine. Int. J. Data Min. Bioinform.,  2013, 8(3), 282-293.
[http://dx.doi.org/10.1504/IJDMB.2013.056078] [PMID:  24417022] 
[113] 
Ao, C.; Yu, L.; Zou, Q. Prediction of bio-sequence modifications and the associations with diseases. Brief. Funct. Genomics,  2021, 20(1), 1-18.
[http://dx.doi.org/10.1093/bfgp/elaa023] [PMID:  33313647] 
[114] 
Tao, Z.; Li, Y.; Teng, Z.; Zhao, Y. A method for identifying vesicle transport proteins based on LibSVM and MRMD. Comput. Math. Methods Med.,  2020, 2020 ,8926750
[http://dx.doi.org/10.1155/2020/8926750] [PMID:  33133228] 
[115] 
Wang, S. Immune cell infiltration-based signature for prognosis and immunogenomic analysis in breast cancer. Brief. Bioinform.,  2021, 22(2), 2020-2031.
[http://dx.doi.org/10.1093/bib/bbaa026] [PMID:  32141494] 
[116] 
Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol.,  2011, 2(3)
[http://dx.doi.org/10.1145/1961189.1961199] 
[117] 
Wei, H.; Liu, B. iCircDA-MF: identification of circRNA-disease associations based on matrix factorization. Brief. Bioinform.,  2020, 21(4), 1356-1367.
[http://dx.doi.org/10.1093/bib/bbz057] [PMID:  31197324] 
[118] 
He, K.M. Deep residual learning for image recognition. 2016 Ieee Conference on Computer Vision and Pattern Recognition (Cvpr),  2016, pp. 770-778.
[http://dx.doi.org/10.1109/CVPR.2016.90] 
[119] 
Huang, Y.; Zhou, D.; Wang, Y.; Zhang, X.; Su, M.; Wang, C.; Sun, Z.; Jiang, Q.; Sun, B.; Zhang, Y. Prediction of transcription factors binding events based on epigenetic modifications in different human cells. Epigenomics,  2020, 12(16), 1443-1456.
[http://dx.doi.org/10.2217/epi-2019-0321] [PMID:  32921165] 
[120] 
Wang, X.; Yang, Y.; Liu, J.; Wang, G. The stacking strategy-based hybrid framework for identifying non-coding RNAs.  Brief. Bioinform., 2021, bbab023., 
[http://dx.doi.org/10.1093/bib/bbab023] [PMID: 33693454] 
[121] 
Witten, I.H.; Frank, E.; Hall, M.A. Data mining : Practical
 machine learning tools and techniques, 3rd ed; Morgan
Kaufmann series in data management systemsBurlington,
 MA; , 2011. 
[122] 
Tang, H.; Chen, W.; Lin, H. Identification of immunoglobulins using Chou’s pseudo amino acid composition with feature selection technique. Mol. Biosyst.,  2016, 12(4), 1269-1275.
[http://dx.doi.org/10.1039/C5MB00883B] [PMID:  26883492] 
[123] 
Chen, W.; Feng, P.; Liu, T.; Jin, D. Recent advances in machine learning methods for predicting heat shock proteins. Curr. Drug Metab.,  2019, 20(3), 224-228.
[http://dx.doi.org/10.2174/1389200219666181031105916] [PMID:  30378494] 
[124] 
Amanat, S. Identification of lysine carboxylation sites in proteins by integrating statistical moments and position relative features via general PseAAC. Curr. Bioinform.,  2020, 15(5), 396-407.
[http://dx.doi.org/10.2174/1574893614666190723114923] 
[125] 
Cheng, L.; Qi, C.; Zhuang, H.; Fu, T.; Zhang, X. gutMDisorder: a comprehensive database for dysbiosis of the gut microbiota in disorders and interventions. Nucleic Acids Res.,  2020, 48(D1), D554-D560.
[http://dx.doi.org/10.1093/nar/gkz843] [PMID:  31584099] 
[126] 
Cheng, L.; Zhuang, H.; Ju, H.; Yang, S.; Han, J.; Tan, R.; Hu, Y. Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front. Genet.,  2019, 10, 94.
[http://dx.doi.org/10.3389/fgene.2019.00094] [PMID:  30891058] 
[127] 
Wei, L.; Liao, M.; Gao, Y.; Ji, R.; He, Z.; Zou, Q. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-Quality Negative Set. IEEE/ACM Trans. Comput. Biol. Bioinformatics,  2014, 11(1), 192-201.
[http://dx.doi.org/10.1109/TCBB.2013.146] [PMID:  26355518] 
[128] 
Wei, L.; Wan, S.; Guo, J.; Wong, K.K. A novel hierarchical selective ensemble classifier with bioinformatics application. Artif. Intell. Med.,  2017, 83, 82-90.
[http://dx.doi.org/10.1016/j.artmed.2017.02.005] [PMID:  28245947] 
[129] 
Wei, L.; Xing, P.; Zeng, J.; Chen, J.; Su, R.; Guo, F. Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier. Artif. Intell. Med.,  2017, 83, 67-74.
[http://dx.doi.org/10.1016/j.artmed.2017.03.001] [PMID:  28320624] 
[130] 
Manavalan, B.; Hasan, M.M.; Basith, S.; Gosu, V.; Shin, T.H.; Lee, G. Empirical Comparison and Analysis of Web-Based DNA N4-Methylcytosine Site Prediction Tools. Mol. Ther. Nucleic Acids,  2020, 22, 406-420.
[http://dx.doi.org/10.1016/j.omtn.2020.09.010] [PMID:  33230445] 
[131] 
Manavalan, B. Computational prediction of species-specific yeast DNA replication origin via iterative feature representation. Brief. Bioinform.,  2020, 22(2), 2126-2140.
[PMID:  33232970] 
[132] 
Basith, S.; Manavalan, B.; Hwan Shin, T.; Lee, G. Machine intelligence in peptide therapeutics: A next-generation tool for rapid disease screening. Med. Res. Rev.,  2020, 40(4), 1276-1314.
[http://dx.doi.org/10.1002/med.21658] [PMID:  31922268] 
[133] 
Liang, P.; Yang, W.; Chen, X.; Long, C.; Zheng, L.; Li, H.; Zuo, Y. Machine Learning of Single-Cell Transcriptome Highly Identifies mRNA Signature by Comparing F-Score Selection with DGE Analysis. Mol. Ther. Nucleic Acids,  2020, 20, 155-163.
[http://dx.doi.org/10.1016/j.omtn.2020.02.004] [PMID:  32169803] 
[134] 
Su, R.; Liu, X.; Wei, L.; Zou, Q. Deep-Resp-Forest: A deep forest model to predict anti-cancer drug response. Methods,  2019, 166, 91-102.
[http://dx.doi.org/10.1016/j.ymeth.2019.02.009] [PMID:  30772464] 
[135] 
Wei, L.; Chen, H.; Su, R. M6APred-EL: A Sequence-Based Predictor for Identifying N6-methyladenosine Sites Using Ensemble Learning. Mol. Ther. Nucleic Acids,  2018, 12, 635-644.
[http://dx.doi.org/10.1016/j.omtn.2018.07.004] [PMID:  30081234] 
[136] 
Zhai, Y.; Chen, Y.; Teng, Z.; Zhao, Y. Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions. Front. Cell Dev. Biol.,  2020, 8, 591487.
[http://dx.doi.org/10.3389/fcell.2020.591487] [PMID:  33195258] 
[137] 
Guo, Z.; Wang, P.; Liu, Z.; Zhao, Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front. Bioeng. Biotechnol.,  2020, 8, 584807.
[http://dx.doi.org/10.3389/fbioe.2020.584807] [PMID:  33195148] 
[138] 
Faraggi, E.; Xue, B.; Zhou, Y. Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network. Proteins,  2009, 74(4), 847-856.
[http://dx.doi.org/10.1002/prot.22193] [PMID:  18704931] 
[139] 
Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol.,  2004, 337(3), 635-645.
[http://dx.doi.org/10.1016/j.jmb.2004.02.002] [PMID:  15019783] 
[140] 
Cheng, J. SCRATCH: A protein structure and structural feature prediction server. Nucleic Acids Res.,  2005, 33(Web Server issue), W72-6.
[http://dx.doi.org/10.1093/nar/gki396] 
[141] 
Hasan, M.M.; Alam, M.A.; Shoombuatong, W.; Deng, H.W.; Manavalan, B.; Kurata, H. NeuroPred-FRL: An interpretable prediction model for identifying neuropeptide using feature representation learning. Brief. Bioinform.,  2021, bbab167.
[http://dx.doi.org/10.1093/bib/bbab167] [PMID: 33975333] 
[142] 
Charoenkwan, P.; Chiangjong, W.; Nantasenamat, C.; Hasan, M.M.; Manavalan, B.; Shoombuatong, W. StackIL6: A stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief. Bioinform.,  2021, bbab172.
[http://dx.doi.org/10.1093/bib/bbab172] [PMID: 33963832] 
[143] 
Lv, H.; Dao, F.Y.; Zulfiqar, H.; Lin, H. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach. Bioinformatics,  2020, 36(11), 3350-3356.
[http://dx.doi.org/10.1093/bib/bbab244] [PMID:  32145017] 
[144] 
Wei, L.; Su, R.; Luan, S.; Liao, Z.; Manavalan, B.; Zou, Q.; Shi, X. Iterative feature representations improve N4-methylcytosine site prediction. Bioinformatics,  2019, 35(23), 4930-4937.
[http://dx.doi.org/10.1093/bioinformatics/btz408] [PMID:  31099381] 
[145] 
Long, H. Predicting Protein Phosphorylation Sites Based on Deep Learning. Curr. Bioinform.,  2020, 15(4), 300-308.
[http://dx.doi.org/10.2174/1574893614666190902154332] 
[146] 
Guo, C. ExomeHMM: A Hidden Markov Model for Detecting Copy Number Variation Using Whole-Exome Sequencing Data. Curr. Bioinform.,  2017, 12(2), 147-155.
[http://dx.doi.org/10.2174/1574893611666160727160757] 
DOI https://dx.doi.org/10.2174/0929867328666210910125802	Print ISSN 0929-8673
Publisher Name Bentham Science Publisher	Online ISSN 1875-533X