Review Article

A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods

Volume 29, Issue 5, 2022

Published on: 10 January, 2022

Pages: [789 - 806] Page count: 18

DOI: 10.2174/0929867328666210910125802


Abstract

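Protein-ligand interactions are essential to most protein functions. Adenosine 5'-triphosphate (ATP) is one such ligand: as a coenzyme, it plays a vital role in supplying energy for cellular activities, catalyzing biological reactions, and signal transduction. Knowing which residues of a protein bind ATP helps with protein function annotation and drug design. However, because of the flood of protein sequences entering databases in the post-genomic era, identifying ATP-binding residues experimentally is both costly and time-consuming. To address this problem, computational methods have been developed to predict ATP-binding residues. In this review, we briefly summarize the application of machine learning methods to detecting ATP-binding residues in proteins. We hope this review will be helpful for further research.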

Keywords: Adenosine 5'-triphosphate (ATP), binding residues, prediction, machine learning, feature extraction, protein

[1]
Bergamini, C.M.; Dondi, A.; Lanzara, V.; Squerzanti, M.; Cervellati, C.; Montin, K.; Mischiati, C.; Tasco, G.; Collighan, R.; Griffin, M.; Casadio, R. Thermodynamics of binding of regulatory ligands to tissue transglutaminase. Amino Acids, 2010, 39(1), 297-304.
[http://dx.doi.org/10.1007/s00726-009-0442-5] [PMID: 20033238]
[2]
Talavera, D.; Robertson, D.L.; Lovell, S.C. Characterization of protein-protein interaction interfaces from a single species. PLoS One, 2011, 6(6) ,e21053
[http://dx.doi.org/10.1371/journal.pone.0021053] [PMID: 21738603]
[3]
Bartoli, L.; Martelli, P.L.; Rossi, I.; Fariselli, P.; Casadio, R. The prediction of protein-protein interacting sites in genome-wide protein interaction networks: the test case of the human cell cycle. Curr. Protein Pept. Sci., 2010, 11(7), 601-608.
[http://dx.doi.org/10.2174/138920310794109157] [PMID: 20887257]
[4]
Jakhar, R.; Dangi, M.; Khichi, A.; Chhillar, A.K. Relevance of molecular docking studies in drug designing. Curr. Bioinform., 2020, 15(4), 270-278.
[http://dx.doi.org/10.2174/1574893615666191219094216]
[5]
Liu, B.; Gao, X.; Zhang, H. BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res., 2019, 47(20) ,e127
[http://dx.doi.org/10.1093/nar/gkz740] [PMID: 31504851]
[6]
Zhao, X.; Wang, H.; Li, H.; Wu, Y.; Wang, G. Identifying plant pentatricopeptide repeat proteins using a variable selection method. Front. Plant Sci., 2021, 12 ,506681
[http://dx.doi.org/10.3389/fpls.2021.506681] [PMID: 33732270]
[7]
Maruyama, K. The discovery of adenosine-triphosphate and the establishment of its structure. J. Hist. Biol., 1991, 24(1), 145-154.
[http://dx.doi.org/10.1007/BF00130477]
[8]
Bunney, T.D.; van Walraven, H.S.; de Boer, A.H. 14-3-3 protein is a regulator of the mitochondrial and chloroplast ATP synthase. Proc. Natl. Acad. Sci. USA, 2001, 98(7), 4249-4254.
[http://dx.doi.org/10.1073/pnas.061437498] [PMID: 11274449]
[9]
Maruyama, K. The discovery of adenosine triphosphate and the establishment of its structure. J. Hist. Biol., 1991, 24, 145-154.
[http://dx.doi.org/10.1007/BF00130477]
[10]
Maxwell, A.; Lawson, D.M. The ATP-binding site of type II topoisomerases as a target for antibacterial drugs. Curr. Top. Med. Chem., 2003, 3(3), 283-303.
[http://dx.doi.org/10.2174/1568026033452500] [PMID: 12570764]
[11]
Rock, F.L.; Mao, W.; Yaremchuk, A.; Tukalo, M.; Crépin, T.; Zhou, H.; Zhang, Y.K.; Hernandez, V.; Akama, T.; Baker, S.J.; Plattner, J.J.; Shapiro, L.; Martinis, S.A.; Benkovic, S.J.; Cusack, S.; Alley, M.R. An antifungal agent inhibits an aminoacyl-tRNA synthetase by trapping tRNA in the editing site. Science, 2007, 316(5832), 1759-1761.
[http://dx.doi.org/10.1126/science.1142189] [PMID: 17588934]
[12]
Yu, L.; Wang, M.; Yang, Y.; Xu, F.; Zhang, X.; Xie, F.; Gao, L.; Li, X. Predicting therapeutic drugs for hepatocellular carcinoma based on tissue-specific pathways. PLOS Comput. Biol., 2021, 17(2) ,e1008696
[http://dx.doi.org/10.1371/journal.pcbi.1008696] [PMID: 33561121]
[13]
Guo, T.; Shi, Y.; Sun, Z. A novel statistical ligand-binding site predictor: application to ATP-binding sites. Protein Eng. Des. Sel., 2005, 18(2), 65-70.
[http://dx.doi.org/10.1093/protein/gzi006] [PMID: 15799998]
[14]
Saito, M.; Go, M.; Shirai, T. An empirical approach for detecting nucleotide-binding sites on proteins. Protein Eng. Des. Sel., 2006, 19(2), 67-75.
[http://dx.doi.org/10.1093/protein/gzj002] [PMID: 16403825]
[15]
Jiménez, J.; Škalič, M.; Martínez-Rosell, G.; De Fabritiis, G. KDEEP: Protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model., 2018, 58(2), 287-296.
[http://dx.doi.org/10.1021/acs.jcim.7b00650] [PMID: 29309725]
[16]
Qazi, S.R. HSEAT: A tool for plant heat shock element analysis, motif identification and analysis. Curr. Bioinform., 2020, 15(3), 196-203.
[http://dx.doi.org/10.2174/1574893614666190102151956]
[17]
Tang, Y-J.; Pang, Y-H.; Liu, B. IDP-Seq2Seq: Identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformaitcs, 2020, 36(21), 5177-5186.
[http://dx.doi.org/10.1093/bioinformatics/btaa667] [PMID: 32702119]
[18]
Chauhan, J.S.; Mishra, N.K.; Raghava, G.P.S. Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics, 2009, 10, 434.
[http://dx.doi.org/10.1186/1471-2105-10-434] [PMID: 20021687]
[19]
Chen, K.; Mizianty, M.J.; Kurgan, L. ATPsite: sequence-based prediction of ATP-binding residues. Proteome Sci., 2011, 9(Suppl. 1), S4.
[http://dx.doi.org/10.1186/1477-5956-9-S1-S4] [PMID: 22165846]
[20]
Chen, K.; Mizianty, M.J.; Kurgan, L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics, 2012, 28(3), 331-341.
[http://dx.doi.org/10.1093/bioinformatics/btr657] [PMID: 22130595]
[21]
Firoz, A.; Malik, A.; Joplin, K.H.; Ahmad, Z.; Jha, V.; Ahmad, S. Residue propensities, discrimination and binding site prediction of adenine and guanine phosphates. BMC Biochem., 2011, 12, 20.
[http://dx.doi.org/10.1186/1471-2091-12-20] [PMID: 21569447]
[22]
Zhang, Y.N.; Yu, D.J.; Li, S.S.; Fan, Y.X.; Huang, Y.; Shen, H.B. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features. BMC Bioinformatics, 2012, 13, 118.
[http://dx.doi.org/10.1186/1471-2105-13-118] [PMID: 22651691]
[23]
Yu, D.J.; Hu, J.; Huang, Y.; Shen, H.B.; Qi, Y.; Tang, Z.M.; Yang, J.Y. TargetATPsite: a template-free method for ATP-binding sites prediction with residue evolution image sparse representation and classifier ensemble. J. Comput. Chem., 2013, 34(11), 974-985.
[http://dx.doi.org/10.1002/jcc.23219] [PMID: 23288787]
[24]
Yu, D.J. Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing, 2013, 104, 180-190.
[http://dx.doi.org/10.1016/j.neucom.2012.10.012]
[25]
Ma, X.; Sun, X. Sequence-based predictor of ATP-binding residues using random forest and mRMR-IFS feature selection. J. Theor. Biol., 2014, 360, 59-66.
[http://dx.doi.org/10.1016/j.jtbi.2014.06.037] [PMID: 25014477]
[26]
Fang, C.; Noguchi, T.; Yamana, H. Simplified sequence-based method for ATP-binding prediction using contextual local evolutionary conservation. Algorithms Mol. Biol., 2014, 9(1), 7.
[http://dx.doi.org/10.1186/1748-7188-9-7] [PMID: 24618258]
[27]
Andrews, B.J.; Hu, J. TSC_ATP: A two-stage classifier for predicting protein-ATP binding sites from protein sequence. 2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (Cibcb), 2015, pp. 153-157.
[http://dx.doi.org/10.1109/CIBCB.2015.7300330]
[28]
Chen, W.; Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chou, K.C. iRNA-3typeA: identifying three types of modification at RNA’s adenosine sites. Mol. Ther. Nucleic Acids, 2018, 11, 468-474.
[http://dx.doi.org/10.1016/j.omtn.2018.03.012] [PMID: 29858081]
[29]
Nguyen, T.T.D.; Le, N.Q.; Kusuma, R.M.I.; Ou, Y.Y. Prediction of ATP-binding sites in membrane proteins using a two-dimensional convolutional neural network. J. Mol. Graph. Model., 2019, 92, 86-93.
[http://dx.doi.org/10.1016/j.jmgm.2019.07.003] [PMID: 31344547]
[30]
Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet., 2015, 16(6), 321-332.
[http://dx.doi.org/10.1038/nrg3920] [PMID: 25948244]
[31]
Schrider, D.R.; Kern, A.D. Supervised machine learning for population genetics: a new paradigm. Trends Genet., 2018, 34(4), 301-312.
[http://dx.doi.org/10.1016/j.tig.2017.12.005] [PMID: 29331490]
[32]
Feng, P.; Ding, H.; Lin, H.; Chen, W. AOD: the antioxidant protein database. Sci. Rep., 2017, 7(1), 7449.
[http://dx.doi.org/10.1038/s41598-017-08115-6] [PMID: 28784999]
[33]
Liang, Z.Y.; Lai, H.Y.; Yang, H.; Zhang, C.J.; Yang, H.; Wei, H.H.; Chen, X.X.; Zhao, Y.W.; Su, Z.D.; Li, W.C.; Deng, E.Z.; Tang, H.; Chen, W.; Lin, H. Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics, 2017, 33(3), 467-469.
[PMID: 28171531]
[34]
Ning, L.; Cui, T.; Zheng, B.; Wang, N.; Luo, J.; Yang, B.; Du, M.; Cheng, J.; Dou, Y.; Wang, D. MNDR v3.0: mammal ncRNA-disease repository with increased coverage and annotation. Nucleic Acids Res., 2020, 49(D1), D160-D164.
[PMID: 32833025]
[35]
He, B.; Chai, G.; Duan, Y.; Yan, Z.; Qiu, L.; Zhang, H.; Liu, Z.; He, Q.; Han, K.; Ru, B.; Guo, F.B.; Ding, H.; Lin, H.; Wang, X.; Rao, N.; Zhou, P.; Huang, J. BDB: biopanning data bank. Nucleic Acids Res., 2016, 44(D1), D1127-D1132.
[http://dx.doi.org/10.1093/nar/gkv1100] [PMID: 26503249]
[36]
Hasan, M.A.M. Citrullination site prediction by incorporating sequence coupled effects into PseAAC and resolving data imbalance issue. Curr. Bioinform., 2020, 15(3), 235-245.
[http://dx.doi.org/10.2174/1574893614666191202152328]
[37]
Zhao, T.; Hu, Y.; Peng, J.; Cheng, L. DeepLGP: a novel deep learning method for prioritizing lncRNA target genes. Bioinformatics, 2020, 36(16), 4466-4472.
[http://dx.doi.org/10.1093/bioinformatics/btaa428] [PMID: 32467970]
[38]
Zhao, T.; Hu, Y.; Cheng, L. Deep-DRM: a computational method for identifying disease-related metabolites based on graph deep learning approaches. Brief. Bioinform., 2021, 22(4), 10.
[http://dx.doi.org/10.1093/bib/bbaa212] [PMID: 33048110]
[39]
Jin, Q. DUNet: A deformable network for retinal vessel segmentation. Knowl. Base. Syst., 2019, 178, 149-162.
[http://dx.doi.org/10.1016/j.knosys.2019.04.025]
[40]
Su, R.; Wu, H.; Xu, B.; Liu, X.; Wei, L. Developing a multi-dose computational model for drug-induced hepatotoxicity prediction based on toxicogenomics data. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 2019, 16(4), 1231-1239.
[http://dx.doi.org/10.1109/TCBB.2018.2858756] [PMID: 30040651]
[41]
Wei, L. Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework.Brief. Bioinform., 2021, 22(4), bbaa275.
[PMID: 33152766]
[42]
Wu, X.; Yu, L. EPSOL: Sequence-based protein solubility prediction using multidimensional embedding. Bioinformatics (Oxford, England), 2021, btab463.,
[43]
Huang, Y.; Niu, B.; Gao, Y.; Fu, L.; Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 2010, 26(5), 680-682.
[http://dx.doi.org/10.1093/bioinformatics/btq003] [PMID: 20053844]
[44]
Wang, G.; Dunbrack, R.L. Jr PISCES: a protein sequence culling server. Bioinformatics, 2003, 19(12), 1589-1591.
[http://dx.doi.org/10.1093/bioinformatics/btg224] [PMID: 12912846]
[45]
Luscombe, N.M.; Laskowski, R.A.; Thornton, J.M. Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res., 2001, 29(13), 2860-2874.
[http://dx.doi.org/10.1093/nar/29.13.2860] [PMID: 11433033]
[46]
Sobolev, V.; Sorokine, A.; Prilusky, J.; Abola, E.E.; Edelman, M. Automated analysis of interatomic contacts in proteins. Bioinformatics, 1999, 15(4), 327-332.
[http://dx.doi.org/10.1093/bioinformatics/15.4.327] [PMID: 10320401]
[47]
Bauer, R.A.; Günther, S.; Jansen, D.; Heeger, C.; Thaben, P.F.; Preissner, R. SuperSite: dictionary of metabolite and drug binding sites in proteins. Nucleic Acids Res., 2009, 37(Database issue), D195-D200.
[http://dx.doi.org/10.1093/nar/gkn618] [PMID: 18842629]
[48]
Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res., 2000, 28(1), 235-242.
[http://dx.doi.org/10.1093/nar/28.1.235] [PMID: 10592235]
[49]
Wu, C.H.; Apweiler, R.; Bairoch, A.; Natale, D.A.; Barker, W.C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M.J.; Mazumder, R.; O’Donovan, C.; Redaschi, N.; Suzek, B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res., 2006, 34(Database issue), D187-D191.
[http://dx.doi.org/10.1093/nar/gkj161] [PMID: 16381842]
[50]
Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, 25(17), 3389-3402.
[http://dx.doi.org/10.1093/nar/25.17.3389] [PMID: 9254694]
[51]
He, H.B.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng., 2009, 21(9), 1263-1284.
[http://dx.doi.org/10.1109/TKDE.2008.239]
[52]
Zhang, J.; Liu, B. A review on the recent developments of sequence-based protein feature extraction methods. Curr. Bioinform., 2019, 14(3), 190-199.
[http://dx.doi.org/10.2174/1574893614666181212102749]
[53]
Cheng, L.; Zhao, H.; Wang, P.; Zhou, W.; Luo, M.; Li, T.; Han, J.; Liu, S.; Jiang, Q. Computational methods for identifying similar diseases. Mol. Ther. Nucleic Acids, 2019, 18, 590-604.
[http://dx.doi.org/10.1016/j.omtn.2019.09.019] [PMID: 31678735]
[54]
Cheng, L. Computational and biological methods for gene therapy. Curr. Gene Ther., 2019, 19(4), 210-210.
[http://dx.doi.org/10.2174/156652321904191022113307] [PMID: 31762421]
[55]
Zuo, Y.; Li, Y.; Chen, Y.; Li, G.; Yan, Z.; Yang, L. PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition. Bioinformatics, 2017, 33(1), 122-124.
[http://dx.doi.org/10.1093/bioinformatics/btw564] [PMID: 27565583]
[56]
Win, T.S.; Malik, A.A.; Prachayasittikul, V.; Wikberg, S. J.E.; Nantasenamat, C.; Shoombuatong, W. HemoPred: a web server for predicting the hemolytic activity of peptides. Future Med. Chem., 2017, 9(3), 275-291.
[http://dx.doi.org/10.4155/fmc-2016-0188] [PMID: 28211294]
[57]
Shoombuatong, W.; Hongjaisee, S.; Barin, F.; Chaijaruwanich, J.; Samleerat, T. HIV-1 CRF01_AE coreceptor usage prediction using kernel methods based logistic model trees. Comput. Biol. Med., 2012, 42(9), 885-889.
[http://dx.doi.org/10.1016/j.compbiomed.2012.06.011] [PMID: 22824642]
[58]
Hasan, M.M.; Schaduangrat, N.; Basith, S.; Lee, G.; Shoombuatong, W.; Manavalan, B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics, 2020, 36(11), 3350-3356.
[http://dx.doi.org/10.1093/bioinformatics/btaa160] [PMID: 32145017]
[59]
Charoenkwan, P.; Shoombuatong, W.; Lee, H.C.; Chaijaruwanich, J.; Huang, H.L.; Ho, S.Y. SCMCRYS: predicting protein crystallization using an ensemble scoring card method with estimating propensity scores of P-collocated amino acid pairs. PLoS One, 2013, 8(9) ,e72368
[http://dx.doi.org/10.1371/journal.pone.0072368] [PMID: 24019868]
[60]
Shoombuatong, W.; Schaduangrat, N.; Nantasenamat, C. Unraveling the bioactivity of anticancer peptides as deduced from machine learning. EXCLI J., 2018, 17, 734-752.
[PMID: 30190664]
[61]
Shao, J.; Yan, K.; Liu, B. FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network. Brief. Bioinform., 2021, 22(3), bbaa144.
[http://dx.doi.org/10.1093/bib/bbaa144] [PMID: 32685972]
[62]
Shang, Y. Prediction of drug-target interactions based on multi-layer network representation learning. Neurocomputing, 2021, 434, 80-89.
[http://dx.doi.org/10.1016/j.neucom.2020.12.068]
[63]
Shen, J.; Zhang, J.; Luo, X.; Zhu, W.; Yu, K.; Chen, K.; Li, Y.; Jiang, H. Predicting protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA, 2007, 104(11), 4337-4341.
[http://dx.doi.org/10.1073/pnas.0607879104] [PMID: 17360525]
[64]
Zuo, Y.C.; Peng, Y.; Liu, L.; Chen, W.; Yang, L.; Fan, G.L. Predicting peroxidase subcellular location by hybridizing different descriptors of Chou’ pseudo amino acid patterns. Anal. Biochem., 2014, 458, 14-19.
[http://dx.doi.org/10.1016/j.ab.2014.04.032] [PMID: 24802134]
[65]
Liu, D.; Li, G.; Zuo, Y. Function determinants of TET proteins: the arrangements of sequence motifs with specific codes. Brief. Bioinform., 2019, 20(5), 1826-1835.
[http://dx.doi.org/10.1093/bib/bby053] [PMID: 29947743]
[66]
Chen, K.; Kurgan, L.A.; Ruan, J. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs. BMC Struct. Biol., 2007, 7, 25.
[http://dx.doi.org/10.1186/1472-6807-7-25] [PMID: 17437643]
[67]
Chen, K.; Jiang, Y.; Du, L.; Kurgan, L. Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs. J. Comput. Chem., 2009, 30(1), 163-172.
[http://dx.doi.org/10.1002/jcc.21053] [PMID: 18567007]
[68]
Senes, A.; Gerstein, M.; Engelman, D.M. Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J. Mol. Biol., 2000, 296(3), 921-936.
[http://dx.doi.org/10.1006/jmbi.1999.3488] [PMID: 10677292]
[69]
Chen, W.; Feng, P.; Nie, F. iATP: A sequence based method for identifying anti-tubercular peptides. Med. Chem., 2019, 16(5), 620-625.
[http://dx.doi.org/10.2174/1573406415666191002152441] [PMID: 31339073]
[70]
Chen, Z.; Zhou, Y.; Song, J.; Zhang, Z. hCKSAAP_UbSite: improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties. Biochim. Biophys. Acta, 2013, 1834(8), 1461-1467.
[http://dx.doi.org/10.1016/j.bbapap.2013.04.006] [PMID: 23603789]
[71]
Yang, J.; Roy, A.; Zhang, Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics, 2013, 29(20), 2588-2595.
[http://dx.doi.org/10.1093/bioinformatics/btt447] [PMID: 23975762]
[72]
Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 2008, 9, 40.
[http://dx.doi.org/10.1186/1471-2105-9-40] [PMID: 18215316]
[73]
Ma, L. Co-Clustering Analysis of Protein Secondary Structures. Curr. Bioinform., 2017, 12(3), 213-224.
[http://dx.doi.org/10.2174/1574893612666170111145319]
[74]
McGuffin, L.J.; Bryson, K.; Jones, D.T. The PSIPRED protein structure prediction server. Bioinformatics, 2000, 16(4), 404-405.
[http://dx.doi.org/10.1093/bioinformatics/16.4.404] [PMID: 10869041]
[75]
Zheng, L.; Huang, S.; Mu, N.; Zhang, H.; Zhang, J.; Chang, Y.; Yang, L.; Zuo, Y. RAACBook: A web server of reduced amino acid alphabet for sequence-dependent inference by using Chou’s five-step rule. Database (Oxford), 2019, 2019 ,baz131
[http://dx.doi.org/10.1093/database/baz131] [PMID: 31802128]
[76]
Zheng, L. RaacLogo: a new sequence logo generator by using reduced amino acid clusters. Brief. Bioinform., 2020.
[PMID: 32524143]
[77]
Kawashima, S.; Pokarowski, P.; Pokarowska, M.; Kolinski, A.; Katayama, T.; Kanehisa, M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res., 2008, 36(Database issue), D202-D205.
[PMID: 17998252]
[78]
Fauchere, J.L.P.V.E. Hydrophobic parameters II of amino acid side-chains from the partitioning of N-acetyl-amino acid amides. Eur. J. Med. Chem., 1983, 18, 369-375.
[79]
Grantham, R. Amino acid difference formula to help explain protein evolution. Science, 1974, 185(4154), 862-864.
[http://dx.doi.org/10.1126/science.185.4154.862] [PMID: 4843792]
[80]
Jones, D.T.; Taylor, W.R.; Thornton, J.M. A new approach to protein fold recognition. Nature, 1992, 358(6381), 86-89.
[http://dx.doi.org/10.1038/358086a0] [PMID: 1614539]
[81]
Jones, S.; Thornton, J.M. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA, 1996, 93(1), 13-20.
[http://dx.doi.org/10.1073/pnas.93.1.13] [PMID: 8552589]
[82]
Klein, P.; Kanehisa, M.; DeLisi, C. Prediction of protein function from sequence properties. Discriminant analysis of a data base. Biochim. Biophys. Acta, 1984, 787(3), 221-226.
[http://dx.doi.org/10.1016/0167-4838(84)90312-1] [PMID: 6547351]
[83]
Janin, J.; Wodak, S. Conformation of amino acid side-chains in proteins. J. Mol. Biol., 1978, 125(3), 357-386.
[http://dx.doi.org/10.1016/0022-2836(78)90408-4] [PMID: 731698]
[84]
Shao, J.; Xu, D.; Tsai, S.N.; Wang, Y.; Ngai, S.M. Computational identification of protein methylation sites through bi-profile Bayes feature extraction. PLoS One, 2009, 4(3) ,e4920
[http://dx.doi.org/10.1371/journal.pone.0004920] [PMID: 19290060]
[85]
Song, J.; Tan, H.; Shen, H.; Mahmood, K.; Boyd, S.E.; Webb, G.I.; Akutsu, T.; Whisstock, J.C. Cascleave: towards more accurate prediction of caspase substrate cleavage sites. Bioinformatics, 2010, 26(6), 752-760.
[http://dx.doi.org/10.1093/bioinformatics/btq043] [PMID: 20130033]
[86]
Jia, C.; He, W.; Zou, Q. DephosSitePred: A High Accuracy Predictor for Protein Dephosphorylation Sites. Comb. Chem. High Throughput Screen., 2017, 20(2), 153-157.
[http://dx.doi.org/10.2174/1386207319666161228155636] [PMID: 28031011]
[87]
Ju, Z.; Wang, S.Y. Predicting lysine lipoylation sites using bi-profile bayes feature extraction and fuzzy support vector machine algorithm. Anal. Biochem., 2018, 561-562, 11-17.
[http://dx.doi.org/10.1016/j.ab.2018.09.007] [PMID: 30218638]
[88]
Ju, Z.; Sun, J.; Li, Y.; Wang, L. Predicting lysine glycation sites using bi-profile bayes feature extraction. Comput. Biol. Chem., 2017, 71, 98-103.
[http://dx.doi.org/10.1016/j.compbiolchem.2017.10.004] [PMID: 29040908]
[89]
Jia, C.Z.; He, W.Y.; Yao, Y.H. OH-PRED: prediction of protein hydroxylation sites by incorporating adapted normal distribution bi-profile Bayes feature extraction and physicochemical properties of amino acids. J. Biomol. Struct. Dyn., 2017, 35(4), 829-835.
[http://dx.doi.org/10.1080/07391102.2016.1163294] [PMID: 26957000]
[90]
Ao, C.; Zou, Q.; Yu, L. RFhy-m2G: Identification of RNA N2-methylguanosine modification sites based on random forest and hybrid features. 2021.S1046-2023(21)00142-0.
[91]
Bairoch, A.; Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 2000, 28(1), 45-48.
[http://dx.doi.org/10.1093/nar/28.1.45] [PMID: 10592178]
[92]
Cheng, C.W.; Su, E.C.; Hwang, J.K.; Sung, T.Y.; Hsu, W.L. Predicting RNA-binding sites of proteins using support vector machines and evolutionary information. BMC Bioinformatics, 2008, 9(12), S6.
[http://dx.doi.org/10.1186/1471-2105-9-S12-S6] [PMID: 19091029]
[93]
Wang, K.; Samudrala, R. Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics, 2006, 7, 385.
[http://dx.doi.org/10.1186/1471-2105-7-385] [PMID: 16916457]
[94]
Ma, X.; Guo, J.; Liu, H.D.; Xie, J.M.; Sun, X. Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 2012, 9(6), 1766-1775.
[http://dx.doi.org/10.1109/TCBB.2012.106] [PMID: 22868682]
[95]
Zhao, X.; Jiao, Q.; Li, H.; Wu, Y.; Wang, H.; Huang, S.; Wang, G. ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles. BMC Bioinformatics, 2020, 21(1), 43.
[http://dx.doi.org/10.1186/s12859-020-3388-y] [PMID: 32024464]
[96]
Zhu, X.J. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl. Base. Syst., 2019, 163, 787-793.
[http://dx.doi.org/10.1016/j.knosys.2018.10.007]
[97]
Yang, H.; Yang, W.; Dao, F.Y.; Lv, H.; Ding, H.; Chen, W.; Lin, H. A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae. Brief. Bioinform., 2019, 21(5), 1568-1580.
[http://dx.doi.org/10.1093/bib/bbz123] [PMID: 31633777]
[98]
Liu, K.; Chen, W. iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications. Bioinformatics, 2020, 36(11), 3336-3342.
[http://dx.doi.org/10.1093/bioinformatics/btaa155] [PMID: 32134472]
[99]
Berrhail, F.; Belhadef, H. Genetic algorithm-based feature selection approach for enhancing the effectiveness of similarity searching in ligand-based virtual screening. Curr. Bioinform., 2020, 15(5), 431-444.
[http://dx.doi.org/10.2174/1574893614666191119123935]
[100]
Schaduangrat, N.; Nantasenamat, C.; Prachayasittikul, V.; Shoombuatong, W. ACPred: a computational tool for the prediction and analysis of anticancer peptides. Molecules, 2019, 24(10), 1973.
[http://dx.doi.org/10.3390/molecules24101973] [PMID: 31121946]
[101]
Simeon, S.; Shoombuatong, W.; Anuwongcharoen, N.; Preeyanon, L.; Prachayasittikul, V.; Wikberg, J.E.; Nantasenamat, C. osFP: a web server for predicting the oligomeric states of fluorescent proteins. J. Cheminform., 2016, 8(1), 72.
[http://dx.doi.org/10.1186/s13321-016-0185-8] [PMID: 28053671]
[102]
Win, T.S.; Schaduangrat, N.; Prachayasittikul, V.; Nantasenamat, C.; Shoombuatong, W. PAAP: a web server for predicting antihypertensive activity of peptides. Future Med. Chem., 2018, 10(15), 1749-1767.
[http://dx.doi.org/10.4155/fmc-2017-0300] [PMID: 30039980]
[103]
Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27(8), 1226-1238.
[http://dx.doi.org/10.1109/TPAMI.2005.159] [PMID: 16119262]
[104]
Hasan, M.M.; Manavalan, B.; Shoombuatong, W.; Khatun, M.S.; Kurata, H. i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation. Plant Mol. Biol., 2020, 103(1-2), 225-234.
[http://dx.doi.org/10.1007/s11103-020-00988-y] [PMID: 32140819]
[105]
Hasan, M.M.; Manavalan, B.; Shoombuatong, W.; Khatun, M.S.; Kurata, H. i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes. Comput. Struct. Biotechnol. J., 2020, 18, 906-912.
[http://dx.doi.org/10.1016/j.csbj.2020.04.001] [PMID: 32322372]
[106]
Hasan, M.M.; Manavalan, B.; Khatun, M.S.; Kurata, H. i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome. Int. J. Biol. Macromol., 2020, 157, 752-758.
[http://dx.doi.org/10.1016/j.ijbiomac.2019.12.009] [PMID: 31805335]
[107]
Du, X. Identification and analysis of cancer diagnosis using probabilistic classification vector machines with feature selection. Curr. Bioinform., 2018, 13(6), 625-632.
[http://dx.doi.org/10.2174/1574893612666170405125637]
[108]
Xu, Z.C.; Feng, P.M.; Yang, H.; Qiu, W.R.; Chen, W.; Lin, H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics, 2019, 35(23), 4922-4929.
[http://dx.doi.org/10.1093/bioinformatics/btz358] [PMID: 31077296]
[109]
Lin, H. Identifying Sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans Comput Biol Bioinform, 2019, 16(4), 1316-1321.
[http://dx.doi.org/10.1109/TCBB.2017.2666141]
[110]
Zhang, Z.Y.; Yang, Y.H.; Ding, H.; Wang, D.; Chen, W.; Lin, H. Design powerful predictor for mRNA subcellular location prediction in Homo sapiens. Brief. Bioinform., 2020, 22(1), 526-535.
[http://dx.doi.org/10.1093/bib/bbz177] [PMID: 31994694]
[111]
Tahir, M.; Idris, A. MD-LBP: An efficient computational model for protein subcellular localization from HeLa cell lines using SVM. Curr. Bioinform., 2020, 15(3), 204-211.
[http://dx.doi.org/10.2174/1574893614666190723120716]
[112]
Jiang, Q.; Wang, G.; Jin, S.; Li, Y.; Wang, Y. Predicting human microRNA-disease associations based on support vector machine. Int. J. Data Min. Bioinform., 2013, 8(3), 282-293.
[http://dx.doi.org/10.1504/IJDMB.2013.056078] [PMID: 24417022]
[113]
Ao, C.; Yu, L.; Zou, Q. Prediction of bio-sequence modifications and the associations with diseases. Brief. Funct. Genomics, 2021, 20(1), 1-18.
[http://dx.doi.org/10.1093/bfgp/elaa023] [PMID: 33313647]
[114]
Tao, Z.; Li, Y.; Teng, Z.; Zhao, Y. A method for identifying vesicle transport proteins based on LibSVM and MRMD. Comput. Math. Methods Med., 2020, 2020 ,8926750
[http://dx.doi.org/10.1155/2020/8926750] [PMID: 33133228]
[115]
Wang, S. Immune cell infiltration-based signature for prognosis and immunogenomic analysis in breast cancer. Brief. Bioinform., 2021, 22(2), 2020-2031.
[http://dx.doi.org/10.1093/bib/bbaa026] [PMID: 32141494]
[116]
Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol., 2011, 2(3)
[http://dx.doi.org/10.1145/1961189.1961199]
[117]
Wei, H.; Liu, B. iCircDA-MF: identification of circRNA-disease associations based on matrix factorization. Brief. Bioinform., 2020, 21(4), 1356-1367.
[http://dx.doi.org/10.1093/bib/bbz057] [PMID: 31197324]
[118]
He, K.M. Deep residual learning for image recognition. 2016 Ieee Conference on Computer Vision and Pattern Recognition (Cvpr), 2016, pp. 770-778.
[http://dx.doi.org/10.1109/CVPR.2016.90]
[119]
Huang, Y.; Zhou, D.; Wang, Y.; Zhang, X.; Su, M.; Wang, C.; Sun, Z.; Jiang, Q.; Sun, B.; Zhang, Y. Prediction of transcription factors binding events based on epigenetic modifications in different human cells. Epigenomics, 2020, 12(16), 1443-1456.
[http://dx.doi.org/10.2217/epi-2019-0321] [PMID: 32921165]
[120]
Wang, X.; Yang, Y.; Liu, J.; Wang, G. The stacking strategy-based hybrid framework for identifying non-coding RNAs. Brief. Bioinform., 2021, bbab023.,
[http://dx.doi.org/10.1093/bib/bbab023] [PMID: 33693454]
[121]
Witten, I.H.; Frank, E.; Hall, M.A. Data mining : Practical machine learning tools and techniques, 3rd ed; Morgan Kaufmann series in data management systemsBurlington, MA; , 2011.
[122]
Tang, H.; Chen, W.; Lin, H. Identification of immunoglobulins using Chou’s pseudo amino acid composition with feature selection technique. Mol. Biosyst., 2016, 12(4), 1269-1275.
[http://dx.doi.org/10.1039/C5MB00883B] [PMID: 26883492]
[123]
Chen, W.; Feng, P.; Liu, T.; Jin, D. Recent advances in machine learning methods for predicting heat shock proteins. Curr. Drug Metab., 2019, 20(3), 224-228.
[http://dx.doi.org/10.2174/1389200219666181031105916] [PMID: 30378494]
[124]
Amanat, S. Identification of lysine carboxylation sites in proteins by integrating statistical moments and position relative features via general PseAAC. Curr. Bioinform., 2020, 15(5), 396-407.
[http://dx.doi.org/10.2174/1574893614666190723114923]
[125]
Cheng, L.; Qi, C.; Zhuang, H.; Fu, T.; Zhang, X. gutMDisorder: a comprehensive database for dysbiosis of the gut microbiota in disorders and interventions. Nucleic Acids Res., 2020, 48(D1), D554-D560.
[http://dx.doi.org/10.1093/nar/gkz843] [PMID: 31584099]
[126]
Cheng, L.; Zhuang, H.; Ju, H.; Yang, S.; Han, J.; Tan, R.; Hu, Y. Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front. Genet., 2019, 10, 94.
[http://dx.doi.org/10.3389/fgene.2019.00094] [PMID: 30891058]
[127]
Wei, L.; Liao, M.; Gao, Y.; Ji, R.; He, Z.; Zou, Q. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-Quality Negative Set. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 2014, 11(1), 192-201.
[http://dx.doi.org/10.1109/TCBB.2013.146] [PMID: 26355518]
[128]
Wei, L.; Wan, S.; Guo, J.; Wong, K.K. A novel hierarchical selective ensemble classifier with bioinformatics application. Artif. Intell. Med., 2017, 83, 82-90.
[http://dx.doi.org/10.1016/j.artmed.2017.02.005] [PMID: 28245947]
[129]
Wei, L.; Xing, P.; Zeng, J.; Chen, J.; Su, R.; Guo, F. Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier. Artif. Intell. Med., 2017, 83, 67-74.
[http://dx.doi.org/10.1016/j.artmed.2017.03.001] [PMID: 28320624]
[130]
Manavalan, B.; Hasan, M.M.; Basith, S.; Gosu, V.; Shin, T.H.; Lee, G. Empirical Comparison and Analysis of Web-Based DNA N4-Methylcytosine Site Prediction Tools. Mol. Ther. Nucleic Acids, 2020, 22, 406-420.
[http://dx.doi.org/10.1016/j.omtn.2020.09.010] [PMID: 33230445]
[131]
Manavalan, B. Computational prediction of species-specific yeast DNA replication origin via iterative feature representation. Brief. Bioinform., 2020, 22(2), 2126-2140.
[PMID: 33232970]
[132]
Basith, S.; Manavalan, B.; Hwan Shin, T.; Lee, G. Machine intelligence in peptide therapeutics: A next-generation tool for rapid disease screening. Med. Res. Rev., 2020, 40(4), 1276-1314.
[http://dx.doi.org/10.1002/med.21658] [PMID: 31922268]
[133]
Liang, P.; Yang, W.; Chen, X.; Long, C.; Zheng, L.; Li, H.; Zuo, Y. Machine Learning of Single-Cell Transcriptome Highly Identifies mRNA Signature by Comparing F-Score Selection with DGE Analysis. Mol. Ther. Nucleic Acids, 2020, 20, 155-163.
[http://dx.doi.org/10.1016/j.omtn.2020.02.004] [PMID: 32169803]
[134]
Su, R.; Liu, X.; Wei, L.; Zou, Q. Deep-Resp-Forest: A deep forest model to predict anti-cancer drug response. Methods, 2019, 166, 91-102.
[http://dx.doi.org/10.1016/j.ymeth.2019.02.009] [PMID: 30772464]
[135]
Wei, L.; Chen, H.; Su, R. M6APred-EL: A Sequence-Based Predictor for Identifying N6-methyladenosine Sites Using Ensemble Learning. Mol. Ther. Nucleic Acids, 2018, 12, 635-644.
[http://dx.doi.org/10.1016/j.omtn.2018.07.004] [PMID: 30081234]
[136]
Zhai, Y.; Chen, Y.; Teng, Z.; Zhao, Y. Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions. Front. Cell Dev. Biol., 2020, 8, 591487.
[http://dx.doi.org/10.3389/fcell.2020.591487] [PMID: 33195258]
[137]
Guo, Z.; Wang, P.; Liu, Z.; Zhao, Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front. Bioeng. Biotechnol., 2020, 8, 584807.
[http://dx.doi.org/10.3389/fbioe.2020.584807] [PMID: 33195148]
[138]
Faraggi, E.; Xue, B.; Zhou, Y. Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network. Proteins, 2009, 74(4), 847-856.
[http://dx.doi.org/10.1002/prot.22193] [PMID: 18704931]
[139]
Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 2004, 337(3), 635-645.
[http://dx.doi.org/10.1016/j.jmb.2004.02.002] [PMID: 15019783]
[140]
Cheng, J. SCRATCH: A protein structure and structural feature prediction server. Nucleic Acids Res., 2005, 33(Web Server issue), W72-W76.
[http://dx.doi.org/10.1093/nar/gki396]
[141]
Hasan, M.M.; Alam, M.A.; Shoombuatong, W.; Deng, H.W.; Manavalan, B.; Kurata, H. NeuroPred-FRL: An interpretable prediction model for identifying neuropeptide using feature representation learning. Brief. Bioinform., 2021, bbab167.
[http://dx.doi.org/10.1093/bib/bbab167] [PMID: 33975333]
[142]
Charoenkwan, P.; Chiangjong, W.; Nantasenamat, C.; Hasan, M.M.; Manavalan, B.; Shoombuatong, W. StackIL6: A stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief. Bioinform., 2021, bbab172.
[http://dx.doi.org/10.1093/bib/bbab172] [PMID: 33963832]
[143]
Lv, H.; Dao, F.Y.; Zulfiqar, H.; Lin, H. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach. Bioinformatics, 2020, 36(11), 3350-3356.
[http://dx.doi.org/10.1093/bib/bbab244] [PMID: 32145017]
[144]
Wei, L.; Su, R.; Luan, S.; Liao, Z.; Manavalan, B.; Zou, Q.; Shi, X. Iterative feature representations improve N4-methylcytosine site prediction. Bioinformatics, 2019, 35(23), 4930-4937.
[http://dx.doi.org/10.1093/bioinformatics/btz408] [PMID: 31099381]
[145]
Long, H. Predicting Protein Phosphorylation Sites Based on Deep Learning. Curr. Bioinform., 2020, 15(4), 300-308.
[http://dx.doi.org/10.2174/1574893614666190902154332]
[146]
Guo, C. ExomeHMM: A Hidden Markov Model for Detecting Copy Number Variation Using Whole-Exome Sequencing Data. Curr. Bioinform., 2017, 12(2), 147-155.
[http://dx.doi.org/10.2174/1574893611666160727160757]
