Current Medicinal Chemistry

ISSN (Print): 0929-8673
ISSN (Online): 1875-533X

Review Article

A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods

Author(s): Yu-He Yang, Jia-Shu Wang, Shi-Shi Yuan, Meng-Lu Liu, Wei Su, Hao Lin* and Zhao-Yue Zhang*

Volume 29, Issue 5, 2022

Published on: 10 January, 2022

Page: [789 - 806] Pages: 18

DOI: 10.2174/0929867328666210910125802

Abstract

Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5’-triphosphate (ATP) is one such ligand; it plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for annotating protein function and for drug design. However, because of the huge influx of protein sequences into databases in the post-genome era, identifying ATP binding residues experimentally is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods to detecting ATP binding residues of proteins. We expect this review to be helpful for further research.

Keywords: Adenosine-5’-triphosphate (ATP), binding residues, prediction, machine learning, feature extraction, proteins.

[136]
Zhai, Y.; Chen, Y.; Teng, Z.; Zhao, Y. Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions. Front. Cell Dev. Biol., 2020, 8, 591487.
[http://dx.doi.org/10.3389/fcell.2020.591487] [PMID: 33195258]
[137]
Guo, Z.; Wang, P.; Liu, Z.; Zhao, Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front. Bioeng. Biotechnol., 2020, 8, 584807.
[http://dx.doi.org/10.3389/fbioe.2020.584807] [PMID: 33195148]
[138]
Faraggi, E.; Xue, B.; Zhou, Y. Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network. Proteins, 2009, 74(4), 847-856.
[http://dx.doi.org/10.1002/prot.22193] [PMID: 18704931]
[139]
Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 2004, 337(3), 635-645.
[http://dx.doi.org/10.1016/j.jmb.2004.02.002] [PMID: 15019783]
[140]
Cheng, J. SCRATCH: A protein structure and structural feature prediction server. Nucleic Acids Res., 2005, 33(Web Server issue), W72-W76.
[http://dx.doi.org/10.1093/nar/gki396]
[141]
Hasan, M.M.; Alam, M.A.; Shoombuatong, W.; Deng, H.W.; Manavalan, B.; Kurata, H. NeuroPred-FRL: An interpretable prediction model for identifying neuropeptide using feature representation learning. Brief. Bioinform., 2021, bbab167.
[http://dx.doi.org/10.1093/bib/bbab167] [PMID: 33975333]
[142]
Charoenkwan, P.; Chiangjong, W.; Nantasenamat, C.; Hasan, M.M.; Manavalan, B.; Shoombuatong, W. StackIL6: A stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief. Bioinform., 2021, bbab172.
[http://dx.doi.org/10.1093/bib/bbab172] [PMID: 33963832]
[143]
Lv, H.; Dao, F.Y.; Zulfiqar, H.; Lin, H. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach. Brief. Bioinform., 2021, bbab244.
[http://dx.doi.org/10.1093/bib/bbab244] [PMID: 32145017]
[144]
Wei, L.; Su, R.; Luan, S.; Liao, Z.; Manavalan, B.; Zou, Q.; Shi, X. Iterative feature representations improve N4-methylcytosine site prediction. Bioinformatics, 2019, 35(23), 4930-4937.
[http://dx.doi.org/10.1093/bioinformatics/btz408] [PMID: 31099381]
[145]
Long, H. Predicting Protein Phosphorylation Sites Based on Deep Learning. Curr. Bioinform., 2020, 15(4), 300-308.
[http://dx.doi.org/10.2174/1574893614666190902154332]
[146]
Guo, C. ExomeHMM: A Hidden Markov Model for Detecting Copy Number Variation Using Whole-Exome Sequencing Data. Curr. Bioinform., 2017, 12(2), 147-155.
[http://dx.doi.org/10.2174/1574893611666160727160757]
