Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Distinguishing Enzymes and Non-enzymes Based on Structural Information with an Alignment Free Approach

Author(s): Lifeng Yang and Xiong Jiao *

Volume 16, Issue 1, 2021

Published on: 24 March, 2020

Page: [44 - 52] Pages: 9

DOI: 10.2174/1574893615666200324134037

Abstract

Background: Knowledge of protein function is crucial for understanding biological processes. Experimental methods for determining protein function cannot keep pace with the growing amount of protein sequence and structure data.

Objective: To develop computational techniques for protein function prediction.

Methods: An SVM model was constructed from residue interaction network features and motion mode information and used as the predictor. The role of these features in the prediction was then analyzed.
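As a rough illustration of what residue interaction network features can look like, the sketch below builds a contact network from C-alpha coordinates and computes two topological descriptors (mean degree, mean clustering coefficient) and one information-theoretic descriptor (Shannon entropy of the degree distribution). It is not the authors' feature set: the 7.0 Å cutoff, the choice of descriptors, and all function names are assumptions made for this example. The resulting vector is the kind of input that could be passed to an SVM library; that step is omitted here.

```python
import math
from collections import Counter


def contact_network(coords, cutoff=7.0):
    """Adjacency sets linking residues whose C-alpha atoms lie within
    `cutoff` angstroms (the cutoff value is an assumption, not from the paper)."""
    n = len(coords)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_degree(adj):
    """Topological descriptor: average number of contacts per residue."""
    return sum(len(nb) for nb in adj) / len(adj)


def mean_clustering(adj):
    """Topological descriptor: average local clustering coefficient.
    Residues with fewer than two contacts contribute zero."""
    total = 0.0
    for nb in adj:
        k = len(nb)
        if k < 2:
            continue
        # Count edges that connect two neighbours of the same residue.
        links = sum(1 for u in nb for v in nb if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def degree_entropy(adj):
    """Information-theoretic descriptor: Shannon entropy (in bits)
    of the degree distribution."""
    n = len(adj)
    counts = Counter(len(nb) for nb in adj)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def network_features(coords, cutoff=7.0):
    """Hypothetical feature vector for one structure."""
    adj = contact_network(coords, cutoff)
    return [mean_degree(adj), mean_clustering(adj), degree_entropy(adj)]


# Toy example: four residues on a line, 5 angstroms apart.
chain = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
features = network_features(chain)
```

In practice the coordinates would come from a PDB structure, and each protein's vector would be combined with motion mode descriptors before training the classifier.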

Results: An alignment-free method for classifying enzymes and non-enzymes was developed in this work. No single feature dominates the prediction process. The topological and information-theoretic residue interaction network features perform better, and combining the fast and slow modes gives a better explanation of the classification result.

Conclusion: The method proposed in this paper can serve as a classifier for enzymes and non-enzymes.

Keywords: Protein descriptors, enzyme, motion mode, residue interaction network, support vector machines, protein function.
