A Novel Prediction of Quaternary Structural Type of Proteins with Gene Ontology

Xuan Xiao; Wei-Jie Chen; Wang-Ren Qiu

doi:10.2174/0929866526666191014144618

Abstract

Background: Information about the quaternary structure of a protein is important because it is closely related to the protein's biological function. With the rapid development of next-generation sequencing technology, we face a challenge: how to automatically identify the quaternary structural attributes of new polypeptide chains from their sequence information (i.e., whether they form a monomer, a hetero-oligomer, or a homo-oligomer).

Objective: Our goal is to find a new way to represent protein sequences and thereby improve the prediction rate for protein quaternary structure.

Methods: We developed a prediction system for protein quaternary structural type in which a protein sequence is represented by combining Pfam functional-domain and Gene Ontology information. The protein features are converted into numerical vectors, and the quaternary structure is predicted with a machine learning algorithm (random forest) and evaluated with the jackknife test.
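The general idea of the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the Pfam and GO identifiers, the labels, and all function names are hypothetical placeholders, and a nearest-neighbor rule on Hamming distance stands in for the random forest so that the example needs only the standard library. It shows a binary encoding of annotation terms and a jackknife (leave-one-out) estimate of the prediction rate.

```python
# Hypothetical toy data: each protein is a set of Pfam/GO identifiers plus a
# class label. None of these values come from the paper's data set.
PROTEINS = [
    ({"PF00001", "GO:0000001"}, "monomer"),
    ({"PF00001", "GO:0000002"}, "monomer"),
    ({"PF00002", "GO:0000003"}, "homo-oligomer"),
    ({"PF00002", "GO:0000003", "GO:0000004"}, "homo-oligomer"),
    ({"PF00003", "GO:0000005"}, "hetero-oligomer"),
    ({"PF00003", "GO:0000005", "GO:0000001"}, "hetero-oligomer"),
]


def build_vocabulary(samples):
    """Assign a fixed vector index to every annotation term seen in the data."""
    terms = sorted(set().union(*(annotations for annotations, _ in samples)))
    return {term: index for index, term in enumerate(terms)}


def encode(annotations, vocabulary):
    """Binary vector: 1 where the protein carries the term, 0 elsewhere."""
    vector = [0] * len(vocabulary)
    for term in annotations:
        if term in vocabulary:
            vector[vocabulary[term]] = 1
    return vector


def predict_nearest(query, training):
    """Stand-in classifier (assumption): label of the closest training vector
    by Hamming distance. The paper uses a random forest instead."""
    def hamming(item):
        return sum(a != b for a, b in zip(query, item[0]))
    return min(training, key=hamming)[1]


def jackknife_accuracy(samples):
    """Leave each sample out once, predict it from the rest, and return the
    fraction of correct predictions."""
    # The vocabulary uses annotation terms only, never the class labels.
    vocabulary = build_vocabulary(samples)
    encoded = [(encode(a, vocabulary), label) for a, label in samples]
    correct = 0
    for i, (vector, label) in enumerate(encoded):
        training = encoded[:i] + encoded[i + 1:]
        if predict_nearest(vector, training) == label:
            correct += 1
    return correct / len(encoded)


if __name__ == "__main__":
    print(jackknife_accuracy(PROTEINS))
```

In the jackknife test every sample is predicted by a model that never saw it, so the reported rate does not depend on an arbitrary train/test split.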

Results: Our data set contains 5495 protein samples. With the proposed method, we classify proteins as monomers, hetero-oligomers, or homo-oligomers with a prediction rate of 74.38%, which is 3.24% higher than that of previous studies. With this new feature extraction method, we can further classify the quaternary structure of proteins, and the results improve correspondingly.

Conclusion: Compared with previous results, the new prediction system improves the prediction rate. We believe that the feature extraction method in this paper is practical and can serve as a reference for other protein classification problems.

Keywords: Protein quaternary structure, Pfam, functional domain composition, gene ontology, random forest algorithm, jackknife test.

Journal: Protein & Peptide Letters
Publisher: Bentham Science Publisher
Print ISSN: 0929-8665
Online ISSN: 1875-5305
