Prediction of the Disordered Regions of Intrinsically Disordered Proteins Based on the Molecular Functions

WeiXia Xie; Yong E. Feng

doi:10.2174/0929866526666190226160629

Abstract

Background: Intrinsically disordered proteins lack a well-defined three-dimensional structure under physiological conditions yet carry out essential biological functions. They take part in various physiological processes, such as signal transduction, transcription, and post-translational modification. The disordered regions are the main functional sites of intrinsically disordered proteins, so the study of these regions has become an active research topic.

Objective: This paper aims to analyze the features of disordered regions with different molecular functions and to predict the different classes of disordered regions using effective features.

Methods: We first divided the intrinsically disordered proteins in the DisProt database into six classes according to their molecular function. We then extracted four features using bioinformatics methods, namely Amino Acid Index (AAIndex), codon frequency (Codon), the composition of three kinds of protein secondary structure (3PSS), and Chemical Shifts (CSs), and used these features to predict the disordered regions of each functional class with a Support Vector Machine (SVM).
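
Two of the features named above, codon frequency and three-state secondary structure composition, and their fusion into a single vector can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact feature definitions, so the 64-codon relative-frequency vector, the H/E/C alphabet, fusion by plain concatenation, and all function names are assumptions.

```python
from collections import Counter
from itertools import product

# Assumption: features are laid out in a fixed order so vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 codons
CODON_SET = set(CODONS)
SS_STATES = "HEC"  # helix, strand, coil (assumed three-state alphabet)


def codon_frequency(cds):
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)  # skip ambiguous codons
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def ss_composition(ss):
    """Fraction of residues in each of the three secondary structure states."""
    ss = ss.upper()
    n = len(ss)
    return [ss.count(state) / n if n else 0.0 for state in SS_STATES]


def fuse(*vectors):
    """Feature fusion by simple concatenation of the individual feature vectors."""
    return [x for v in vectors for x in v]


# Hypothetical disordered region: its coding sequence and a predicted
# secondary structure string (for example, from a tool such as PSIPRED).
codon_vec = codon_frequency("ATGGCTGCTTAA")
ss_vec = ss_composition("CCHHHHEECC")
fused = fuse(codon_vec, ss_vec)
print(len(fused))
```

A fused vector of this kind would be passed to the classifier as one sample; AAIndex and chemical shift features could be appended in the same way once their per-region definitions are fixed.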

Results: The best overall accuracy was 99.29%, obtained using chemical shifts (CSs) as the feature. With feature fusion, the overall accuracy reached 88.70% using CSs+AAIndex and 86.09% using CSs+AAIndex+Codon+3PSS.
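
For a multi-class problem such as the six functional classes, overall accuracy is conventionally the fraction of samples assigned to their true class. The sketch below assumes that definition; the abstract does not state the validation protocol, and the labels and function name are hypothetical.

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for eight disordered regions drawn from six classes (0-5).
y_true = [0, 1, 2, 3, 4, 5, 2, 3]
y_pred = [0, 1, 2, 3, 4, 5, 3, 3]
print(f"{overall_accuracy(y_true, y_pred):.2%}")
```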

Conclusion: We predicted and analyzed the disordered regions based on their molecular functions. The results showed that prediction performance can be improved by adding chemical shifts and AAIndex as features, and that the chemical shift was the most effective feature. We hope that our results will be useful for the study of intrinsically disordered proteins.

Keywords: Intrinsically disordered proteins, disordered regions, amino acid index, codon, protein secondary structure, chemical shifts, support vector machine.



DOI: https://dx.doi.org/10.2174/0929866526666190226160629
Print ISSN: 0929-8665
Online ISSN: 1875-5305
Publisher: Bentham Science Publisher

Protein & Peptide Letters
