Letters in Organic Chemistry

ISSN (Print): 1570-1786
ISSN (Online): 1875-6255

Review Article

Application of Machine Learning Techniques to Predict Protein Phosphorylation Sites

Author(s): Shengli Zhang*, Xian Li, Chengcheng Fan, Zhehui Wu and Qian Liu

Volume 16, Issue 4, 2019

Page: [247 - 257] Pages: 11

DOI: 10.2174/1570178615666180907150928

Abstract

Protein phosphorylation is one of the most important post-translational modifications of proteins. Almost all processes that regulate the life activities of an organism, as well as almost all physiological and pathological processes, involve protein phosphorylation. In this paper, we summarize the implementation and application of methods used in protein phosphorylation site prediction, including the support vector machine algorithm, random forest, Jensen-Shannon divergence combined with quadratic discriminant analysis, the AdaBoost algorithm, increment of diversity with quadratic discriminant analysis, the modified CKSAAP algorithm, the Bayes classifier combined with phosphorylation sequence enrichment analysis, the least absolute shrinkage and selection operator, stochastic search variable selection, partial least squares and deep learning. Building on these methods, we use the k-nearest neighbor algorithm with the BLOSUM80 matrix to predict phosphorylation sites. First, we construct the dataset and remove redundancy from the positive and negative samples, that is, we remove protein sequences with more than 30% similarity. Next, the proposed method is evaluated with four metrics: sensitivity (Sn), specificity (Sp), accuracy (ACC) and Matthews correlation coefficient (MCC). Finally, tenfold cross-validation is employed to evaluate the method. The results, verified by tenfold cross-validation, show that the average values of Sn, Sp, ACC and MCC over the three types of amino acid (serine, threonine and tyrosine) are 90.44%, 86.95%, 88.74% and 0.7742, respectively. A comparison with PhosphoSVM and Musite shows that the proposed method achieves better predictive performance, and that it is simple, practical and has low time complexity in classification.

Keywords: Phosphorylation site prediction, machine learning, k-nearest neighbor, BLOSUM80, amino acid, CKSAAP, algorithm.
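
The workflow outlined in the abstract (similarity-based k-nearest neighbor voting, then scoring with Sn, Sp, ACC and MCC) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the fixed-length peptide windows, the value of k and the `toy_score` function are all assumptions made for illustration. In particular, `toy_score` is a simple match/mismatch stand-in and does not contain BLOSUM80 values; a real substitution-matrix lookup would be passed in its place.

```python
import math
from collections import Counter


def toy_score(a, b):
    """Hypothetical stand-in for a BLOSUM80 lookup: +1 if residues match, else -1."""
    return 1 if a == b else -1


def window_similarity(w1, w2, score=toy_score):
    """Sum per-position substitution scores for two equal-length peptide windows."""
    if len(w1) != len(w2):
        raise ValueError("windows must have equal length")
    return sum(score(a, b) for a, b in zip(w1, w2))


def knn_predict(query, train, k=3, score=toy_score):
    """Majority vote among the k training windows most similar to the query.

    train is a list of (window, label) pairs, with label 1 for a
    phosphorylation site and 0 for a non-site. Ties in the vote go to 0.
    """
    ranked = sorted(
        train,
        key=lambda item: window_similarity(query, item[0], score),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return 1 if votes[1] > votes[0] else 0


def evaluate(y_true, y_pred):
    """Compute Sn, Sp, ACC and MCC from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / len(y_true) if y_true else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Made-up windows centred on serine; these are not data from the paper.
train = [("AASAA", 1), ("AASAG", 1), ("GGSGG", 0), ("GGSGA", 0), ("AGSAA", 1)]
predictions = [knn_predict(w, train) for w in ("AASAA", "GGSGG")]
metrics = evaluate([1, 0], predictions)
```

In a tenfold cross-validation setting, `knn_predict` would be called on each held-out fold with the remaining nine folds as `train`, and `evaluate` would be applied to the pooled or per-fold predictions.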

