Current Proteomics

ISSN (Print): 1570-1646
ISSN (Online): 1875-6247

Research Article

ABC-Gly: Identifying Protein Lysine Glycation Sites with Artificial Bee Colony Algorithm

Author(s): Yanqiu Yao, Xiaosa Zhao*, Qiao Ning and Junping Zhou*

Volume 18, Issue 1, 2021

Published on: 27 December, 2019

Page: [18 - 26] Pages: 9

DOI: 10.2174/1570164617666191227120136

Abstract

Background: Glycation is a nonenzymatic post-translational modification in which a sugar molecule attaches to a protein or lipid molecule. It may impair the function and change the characteristics of proteins, which may lead to metabolic diseases. To help understand the underlying molecular mechanisms of glycation, computational prediction methods have been developed because of their convenience and high speed. However, developing a more effective computational tool remains a challenging task in computational biology.

Methods: In this study, we present an accurate identification tool named ABC-Gly for predicting lysine glycation sites. First, we used three informative features to encode the peptides: position-specific amino acid propensity, secondary structure, and the composition of k-spaced amino acid pairs. Next, to fully exploit discriminative features and thereby improve the prediction and generalization ability of the model, we developed a two-step feature selection method that combines the Fisher score with an improved binary artificial bee colony algorithm based on the support vector machine. Finally, using the optimal feature subset, we constructed the model with the Support Vector Machine on the training dataset.
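
The two-step feature selection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the binary artificial bee colony algorithm was improved, so the search below is a plain textbook-style binary ABC, and the fitness function is a toy stand-in for the SVM cross-validation performance that ABC-Gly would use. The names `fisher_scores`, `binary_abc`, `toy_fitness`, and all parameter values are hypothetical.

```python
import random

def fisher_scores(X, y):
    """Fisher score per feature: between-class over within-class scatter."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in set(y):
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in values)
        scores.append(between / within if within > 0 else 0.0)
    return scores

def binary_abc(n_features, fitness, n_bees=10, limit=5, n_iter=30, seed=0):
    """Basic binary ABC search over feature masks (maximises fitness)."""
    rng = random.Random(seed)
    sources = [[rng.random() < 0.5 for _ in range(n_features)]
               for _ in range(n_bees)]
    scores = [fitness(m) for m in sources]
    trials = [0] * n_bees
    best_score = max(scores)
    best_mask = sources[scores.index(best_score)][:]

    def replace(i, mask, score):
        nonlocal best_mask, best_score
        sources[i], scores[i], trials[i] = mask, score, 0
        if score > best_score:
            best_mask, best_score = mask[:], score

    def explore(i):
        candidate = sources[i][:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]      # neighbour: flip one bit
        score = fitness(candidate)
        if score > scores[i]:
            replace(i, candidate, score)     # greedy selection
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_bees):              # employed bees
            explore(i)
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        # onlooker bees: revisit sources with probability tied to fitness
        for i in rng.choices(range(n_bees), weights=weights, k=n_bees):
            explore(i)
        for i in range(n_bees):              # scouts: abandon stale sources
            if trials[i] > limit:
                mask = [rng.random() < 0.5 for _ in range(n_features)]
                replace(i, mask, fitness(mask))
    return best_mask, best_score

# Toy data (hypothetical): 4 samples, 3 features, binary labels.
X = [[0.0, 1.0, 0.3], [0.2, 1.0, 0.1], [1.0, 1.0, 0.2], [1.2, 1.0, 0.4]]
y = [0, 0, 1, 1]

# Step 1: rank features by Fisher score and keep the top two.
scores = fisher_scores(X, y)
top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]

# Step 2: search subsets of the retained features with binary ABC.
def toy_fitness(mask):
    # Stand-in for SVM cross-validation performance, with a small size penalty.
    return sum(scores[j] for j, keep in zip(top, mask) if keep) - 0.1 * sum(mask)

mask, best = binary_abc(len(top), toy_fitness)
selected = [j for j, keep in zip(top, mask) if keep]
```

In a realistic setting the fitness callable would train and cross-validate a classifier on the columns selected by `mask`; the Fisher-score filter serves mainly to shrink the search space before the more expensive wrapper search runs.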

Results: In 10-fold cross-validation on the training dataset, the proposed predictor ABC-Gly achieved a sensitivity of 76.43%, a specificity of 91.10%, a balanced accuracy of 83.76%, an Area Under the receiver operating characteristic Curve (AUC) of 0.9313, and a Matthews Correlation Coefficient (MCC) of 0.6861. On the independent dataset, it achieved a balanced accuracy of 59.05%. Compared with state-of-the-art predictors on the training dataset, the proposed predictor achieved significant improvements of 0.156 in AUC and 0.336 in MCC.
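
For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, balanced accuracy, and MCC follow from confusion-matrix counts, using their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper. As a consistency check on the reported figures, balanced accuracy is the mean of sensitivity and specificity, and (76.43 + 91.10) / 2 ≈ 83.77, which agrees with the reported 83.76% up to rounding.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)             # true positive rate
    specificity = tn / (tn + fp)             # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

Balanced accuracy is the natural headline figure when negative sites greatly outnumber positive ones, because plain accuracy would then be dominated by the majority class.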

Conclusion: Detailed analysis indicates that our predictor may serve as a powerful complement to existing methods for predicting protein lysine glycation sites. The source code and datasets for ABC-Gly are provided in Supplementary File 1.

Keywords: Lysine, glycation, secondary structure, Fisher score, support vector machine, artificial bee colony algorithm.


© 2024 Bentham Science Publishers