Abstract
Background: Glycation is a nonenzymatic post-translational modification in which a sugar molecule attaches to a protein or lipid molecule. It can impair protein function and alter protein characteristics, which may lead to metabolic diseases. To help elucidate the molecular mechanisms underlying glycation, computational prediction methods have been developed because they are convenient and fast. However, developing a more effective computational tool remains a challenging task in computational biology.
Methods: In this study, we present ABC-Gly, a tool for accurately identifying lysine glycation sites. First, we encoded the peptides with three informative feature types: position-specific amino acid propensity, secondary structure, and the composition of k-spaced amino acid pairs. Next, to exploit discriminative features and thereby improve the prediction and generalization ability of the model, we developed a two-step feature selection procedure that combines the Fisher score with an improved binary artificial bee colony algorithm based on the support vector machine (SVM). Finally, using the optimal feature subset, we constructed the model with an SVM on the training dataset.
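As an illustration of two of the steps named above, the following minimal sketch encodes a peptide by the composition of k-spaced amino acid pairs and then ranks features by Fisher score. It is not the ABC-Gly implementation: the function names, the `k_max` default, the normalization by the number of valid pairs, and the two-class Fisher score formula (between-class scatter divided by within-class scatter) are assumptions based on common definitions. The artificial bee colony search and the SVM are not shown.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies. Pairs containing a
    non-standard character (e.g. a padding symbol) are skipped, and
    each frequency is normalized by the number of valid pairs
    (an assumption made for this sketch).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def fisher_scores(pos, neg):
    """Two-class Fisher score per feature column.

    pos and neg are lists of equal-length feature vectors for the
    positive and negative class. A zero within-class scatter yields
    a score of 0.0 (an arbitrary convention for this sketch).
    """
    n_pos, n_neg = len(pos), len(neg)
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        mu_p, mu_n = sum(p) / n_pos, sum(n) / n_neg
        mu = (sum(p) + sum(n)) / (n_pos + n_neg)
        var_p = sum((v - mu_p) ** 2 for v in p) / n_pos
        var_n = sum((v - mu_n) ** 2 for v in n) / n_neg
        between = n_pos * (mu_p - mu) ** 2 + n_neg * (mu_n - mu) ** 2
        within = n_pos * var_p + n_neg * var_n
        scores.append(between / within if within else 0.0)
    return scores
```

In a two-step scheme of this kind, the Fisher score would typically serve as a cheap filter that keeps only the top-ranked features before a more expensive wrapper search is run over the reduced set.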
Results: In 10-fold cross-validation on the training dataset, ABC-Gly achieved a sensitivity of 76.43%, a specificity of 91.10%, a balanced accuracy of 83.76%, an area under the receiver operating characteristic curve (AUC) of 0.9313, and a Matthews correlation coefficient (MCC) of 0.6861; on the independent dataset, it achieved a balanced accuracy of 59.05%. Compared with state-of-the-art predictors on the training dataset, the proposed predictor achieved improvements of 0.156 in AUC and 0.336 in MCC.
Conclusion: The detailed analysis indicates that our predictor may serve as a powerful complement to existing methods for predicting protein lysine glycation. The source code and datasets of ABC-Gly are provided in Supplementary File 1.
Keywords: Lysine, glycation, secondary structure, Fisher score, support vector machine, artificial bee colony algorithm.