Abstract
Aim and Objective: Protein malonylation is a newly discovered post-translational modification. Malonylation is known to be closely associated with type 2 diabetes and to play a regulatory role in fatty acid oxidation and the associated genetic diseases. Identifying malonylation sites could lay a solid foundation for exploring the function of malonylation. Due to the limitations of experimental techniques, identifying malonylation sites quickly and accurately remains a great challenge.
Methods: We proposed a computational method to predict malonylation sites and to analyze malonylation patterns. We first extracted protein segments such that a lysine lies at the center of each segment. Each segment was then encoded by its pseudo amino acid composition. A support vector machine classifier was trained on a training dataset to distinguish malonylation sites from non-malonylation sites.
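The segment extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_lysine_segments`, the flank length `flank`, and the padding character `X` used near the sequence termini are assumptions made for illustration, since the abstract does not state the window size or how terminal lysines were handled.

```python
def extract_lysine_segments(sequence, flank=3, pad="X"):
    """Return (position, segment) pairs, one per lysine (K) in the sequence.

    Each segment has length 2 * flank + 1 with the lysine at its center.
    `flank` and `pad` are illustrative assumptions, not values from the paper.
    """
    # Pad both ends so lysines near the termini still yield full-length windows.
    padded = pad * flank + sequence + pad * flank
    segments = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sequence sits at index i + flank in the
            # padded string, so the slice below is centered on it.
            window = padded[i:i + 2 * flank + 1]
            segments.append((i + 1, window))  # report 1-based positions
    return segments


if __name__ == "__main__":
    for position, segment in extract_lysine_segments("MKAGK", flank=2):
        print(position, segment)
```

Each resulting segment would then be passed to the feature encoder (here, the pseudo amino acid composition) before classification.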
Results: The leave-one-out test on the training dataset reached an accuracy of 0.7733, and the independent test on the testing dataset reached an accuracy of 0.8889. Furthermore, the classifier correctly identified 144 of 160 putative malonylation sites (90%). Analysis of the differences between malonylation and non-malonylation segments suggested that lysine malonylation follows a specific pattern; for example, a lysine whose neighbors are glycine and alanine might be more likely to be malonylated. The proposed method is therefore expected to be a promising tool for identifying malonylation sites.
Keywords: Protein post-translational modification, lysine malonylation, support vector machine, pseudo amino acid composition, leave-one-out test, dataset.