Abstract
Protein methylation is an important, reversible post-translational modification that regulates diverse protein properties. Many methylation sites on arginine and lysine residues have been identified experimentally. However, experimental identification without prior knowledge is laborious and costly. Hence, there is interest in developing computational methods that reliably predict methylation sites. Such predictions can give researchers useful information for the more efficient discovery of candidate methylation sites. This work proposes Methcrf, a computational predictor of protein methylation sites based on conditional random fields (CRFs). Prediction is limited to lysine and arginine residues because too few experimentally verified sites are available for other residues. The approach combines protein sequence features with structural information, such as the solvent accessibility of the amino acids surrounding each methylation site. In 10-fold cross-validation, Methcrf achieves an area under the receiver operating characteristic curve (AUC) of 0.85 for arginine and 0.80 for lysine. Its performance is comparable to that of previous methods for predicting methylation sites.
Keywords: CRF, methylation, prediction, post-translational modification, arginine, lysine, computational methods, protein sequence, proteome, cellular plasticity
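The abstract combines windowed sequence features with solvent accessibility and evaluates the predictor by 10-fold cross-validation and AUC. The following is a minimal sketch of those surrounding steps only; the CRF model itself is omitted. The abstract does not specify the window size, the residue encoding, or the source of solvent accessibility values, so the helper names (`window_features`, `kfold_indices`, `roc_auc`), the default half-width of 7, and the one-hot plus relative-solvent-accessibility (RSA) encoding are all hypothetical choices. The AUC is computed through its equivalence to the Mann-Whitney statistic: the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half.

```python
import random
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_SLOT = len(AMINO_ACIDS)  # extra one-hot slot for padding or unknown residues


def window_features(seq: str, rsa: Sequence[float], center: int,
                    half_width: int = 7) -> List[float]:
    """Encode the window around a candidate K/R site (hypothetical encoding).

    Each window position contributes a 21-dim one-hot vector (20 amino acids
    plus a padding/unknown slot) followed by one RSA value; positions that
    fall outside the sequence are marked as padding with RSA 0.0.
    """
    if len(seq) != len(rsa):
        raise ValueError("seq and rsa must have the same length")
    feats: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            one_hot[idx if idx >= 0 else GAP_SLOT] = 1.0
            acc = float(rsa[pos])
        else:
            one_hot[GAP_SLOT] = 1.0
            acc = 0.0
        feats.extend(one_hot)
        feats.append(acc)
    return feats


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a 10-fold setup, each fold from `kfold_indices` would serve once as the test set while a classifier is trained on the remaining nine, and `roc_auc` would then be computed over the pooled or per-fold test scores. The pairwise AUC loop is quadratic in the number of sites; a rank-based formula scales better for large datasets.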