Abstract
In protein-RNA interactions, an amino acid often shows different preferences for its RNA partners depending on its neighboring amino acids. Hence, the interaction propensity of an amino acid can be assessed better by considering its neighbors than by examining the amino acid alone. In this study, we computed the interaction propensity of three consecutive amino acids (called an amino acid triplet) from an analysis of recent structure data of protein-RNA complexes. We used the interaction propensity to predict RNA-binding sites in protein sequences with a support vector machine (SVM) classifier, and observed that the interaction propensities of amino acid triplets are more effective than other biochemical properties of amino acids for predicting RNA-binding sites in proteins. Experimental results with 134 non-redundant protein sequences showed that the SVM classifier achieved a sensitivity of 77% and a specificity of 76%, and that the three-residue interaction propensity gave better performance than single- or five-residue interaction propensities. A comparison of the SVM classifier with RNABindR and BindN showed that it outperforms both methods in net prediction and correlation coefficient. Our SVM classifier can also be used to predict protein-binding nucleotides in RNA sequences.
Keywords: Binding site, interaction propensity, prediction, RNA-binding protein, SVM
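
To make the idea of a triplet interaction propensity concrete, the following is a minimal sketch, not the definition used in this study. It assumes a simple frequency ratio: the share of a triplet among triplets whose central residue binds RNA, divided by the share of that triplet among all triplets. The function name `triplet_propensities`, the per-residue 0/1 label format, and the choice to attach the label to the central residue are all illustrative assumptions.

```python
from collections import Counter


def triplet_propensities(sequences, binding_labels):
    """Return {triplet: propensity} using an assumed frequency-ratio definition.

    sequences: list of protein sequences (strings).
    binding_labels: list of per-residue labels (1 = RNA-binding, 0 = not),
    one list per sequence and the same length as that sequence.
    """
    all_counts = Counter()
    binding_counts = Counter()
    for seq, labels in zip(sequences, binding_labels):
        # Slide a window of three residues; the central residue carries the label.
        for i in range(1, len(seq) - 1):
            triplet = seq[i - 1:i + 2]
            all_counts[triplet] += 1
            if labels[i]:
                binding_counts[triplet] += 1
    total_all = sum(all_counts.values())
    total_binding = sum(binding_counts.values())
    if total_all == 0 or total_binding == 0:
        return {}
    # Ratio of the triplet's frequency among binding triplets to its overall frequency.
    return {
        t: (binding_counts[t] / total_binding) / (all_counts[t] / total_all)
        for t in all_counts
    }
```

Under this assumed definition, a value above 1 means the triplet is over-represented at binding sites and a value below 1 means it is under-represented. Such values could then serve as one input feature per residue for a classifier such as an SVM.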