Abstract
Computational approaches complement experimental ones by analyzing protein-protein interactions (PPIs) from a different angle, and they can efficiently determine whether two proteins are likely to interact. In this paper, the K-nearest neighbors (KNN) algorithm is applied to predict PPIs, with each protein encoded by the physicochemical properties of its residues, its predicted secondary structure, and its amino acid composition. Minimum-redundancy maximum-relevance (mRMR) feature selection is adopted to select a compact feature set whose features are considered important for determining PPI-ness. Because the negative dataset (non-interacting protein pairs) is much larger than the positive dataset (interacting protein pairs), the negative dataset is divided into 5 portions, and each portion is combined with the positive dataset for one prediction. Five predictions are thus performed, and the final result is obtained by voting. The prediction achieves an overall accuracy of 0.8369 with a sensitivity of 0.7356. The predictor, developed in this work to predict fruit fly PPI-ness, is publicly available at http://chemdata.shu.edu.cn/ppip.
Keywords: Bioinformatics, feature selection, KNNs, protein-protein interactions, unbalanced data, mRMR (minimum-redundancy maximum-relevance), PPI-nesses, negative dataset
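The handling of the unbalanced data described in the abstract (split the negative set into 5 portions, train one KNN predictor per portion together with the full positive set, then vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random split, the squared Euclidean distance, the value k = 3, and the simple majority rule are all assumptions made for illustration, and both the feature encoding and the mRMR selection step are omitted.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points.

    Distance is squared Euclidean (an assumption; the abstract names no metric).
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def split_negatives(negatives, n_parts=5, seed=0):
    """Shuffle the negative pairs and deal them into n_parts disjoint portions."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def ensemble_predict(positives, negatives, x, n_parts=5, k=3, seed=0):
    """Run one KNN prediction per negative portion and combine them by voting."""
    votes = []
    for part in split_negatives(negatives, n_parts, seed):
        # Each sub-predictor sees all positives plus one portion of negatives.
        train_X = positives + part
        train_y = [1] * len(positives) + [0] * len(part)
        votes.append(knn_predict(train_X, train_y, x, k))
    # Strict majority of the sub-predictions decides the final label.
    return 1 if sum(votes) * 2 > len(votes) else 0


# Toy feature vectors (hypothetical, two features per protein pair).
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [1.0, 0.8]]
negatives = [[0.01 * i, 0.02 * i] for i in range(20)]

print(ensemble_predict(positives, negatives, [1.0, 1.0]))
print(ensemble_predict(positives, negatives, [0.0, 0.0]))
```

With 20 negatives and 4 positives in this toy example, each sub-predictor trains on a balanced set of 4 positives and 4 negatives, which is the effect the 5-way division aims for.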