Abstract
Protein-Protein Interaction (PPI) prediction is a well-known problem in Bioinformatics for which many techniques have been proposed. However, prediction results have not been accurate enough to guide biologists in wet-lab experiments. One reason is that not all useful information, such as pairwise protein interaction information based on sequence alignment, has been integrated into PPI prediction. Alignment is a basic technique for measuring sequence similarity in Proteomics and has been used in applications ranging from protein recognition to protein subcellular localization. In this article, we propose a novel integrated approach to PPI prediction that jointly uses a sequence-alignment-based k-Nearest Neighbor classifier (SA-kNN) and a Support Vector Machine (SVM). SVM is a machine learning technique used in a wide range of Bioinformatics applications thanks to its ability to alleviate overfitting. We demonstrate that the two methods, SA-kNN and SVM, are complementary, and we combine them in an ensemble to overcome their respective limitations. While the SVM is trained on Amino Acid (AA) compositions and protein signatures mined from the literature, the SA-kNN exploits the similarity between two protein pairs as measured by alignment. Experimentally, our technique yields significant gains in accuracy, precision and sensitivity of approximately 5%, 16% and 10%, respectively.
Keywords: Protein-protein interaction, SVM, k-nearest neighbor, Smith-Waterman, stacking, ensemble learning, PPIs, PPI prediction, Bioinformatics, single instance encoding