Abstract
With the huge amount of protein sequence data now available, computational methods for protein–protein interaction (PPI) prediction that use only protein sequence information have drawn increasing interest. In this article, we propose a sequence-based method built on a novel representation of local protein sequence descriptors. Local descriptors account for the interactions between residues in both continuous and discontinuous regions of a protein sequence, so the method enables us to extract more PPI information from the sequence. A series of experiments is performed to optimize the prediction model by varying the parameter k and the distance function of the k-nearest neighbors (KNN) learning system, as well as the way a protein pair is coded. When applied to the PPI data of Saccharomyces cerevisiae, the method achieved 86.15% prediction accuracy, with 81.03% sensitivity at a precision of 90.24%. The prediction model was also evaluated on an independent data set of 986 Escherichia coli PPIs, where the prediction accuracy was 73.02%. Given the complex nature of PPIs, the performance of our method is promising, and it can be a helpful supplement for PPI prediction.
Keywords: Feature representation, KNN, local descriptors, PPI prediction, protein sequence, sequence-based method
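
To make the tuned quantities concrete, the following is a minimal sketch of a k-nearest neighbors classifier over coded protein pairs, with k and the distance function exposed as parameters, plus the three reported metrics computed from confusion counts. It is not the implementation used in this work: the concatenation coding in `encode_pair`, the choice of Euclidean and Manhattan distances, the majority vote, and the two-dimensional toy descriptors are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def encode_pair(desc_a, desc_b):
    """Code a protein pair by concatenating two descriptor vectors (assumed scheme)."""
    return list(desc_a) + list(desc_b)


def euclidean(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training pairs closest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda item: distance(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and precision from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Hypothetical 2-dimensional descriptors; real local descriptors are much longer.
toy = {"A": [0.9, 0.1], "B": [0.8, 0.2], "C": [0.1, 0.9], "D": [0.2, 0.8]}
train = [("A", "B", 1), ("B", "A", 1), ("C", "D", 0), ("D", "C", 0)]
train_x = [encode_pair(toy[a], toy[b]) for a, b, _ in train]
train_y = [label for _, _, label in train]

query = encode_pair([0.85, 0.15], [0.85, 0.15])
prediction = knn_predict(train_x, train_y, query, k=3, distance=manhattan)
```

Because concatenation is order dependent, the toy training set lists each pair in both orders; a symmetric coding would avoid that duplication, and comparing such alternatives is the kind of choice the coding experiments address.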