Abstract
Protein-protein interactions (PPIs) are crucial to most biochemical processes in human beings. Although many human PPIs have been identified experimentally, their number is still small compared with the number of available human protein sequences. Recently, many computational methods have been proposed to facilitate the recognition of novel human PPIs. However, existing methods concentrate only on the information of individual PPIs and ignore the systematic characteristics of protein-protein interaction networks (PINs). In this study, we propose a new method that combines the global information of PINs with protein sequence information. The random forest (RF) algorithm was used to develop the prediction model, which achieved a high accuracy of 91.88%. Furthermore, the RF model performed well when tested on three independent datasets, suggesting that our method is a useful tool for identifying PPIs and for investigating PINs.
Keywords: Auto covariance, hierarchical random graph, link prediction, protein-protein interactions, protein-protein interaction networks, random forest, cellular automaton image, non-interacting protein pairs, WoLFPSORT package, polarizability, proteomics studies, HIV-1 reverse transcriptase, PINs, paralogous verification methods, protein-protein sequence, solvent accessible surface area, net charge index, RF algorithm, independent dataset, human PPIs, human PINs