Abstract
Background: Research has shown that essential proteins play important roles in the development and survival of organisms. Because traditional biological experiments are costly, several computational methods that predict essential proteins from known protein-protein interactions (PPIs) have recently been proposed.
Objective: Here, a novel prediction model called IoMCD is proposed to identify essential proteins by combining known PPIs with additional biological information about proteins, namely gene expression data and protein homology information.
Methods: Unlike traditional state-of-the-art prediction models, IoMCD uses two kinds of weights: one obtained by extracting topological features of proteins from the original known PPI network, and the other obtained by calculating the Pearson correlation coefficients (PCCs) between the gene expression data of proteins. These two kinds of weights are combined through a cross-entropy method to assign a unique weight to each protein. Subsequently, the homology information of proteins is used to calculate an initial score for each protein. Finally, based on the unique weights and initial scores of the proteins, an iterative method is designed to measure the essentiality of each protein.
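As a minimal illustration of one ingredient named above, the Pearson correlation coefficient between two gene expression profiles can be computed as sketched below. The function name `pearson_correlation`, the sample profiles, and the handling of constant profiles are assumptions made for illustration and are not taken from IoMCD; the cross-entropy combination and the iterative scoring step are not specified here and are therefore not shown.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression profiles measured at four time points.
expr_a = [1.0, 2.0, 3.0, 4.0]
expr_b = [2.0, 4.0, 6.0, 8.0]
expr_c = [4.0, 3.0, 2.0, 1.0]
```

In this toy example, `expr_a` and `expr_b` rise together and `expr_a` and `expr_c` move in opposite directions, so the two pairs sit at the positive and negative extremes of the coefficient's range.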
Results: Extensive experiments were performed, and simulation results showed that the prediction accuracy of IoMCD among the top 1% of predicted essential proteins was 92.16% on the dataset downloaded from the DIP database and 89.71% on the dataset downloaded from the Gavin database.
Conclusion: The simulation results on both datasets demonstrate that IoMCD achieves high prediction accuracy and could be an effective method for essential protein prediction.
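One common way to read "prediction accuracy in the top 1%" is as the fraction of the highest-ranked candidates that are known essential proteins. The sketch below assumes that reading; the function name `top_fraction_accuracy`, the rounding rule for the cutoff, and the toy scores are hypothetical, and the evaluation protocol actually used for IoMCD may differ.

```python
import math

def top_fraction_accuracy(scores, essential, fraction):
    """Share of the top-ranked proteins that are in the known essential set.

    scores:    dict mapping protein ID to predicted essentiality score
    essential: set of protein IDs known to be essential
    fraction:  portion of the ranked list to evaluate, e.g. 0.01 for the top 1%
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: round the cutoff up and always evaluate at least one protein.
    k = max(1, math.ceil(len(ranked) * fraction))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k

# Hypothetical scores and labels, for illustration only.
toy_scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.1}
toy_essential = {"P1", "P3"}
```

With these toy values, evaluating the top half of the ranking selects `P1` and `P2`, of which only `P1` is labeled essential.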
Keywords: Essential proteins, cross-entropy, orthologous proteins, weighted protein-protein interaction network, computational, biological.