PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest

Chuanyan Wu; Bentao Lin; Kai Shi; Qingju Zhang; Rui Gao; Zhiguo Yu; Yang De Marinis; Yusen Zhang; Zhi-Ping Liu

doi:10.2174/1574893616666210617162258

Abstract

Background: Essential proteins play an important role in the processes of life and can be identified by either experimental methods or computational approaches. Experimental approaches are highly accurate but time- and resource-consuming.

Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning.

Methods: Several types of protein features were extracted. Topological features were extracted from the Protein-Protein Interaction (PPI) network. From the protein sequence, graph theory-based features, information-based features, composition and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, a ReliefF-based feature selection method was adopted to measure the weights of these features.
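As a rough illustration of the kinds of steps described in the Methods, the following minimal sketch computes one topological feature (degree centrality) from a toy PPI edge list, one sequence-based feature group (amino acid composition), and feature weights using a simplified Relief scheme. It is not the PEPRF implementation: the protein names, sequence, and feature matrix are hypothetical, and the weighting uses a single nearest hit and nearest miss per sample rather than the k-neighbor ReliefF variant named in the abstract.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def degree_centrality(edges):
    """Degree of each protein divided by (n - 1), from an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    return {p: len(nb) / (n - 1) for p, nb in neighbors.items()}


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in a protein sequence."""
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def relief_weights(X, y):
    """Simplified Relief (one nearest hit, one nearest miss per sample).

    Assumes features are already scaled to [0, 1] and that every class
    has at least two samples.
    """
    n_features = len(X[0])
    weights = [0.0] * n_features

    def dist(u, v):
        return sum(abs(a - b) for a, b in zip(u, v))

    for i, xi in enumerate(X):
        hits = [x for j, x in enumerate(X) if j != i and y[j] == y[i]]
        misses = [x for j, x in enumerate(X) if y[j] != y[i]]
        hit = min(hits, key=lambda x: dist(xi, x))
        miss = min(misses, key=lambda x: dist(xi, x))
        for f in range(n_features):
            # Reward features that differ across classes, penalize
            # features that differ within a class.
            weights[f] += (abs(xi[f] - miss[f]) - abs(xi[f] - hit[f])) / len(X)
    return weights


# Hypothetical toy inputs, for illustration only.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
toy_X = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.8]]
toy_y = [1, 1, 0, 0]  # 1 = essential, 0 = non-essential

centrality = degree_centrality(toy_edges)
composition = amino_acid_composition("MKKA")
weights = relief_weights(toy_X, toy_y)
```

In this toy matrix the first feature separates the two classes and the second does not, so the first feature receives the larger weight. Features would then be ranked by weight and the top-ranked ones kept for classifier training.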

Results: After feature selection, 212 features were retained to train random forest classifiers. PEPRF achieved an AUC of 0.71 and an accuracy of 0.742.
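For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and AUC can be computed from labels and classifier scores. The labels, scores, and 0.5 threshold are hypothetical toy values and are unrelated to the PEPRF results; AUC is computed here as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the pairwise ranking probability (ties count 0.5).

    Assumes binary labels (1 = positive, 0 = negative) with at least
    one sample of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and scores, for illustration only.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two values can differ as they do in the reported results.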

Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.

Keywords: Essential protein prediction, graph energy, feature extraction, ReliefF-based feature selection, random forest classifier, PEPRF.

DOI: https://dx.doi.org/10.2174/1574893616666210617162258
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
