Abstract
Identifying the genes associated with a specific disease from a large number of candidates is a fundamental challenge. Because experiment-based methods are generally time-consuming and laborious, computational approaches have become a new strategy for identifying disease gene candidates. In this paper, we proposed PDGTR, an algorithm based on a search engine ranking method, to prioritize disease gene candidates. First, we constructed a weighted human disease network by calculating the topological similarity and phenotype similarity of each pair of diseases. Then, we calculated the pairwise similarities of all genes using the protein-protein interaction network and the edge clustering coefficient. For a specific disease, a logistic regression model was used to generate prior knowledge for each gene. Finally, PDGTR was applied to prioritize the candidate genes. PDGTR was tested on five diseases (breast cancer, colorectal cancer, hepatocellular carcinoma, gastric cancer, and osteoporosis) and compared with four state-of-the-art algorithms: RWR, DADA, PRINCE, and PRP. Experimental results based on leave-one-out cross-validation, precision, ROC curves, and enrichment show that PDGTR outperforms RWR, DADA, PRINCE, and PRP. Moreover, several potential disease genes predicted by PDGTR have already been reported in the literature.
Keywords: systems biology, protein-protein interaction network, disease gene, search engine algorithm, random walk, disease similarity.
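The abstract does not give PDGTR's ranking formula, so the following is only a rough sketch of the general idea of combining an edge-clustering-coefficient-weighted protein-protein interaction network with a per-gene prior. It swaps in personalized PageRank, a standard search-engine ranking method whose teleport vector plays the role of the prior. It is not the authors' algorithm. Several pieces are assumptions: the ECC definition z_uv / min(k_u - 1, k_v - 1), the choice to treat nodes with no usable edges as teleporting back to the prior, and the hypothetical function names, toy network, and prior values. The prior stands in for the logistic-regression output.

```python
from collections import defaultdict


def edge_clustering_coefficient(edges):
    """Assumed ECC definition: ECC(u, v) = z_uv / min(k_u - 1, k_v - 1),
    where z_uv is the number of triangles containing edge (u, v) and k is degree.
    Edges with a degree-1 endpoint get 0 (the denominator would be 0)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ecc = {}
    for u, v in edges:
        denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
        shared = len(adj[u] & adj[v])  # common neighbours = triangles on (u, v)
        ecc[(u, v)] = shared / denom if denom > 0 else 0.0
    return adj, ecc


def personalized_pagerank(edges, prior, alpha=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank on the ECC-weighted network (a stand-in, not PDGTR).
    `prior` maps gene -> non-negative score and is used as the teleport vector."""
    adj, ecc = edge_clustering_coefficient(edges)
    weights = defaultdict(dict)
    for (u, v), c in ecc.items():
        weights[u][v] = c
        weights[v][u] = c
    nodes = sorted(adj)
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("prior must give positive weight to at least one gene")
    p = {n: prior.get(n, 0.0) / total for n in nodes}  # normalized prior
    r = dict(p)
    for _ in range(max_iter):
        new = {n: (1 - alpha) * p[n] for n in nodes}  # restart to the prior
        for u in nodes:
            out = sum(weights[u].values())
            if out == 0:
                # No usable edges: send this node's mass back through the prior
                for n in nodes:
                    new[n] += alpha * r[u] * p[n]
            else:
                for v, wt in weights[u].items():
                    new[v] += alpha * r[u] * wt / out
        diff = sum(abs(new[n] - r[n]) for n in nodes)
        r = new
        if diff < tol:
            break
    return r


# Hypothetical toy PPI network and prior (stand-in for logistic-regression output)
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
prior = {"A": 1.0}
scores = personalized_pagerank(edges, prior)
for gene, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(s, 4))
```

In this toy network, genes D and E end up with zero score. The C-D and D-E edges lie on no triangle, so their ECC weights are 0 and no ranking mass flows across them. This illustrates how an ECC weighting favours densely clustered neighbourhoods of the prior genes over loosely attached ones.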
Graphical Abstract