Abstract
Background: When only a single group (or class) of samples is available for a given problem, positive-unlabeled learning algorithms can be applied. One such case is the interaction between pairs of biological or chemical entities, where only the set of interacting pairs can be collected, not the “noninteracting” ones.
Objective: We aim to improve the performance of deriving protein-protein and protein-ligand interactions. We argue that positive-unlabeled learning algorithms can be applied to this problem.
Method: In this paper, we propose modifications to two existing methods for protein-protein and protein-ligand interaction network derivation. First, we extend the algorithms to use Random Forests; then we devise a voting-based ensemble classifier from the two extended algorithms.
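The voting step can be illustrated with a minimal sketch. The abstract does not state the voting rule, so this assumes soft voting: the positive-class scores of the two classifiers are averaged and thresholded at 0.5. The function name `soft_vote`, the threshold, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(scores_a: Sequence[float],
              scores_b: Sequence[float],
              threshold: float = 0.5) -> List[int]:
    """Average two classifiers' positive-class scores and threshold the mean.

    Assumption: soft voting. The paper's actual voting rule may differ.
    Returns 1 (predicted interaction) or 0 (no predicted interaction)
    for each candidate pair.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers must score the same candidate pairs")
    return [1 if (a + b) / 2.0 >= threshold else 0
            for a, b in zip(scores_a, scores_b)]


# Hypothetical scores for four candidate entity pairs (illustrative only).
scores_method_1 = [0.9, 0.2, 0.6, 0.4]
scores_method_2 = [0.8, 0.1, 0.3, 0.7]
labels = soft_vote(scores_method_1, scores_method_2)
```

With only two voters, hard majority voting can tie, which is one reason a sketch like this might average scores instead; either rule fits the description of a voting-based ensemble.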
Results: We report the evaluation results of the proposed algorithms in comparison to the original methods and to well-known biological network derivation algorithms. The proposed algorithms achieved significant improvements on several evaluation metrics.
Conclusion: The results are promising in that the proposed methods perform competitively with or better than previous methods. This motivates us to apply the proposed methods to other data sets and similar problems.
Keywords: Binary classification, positive-unlabeled learning, protein-ligand interaction networks, protein-protein interaction networks, random forests, support vector machines.