Abstract
Background: Protein-peptide interactions are a crucial component of the protein-protein interaction (PPI) network and are ubiquitous in living cells. They play important roles in signal transduction and regulation. Because experimental approaches are laborious and time-consuming, effective computational methods offer a convenient and rapid way to predict protein-peptide interactions.
Method: This study proposes a novel method for predicting interactions between proteins and peptides using features extracted from both. Each protein-peptide pair was represented by the traditional amino acid composition, the pseudo-amino acid composition, and features derived from 205 domains. Predictors were constructed with four machine learning algorithms: SMO (sequential minimal optimization), IB1 (nearest neighbor algorithm), dagging, and random forest (RF). All features were analyzed with feature selection methods, including the maximum relevance minimum redundancy method and the incremental feature selection method, to extract an optimal feature set. An optimal predictor based on IB1 was then constructed from the extracted optimal features.
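The incremental feature selection step can be illustrated with a minimal sketch. It assumes a feature ranking is already available (for example from a maximum relevance minimum redundancy run, which is not implemented here), uses a plain Euclidean 1-nearest-neighbor rule as a stand-in for IB1, and scores each feature prefix by leave-one-out MCC. The function names, the leave-one-out protocol, and the data layout are illustrative assumptions, not the study's actual implementation.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = interacting)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def loo_1nn_predict(X, y, feats):
    """Leave-one-out predictions of a 1-nearest-neighbor rule on the chosen features."""
    preds = []
    for i, xi in enumerate(X):
        # Nearest other sample by squared Euclidean distance (assumed metric).
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in feats),
        )
        preds.append(y[nearest])
    return preds

def incremental_feature_selection(X, y, ranked_feats):
    """Score every prefix of the ranked feature list; return (best_prefix, best_mcc)."""
    best_feats, best_score = [], -1.0
    for k in range(1, len(ranked_feats) + 1):
        feats = ranked_feats[:k]
        score = mcc(y, loo_1nn_predict(X, y, feats))
        if score > best_score:  # keep the smallest prefix on ties
            best_feats, best_score = feats, score
    return best_feats, best_score
```

Given a list of feature vectors `X`, binary labels `y`, and a ranking such as `[0, 1]`, `incremental_feature_selection(X, y, [0, 1])` returns the best-scoring prefix of the ranking together with its MCC.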
Results: The IB1 algorithm yielded MCC values of 0.4436 in the cross-validation test on the training set and 0.4444 on the independent test set. Among the encoding methods compared, the domain-based method outperformed the pseudo-amino acid composition method. An optimal set of 230 features, which contributed most to the prediction of protein-peptide pairs, was selected.
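For reference, MCC (the Matthews correlation coefficient) is conventionally defined from the confusion-matrix counts as follows. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are the standard ones and are assumed here rather than taken from the text above.

\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\]

The value ranges from -1 to 1, where 1 indicates perfect prediction and 0 indicates performance no better than random guessing.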
Conclusion: Several important domains associated with features in the optimal feature set were deemed to play key roles in determining protein-peptide interactions.
Keywords: Protein-peptide interactions, maximum relevance minimum redundancy, incremental feature selection, functional domain composition, pseudo-amino acid composition.