Abstract
Non-coding RNAs (ncRNAs) play significant roles in various physiological and pathological processes by interacting with proteins. Existing experimental methods for identifying ncRNA-protein interactions (ncRPIs) are costly and time-consuming. Therefore, an increasing number of machine learning models, including shallow machine learning and deep learning models, have been developed to predict ncRPIs efficiently, and they have achieved marked progress in ncRPI identification. In this review, we provide an overview of recent advances in machine learning methods for predicting ncRPIs, focusing mainly on ncRNA-protein interaction databases, classical datasets, ncRNA/protein sequence encoding methods, conventional machine learning-based models, deep learning-based models, and the two integration-based models. Furthermore, we compare the reported accuracy of these approaches and discuss the potential and limitations of deep learning applications in ncRPI prediction. We find that integrated deep learning achieves the best predictive performance, whereas deep learning-based methods do not always perform better than shallow machine learning-based methods. On the basis of existing research, we also propose a research approach. We believe that models based on integrated deep learning will be able to achieve higher prediction accuracy if substantial experimental data become available in the near future.
Keywords: ncRNA-protein interactions, ncRNA-protein interaction databases, classical datasets, sequence encoding methods, conventional machine learning and deep learning, genome sequencing
Graphical Abstract