Abstract
Background: RNA-binding proteins play an important role in regulating splicing, RNA transport, and other post-transcriptional processes. They interact with RNA through specific RNA-binding domains.
Objective: This paper proposes DeepFusion-RBP, a deep learning framework composed of three sub-models. A sliding window splits each sequence into sub-sequences, local features are extracted from them, and a model is customized for each feature.
Methods: The main advantage of this approach is the use of a sliding window to cut the original sequence into sub-sequences. This expands the data set while avoiding excessive padding with meaningless data. A model is then customized for each feature to classify RNA-binding proteins accurately, using methods such as LSTM, Conv1D, and amino acid embedding.
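The sliding-window step can be illustrated with a minimal sketch. The function name `sliding_window`, the window size, and the stride below are assumptions chosen for illustration; the abstract does not state the values used in DeepFusion-RBP, and the treatment of short sequences and of the trailing residues is likewise only one possible choice.

```python
def sliding_window(sequence: str, window: int, stride: int) -> list[str]:
    """Cut a protein sequence into fixed-length, possibly overlapping sub-sequences.

    Hypothetical helper: `window` and `stride` are illustrative parameters,
    not values taken from the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # A sequence no longer than the window is returned whole, so any padding
    # is limited to at most one window rather than the longest sequence.
    if len(sequence) <= window:
        return [sequence]
    # Start a new window every `stride` residues; windows that would run past
    # the end of the sequence are not emitted.
    return [
        sequence[start:start + window]
        for start in range(0, len(sequence) - window + 1, stride)
    ]


if __name__ == "__main__":
    # Toy 10-residue sequence, window of 4, stride of 3.
    print(sliding_window("MKVLAAGIVG", window=4, stride=3))
```

Each sub-sequence would inherit the label of its parent protein, which is how one long sequence can yield several training examples. In this sketch, trailing residues that do not fill a complete window are dropped; a real pipeline might instead add one final window aligned to the end of the sequence.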
Results: To test whether the customized models improve the final prediction performance, we evaluated different combinations of sub-models on test sets of different lengths. With cross-validation, DeepFusion-RBP achieved an accuracy (ACC), F1-score, and MCC of 92.62%, 91.29%, and 84.96%, respectively. DeepFusion-RBP also performed well on three independent verification sets.
Conclusion: The results of both 10-fold cross-validation and the independent verification set tests suggest that customizing models for different features and extracting sub-sequences improve the prediction performance of the model. The data supporting the findings of this article are available at https://github.com/mmwangxu/DeepFusion-RBP-tool.
Keywords: RNA-binding protein, LSTM, deep learning, PSSM, protein sequence, word embedding.
Graphical Abstract