TP-MV: Therapeutic Peptides Prediction by Multi-view Learning

Ke Yan; Hongwu Lv; Jie Wen; Yichen Guo; Bin Liu

doi:10.2174/1574893617666211220153429

Abstract

Background: Therapeutic peptide prediction is critical for drug development and therapy. Researchers have studied this task extensively and developed several computational methods to identify different types of therapeutic peptides.

Objective: Most existing predictors are designed for one specific type of therapeutic peptide. Developing methods that can predict multiple types of therapeutic peptides remains a challenging problem, as does effectively combining different features for therapeutic peptide prediction.

Methods: In this paper, we propose TP-MV, a new ensemble method for general therapeutic peptide recognition. TP-MV is built on a stacking framework that uses k-nearest neighbors (KNN), support vector machine (SVM), extra trees (ET), random forest (RF), and XGBoost (XGB) as base classifiers. TP-MV then uses a multi-view learning model as the meta-classifier to extract discriminative features for different peptides.
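The two-level stacking idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TP-MV itself: the two toy base learners (a nearest-centroid rule and a small k-nearest-neighbour vote) stand in for the KNN, SVM, ET, RF and XGB models named above, the meta-classifier is a nearest-centroid rule on the stacked score vectors rather than the paper's multi-view model, and every function name (`fit_stacking`, `knn`, `nearest_centroid`, `stratified_folds`) and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical learner interface: train on (X, y), return a scoring function that
# maps one sample to a score in [0, 1] for the positive class.
Learner = Callable[[Sequence[Vector], Sequence[int]], Callable[[Vector], float]]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(X: Sequence[Vector], y: Sequence[int]) -> Callable[[Vector], float]:
    """Toy learner: score approaches 1 as x gets relatively closer to the positive centroid."""
    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x: Vector) -> float:
        d0, d1 = sq_dist(x, c0), sq_dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

    return score


def knn(k: int) -> Learner:
    """Toy k-nearest-neighbour learner: score is the fraction of positive neighbours."""
    def learner(X: Sequence[Vector], y: Sequence[int]) -> Callable[[Vector], float]:
        def score(x: Vector) -> float:
            nearest = sorted(zip((sq_dist(x, xi) for xi in X), y))[:k]
            return sum(label for _, label in nearest) / len(nearest)
        return score
    return learner


def stratified_folds(y: Sequence[int], n_folds: int) -> List[int]:
    """Assign fold ids round-robin within each class so every fold sees both classes."""
    seen: dict = {}
    folds = []
    for label in y:
        folds.append(seen.get(label, 0) % n_folds)
        seen[label] = seen.get(label, 0) + 1
    return folds


def fit_stacking(X, y, base_learners: List[Learner], meta_learner: Learner, n_folds: int = 3):
    """Two-level stacking: out-of-fold base scores become the meta-classifier's inputs."""
    folds = stratified_folds(y, n_folds)
    meta_X = [[0.0] * len(base_learners) for _ in X]
    for f in range(n_folds):
        train = [i for i, g in enumerate(folds) if g != f]
        held_out = [i for i, g in enumerate(folds) if g == f]
        for j, learner in enumerate(base_learners):
            model = learner([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta_X[i][j] = model(X[i])
    # Refit every base learner on all data for use at prediction time.
    base_models = [learner(X, y) for learner in base_learners]
    meta_model = meta_learner(meta_X, y)

    def predict(x: Vector) -> int:
        stacked = [m(x) for m in base_models]
        return 1 if meta_model(stacked) >= 0.5 else 0

    return predict


X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5], [5.2, 5.8]]
y = [0] * 6 + [1] * 6
model = fit_stacking(X, y, [nearest_centroid, knn(3)], meta_learner=nearest_centroid)
print(model([0.3, 0.3]), model([5.7, 5.4]))  # prints: 0 1
```

Building the meta-features from out-of-fold predictions, rather than from base models that have already seen each sample, keeps the meta-classifier from simply learning the base models' overfitting on the training set.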

Results: In experiments on the benchmark datasets, the proposed method outperforms other existing methods, indicating that it can predict multiple types of therapeutic peptides simultaneously.

Conclusion: TP-MV is a useful tool for predicting therapeutic peptides.

Keywords: Therapeutic peptide recognition, stacking method, multi-view learning method, ensemble learning, sequence analysis, AAC.
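AAC in the keywords refers to amino acid composition, a standard sequence-level feature: the fraction of each of the 20 standard amino acids in a peptide. The abstract does not say how TP-MV computes it, so the following is only a sketch under assumptions: the function name `aac` is hypothetical, the feature order is alphabetical by one-letter code, characters outside the 20 standard residues are skipped, and the input sequence is arbitrary.

```python
from typing import List

# The 20 standard amino acids in alphabetical one-letter order (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Return the 20-dimensional amino acid composition of a peptide sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as X or B
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = aac("GLFDIVKKVV")
print(len(features), round(sum(features), 6))  # prints: 20 1.0
```

Because the vector has a fixed length regardless of peptide length, it can serve as one input view alongside other fixed-length descriptors.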




Current Bioinformatics, Bentham Science Publisher. Print ISSN 1574-8936; Online ISSN 2212-392X.
