Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

TP-MV: Therapeutic Peptides Prediction by Multi-view Learning

Author(s): Ke Yan, Hongwu Lv, Jie Wen, Yichen Guo and Bin Liu*

Volume 17, Issue 2, 2022

Published on: 29 December, 2021

Pages: 174-183 (10 pages)

DOI: 10.2174/1574893617666211220153429

Abstract

Background: Therapeutic peptide prediction is critical for drug development and therapy. Several computational methods have been developed to identify different types of therapeutic peptides.

Objective: Most existing predictors are designed for one specific type of peptide. Developing a method that can predict multiple types of therapeutic peptides remains challenging, as does combining different features for therapeutic peptide prediction.

Methods: We propose TP-MV, a new ensemble method for general therapeutic peptide recognition. TP-MV is built on a stacking framework that combines k-nearest neighbors (KNN), support vector machine (SVM), extra trees (ET), random forest (RF), and XGBoost (XGB) classifiers. TP-MV then constructs a multi-view learning model as the meta-classifier to extract discriminative features for different peptides.
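
The stacking idea described in the Methods can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' implementation, and several parts are assumptions made for illustration: the peptides are synthetic random sequences, amino acid composition (AAC) is the only feature view, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and a plain logistic regression stands in for the multi-view meta-classifier. The helper names `aac` and `synthetic_peptide` are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(0)


def aac(sequence):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    counts = Counter(sequence.upper())
    length = max(len(sequence), 1)
    return [counts[a] / length for a in AMINO_ACIDS]


def synthetic_peptide(enriched, length=20):
    """Random peptide with the residues in `enriched` drawn four times as often.

    Purely synthetic data for illustration; not taken from any benchmark.
    """
    weights = np.array([4.0 if a in enriched else 1.0 for a in AMINO_ACIDS])
    letters = rng.choice(list(AMINO_ACIDS), size=length, p=weights / weights.sum())
    return "".join(letters)


# Assumption: "positive" peptides are K/R-rich and "negative" ones D/E-rich.
positives = [synthetic_peptide("KR") for _ in range(60)]
negatives = [synthetic_peptide("DE") for _ in range(60)]
X = np.array([aac(s) for s in positives + negatives])
y = np.array([1] * 60 + [0] * 60)

# Base learners named in the abstract; gradient boosting replaces XGBoost here.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(probability=True, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# The meta-classifier is trained on cross-validated base-learner predictions.
# Logistic regression is a simple stand-in for the multi-view model.
model = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
model.fit(X, y)

queries = [synthetic_peptide("KR"), synthetic_peptide("DE")]
proba = model.predict_proba(np.array([aac(s) for s in queries]))
print(proba.shape)
```

In this sketch the meta-classifier only sees the base learners' out-of-fold predictions, which is what separates stacking from simple voting. A multi-view variant would feed several feature representations to the base learners instead of AAC alone.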

Results: On the benchmark datasets, TP-MV outperforms existing methods, indicating that it can predict multiple types of therapeutic peptides simultaneously.

Conclusion: TP-MV is a useful tool for predicting therapeutic peptides.

Keywords: Therapeutic peptide recognition, stacking method, multi-view learning method, ensemble learning, sequence analysis, AAC.


© 2024 Bentham Science Publishers