Prediction of Plant Ubiquitylation Proteins and Sites by Fusing Multiple Features

Meng-Yue Guan; Wang-Ren Qiu; Qian-Kun Wang; Xuan Xiao

doi:10.2174/1574893618666230908092847

Abstract

Introduction: Protein ubiquitylation is an important post-translational modification (PTM) and is considered one of the most important processes regulating cell function and influencing various diseases. Accurate prediction of ubiquitylation proteins and their PTM sites is therefore of great significance for studying basic biological processes and developing related drugs. Researchers have developed several large-scale computational methods to predict ubiquitylation sites, but there is still much room for improvement. Much of the existing research on ubiquitylation is cross-species, yet living organisms are diverse, and prediction methods tend to be species-specific in practical application. This study therefore focuses on plants and constructs computational methods for identifying plant ubiquitylation proteins and ubiquitylation sites.

Methods: In this work, we constructed two predictive models to identify plant ubiquitylation proteins and sites. In the ubiquitylation protein prediction model, to better capture protein sequence information and obtain better prediction results, features are extracted with a KNN scoring matrix model based on functional domain Gene Ontology (GO) annotation and with word embedding models, i.e., Skip-Gram and Continuous Bag of Words (CBOW). The light gradient boosting machine (LGBM) is selected as the ubiquitylation protein prediction engine.
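
The difference between the two word embedding objectives can be illustrated with a minimal sketch. It assumes that a protein sequence is tokenized into overlapping 3-mers treated as "words" and that the context window is 2; the abstract states neither value, and the function names below are hypothetical rather than taken from the authors' repository. Skip-Gram trains on (center, context) pairs, whereas CBOW trains on (context words, center) examples.

```python
from typing import List, Tuple


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (k=3 is an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram: each center word is paired with every word in its context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the context words are used jointly to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    tokens = kmer_tokens("MKTAYIAK")  # toy sequence, not from the paper's dataset
    print(tokens)
    print(skipgram_pairs(tokens)[:3])
    print(cbow_examples(tokens)[0])
```

In practice such training examples would be consumed by a word2vec implementation, and the learned k-mer vectors would be pooled into a fixed-length feature vector per protein before being passed to the classifier; the pooling strategy is not specified in the abstract.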

Results: In 10-fold cross-validation on the independent dataset, the accuracy (ACC), precision, recall, F1-score and AUC are 85.12%, 80.96%, 72.80%, 76.37% and 0.9193, respectively. In the ubiquitylation site prediction model, Skip-Gram, CBOW and enhanced amino acid composition (EAAC) encodings were used to extract features from protein sequence fragments, and the model also performed well on both the training and independent test data.
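
As an illustration of the EAAC encoding mentioned above, the following sketch computes amino acid frequencies in a fixed-length window that slides along a sequence fragment. The window size of 5 and the normalization by window length are assumptions made for this example (the abstract does not give these settings), and the function name is hypothetical.

```python
from typing import List

# The 20 standard amino acids in a fixed order, so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(fragment: str, window: int = 5) -> List[float]:
    """Encode a fragment as concatenated per-window amino acid frequencies.

    Each window contributes 20 values; non-standard characters (for example a
    padding symbol) are simply not counted. window=5 is an assumed setting.
    """
    if len(fragment) < window:
        raise ValueError("fragment must be at least as long as the window")
    features: List[float] = []
    for start in range(len(fragment) - window + 1):
        win = fragment[start:start + window]
        features.extend(win.count(aa) / window for aa in AMINO_ACIDS)
    return features


if __name__ == "__main__":
    vec = eaac("AAKAAKC")  # toy fragment centered on lysine (K)
    print(len(vec))        # 3 windows x 20 amino acids
```

Because every fragment of a given length yields the same number of windows, the resulting vectors have equal length and can be concatenated with the Skip-Gram and CBOW features.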

Conclusion: The comparison results demonstrate that our models have a clear advantage in predicting ubiquitylation proteins and sites, and they may provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The datasets and source code used in this study are available at https://github.com/gmywqk/Ub-PS-Fuse.

DOI: https://dx.doi.org/10.2174/1574893618666230908092847
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
