Abstract
Background: Ubiquitination is a post-translational modification that plays a crucial role in cell signaling, apoptosis, and protein localization. Identifying ubiquitinated proteins is of fundamental importance for understanding molecular mechanisms in biological systems and diseases. Although high-throughput mass spectrometry studies have identified many ubiquitinated proteins and ubiquitination sites, the vast majority of ubiquitinated proteins remain undiscovered, even in well-studied model organisms.
Objective: Computational methods have been introduced to predict ubiquitination sites and reduce experimental costs, but their accuracy remains unsatisfactory. Predicting whether a protein can be ubiquitinated at all would help in predicting its ubiquitination sites. However, all computational methods to date predict only ubiquitination sites.
Methods: This study presents the first computational method for predicting ubiquitinated proteins without relying on ubiquitination site prediction. The method extracts features from sequence conservation information through a grey system model, together with functional domain annotations and subcellular localization.
Results: Combined with feature analysis and the Relief feature selection algorithm, 5-fold cross-validation on three datasets achieved an accuracy of 90.13% and a Matthews correlation coefficient of 80.34%. On an independent test dataset, the method achieved an accuracy of 87.71% and a Matthews correlation coefficient of 75.43%, outperforming the best available ubiquitination site prediction tool.
Conclusion: Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at: https://github.com/Chunhuixu/UBIPredic_QWRCHX.
Keywords: Ubiquitination, machine learning, random forest, protein annotation, subcellular localization, functional domain.
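The two metrics reported in the Results, accuracy and the Matthews correlation coefficient (MCC), can both be derived from the counts of a binary confusion matrix. The sketch below is a minimal illustration and is not taken from the authors' code: the function name `accuracy_and_mcc` and the example counts are hypothetical, chosen only to make the arithmetic easy to follow. MCC ranges from -1 to 1, so a value written as 80.34% corresponds to 0.8034.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives, and false negatives.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total

    # MCC denominator: square root of the product of the four marginals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts (not from the paper): 100 proteins, balanced classes.
acc, mcc = accuracy_and_mcc(tp=45, tn=45, fp=5, fn=5)
print(f"accuracy = {acc:.2%}, MCC = {mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside accuracy when the positive and negative classes may be unbalanced.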