Abstract
Background: Protein lysine crotonylation (Kcr) is a recently discovered posttranslational modification (PTM) that is typically localized at transcription start sites and regulates gene expression. It is associated with a variety of pathological conditions, such as developmental defects and malignant transformation.
Objective: Identifying Kcr sites helps elucidate the biological mechanism of crotonylation and supports the development of new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, which necessitates new computational techniques.
Methods: To identify Kcr sites accurately, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Then, the sequence information and physicochemical property features are fused in two ways: with an autoencoder and by serial concatenation. Finally, the two fused feature sets and the sequence fragment similarity features are each fed into four base classifiers, and a meta-classifier built on these first-level predictions produces the final prediction.
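The two-level idea described in the Methods can be illustrated with a minimal sketch. It is not the Stacking-Kcr implementation. The three toy feature views, the nearest-centroid base learner (a stand-in for the four base classifiers, which are not named here), and every function name are assumptions made for illustration. The sketch shows only the general stacking pattern: out-of-fold first-level scores from each feature view become the inputs of a logistic-regression meta-classifier.

```python
import math
import random


def sigmoid(v):
    # Clip the argument so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-50.0, min(50.0, v))))


def make_toy_views(n, rng):
    """Hypothetical data: three 2-D feature views per sample (not real Kcr features)."""
    labels = [i % 2 for i in range(n)]
    views = [
        [[rng.gauss(1.0 if t else -1.0, 1.0) for _ in range(2)] for t in labels]
        for _ in range(3)
    ]
    return views, labels


def fit_centroid_scorer(X, y):
    """Stand-in base learner: score a sample by its relative distance to class centroids."""
    dim = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(dim)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, mu_neg))
        return sigmoid(d_neg - d_pos)  # nearer the positive centroid -> closer to 1

    return score


def out_of_fold_scores(X, y, k=5):
    """Score each sample with a base learner that never saw it during fitting."""
    n = len(X)
    oof = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        scorer = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            oof[i] = scorer(X[i])
    return oof


def fit_logistic(Z, y, epochs=500, lr=1.0):
    """Meta-classifier: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            grad_b += err
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stacking_demo(seed=0):
    rng = random.Random(seed)
    train_views, y_train = make_toy_views(200, rng)
    test_views, y_test = make_toy_views(100, rng)

    # Level 1: one column of out-of-fold scores per feature view.
    columns = [out_of_fold_scores(X, y_train) for X in train_views]
    Z_train = [list(row) for row in zip(*columns)]

    # Level 2: the meta-classifier learns how to weight the first-level scores.
    w, b = fit_logistic(Z_train, y_train)

    # For new data, refit each base learner on all training samples.
    scorers = [fit_centroid_scorer(X, y_train) for X in train_views]
    Z_test = [[s(X[i]) for s, X in zip(scorers, test_views)] for i in range(len(y_test))]
    preds = [1 if sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) >= 0.5 else 0 for z in Z_test]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)


if __name__ == "__main__":
    print(f"held-out accuracy on toy data: {stacking_demo():.3f}")
```

Using out-of-fold scores for the meta-features keeps the meta-classifier from being trained on predictions that the base learners made for their own training samples, which is the usual reason for structuring a stacking ensemble this way.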
Results: In five-fold cross-validation, the model achieved an accuracy of 0.828 and an AUC of 0.910, outperforming traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which are 1.7% and 0.8% higher, respectively, than those of other state-of-the-art tools. Additionally, we trained Stacking-Kcr on phosphorylation site data, and its results were superior to those of the current model.
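For readers less familiar with the two reported metrics, the following sketch shows one common way to compute them from predicted scores. The function names, the 0.5 decision threshold, and the four-sample toy input are illustrative assumptions, not the evaluation code behind the reported numbers. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, chosen only to exercise the functions.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(accuracy(labels, scores), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.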
Conclusion: These results provide further evidence that Stacking-Kcr generalizes well and has strong application potential.