Stacking-Kcr: A Stacking Model for Predicting the Crotonylation Sites of
Lysine by Fusing Serial and Automatic Encoder

Ying Liang; Suhui Li; Xiya You; You Guo; Jianjun Tang

doi:10.2174/0115748936272040231117114252

Abstract

Background: Protein lysine crotonylation (Kcr) is a recently discovered and important post-translational modification (PTM). It is typically localized at transcription start sites, where it regulates gene expression, and it is associated with a variety of pathological conditions such as developmental defects and malignant transformation.

Objective: Identifying Kcr sites helps to uncover the biological mechanism of crotonylation and to develop new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, which makes new computational techniques necessary.

Methods: To identify Kcr sites accurately, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Next, the sequence-information and physicochemical-property features are fused in two ways: with an autoencoder and by serial fusion. Finally, the two fused feature sets and the sequence fragment similarity features are each fed into four base classifiers, a meta-classifier is built on the first-level prediction results, and the final predictions are obtained.
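The stacking flow described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: serial fusion is assumed to mean end-to-end concatenation of feature vectors, the autoencoder fusion branch is omitted, the features are synthetic, and `CentroidScorer` and `LogisticMeta` are hypothetical stand-ins for the four base classifiers and the meta-classifier, which the abstract does not name.

```python
import math
import random


def serial_fusion(a, b):
    """Assumed meaning of serial fusion: concatenate two feature vectors."""
    return list(a) + list(b)


class CentroidScorer:
    """Toy base learner (hypothetical stand-in, not one of the paper's classifiers)."""

    def fit(self, X, y):
        self.pos = self._centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = self._centroid([x for x, t in zip(X, y) if t == 0])
        return self

    @staticmethod
    def _centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def predict_proba(self, X):
        # A sample closer to the positive centroid gets a score above 0.5.
        return [1.0 / (1.0 + math.exp(math.dist(x, self.pos) - math.dist(x, self.neg)))
                for x in X]


class LogisticMeta:
    """Toy meta-classifier: logistic regression trained by plain SGD."""

    def fit(self, Z, y, lr=0.1, epochs=200):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            for z, t in zip(Z, y):
                err = self._score(z) - t
                self.w = [w - lr * err * zi for w, zi in zip(self.w, z)]
                self.b -= lr * err
        return self

    def _score(self, z):
        s = self.b + sum(w * zi for w, zi in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-s))

    def predict_proba(self, Z):
        return [self._score(z) for z in Z]


def out_of_fold(make_model, X, y, k=5):
    """First-level predictions: each sample is scored by a model that never saw it."""
    n, oof = len(X), [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, model.predict_proba([X[i] for i in test])):
            oof[i] = p
    return oof


# Synthetic stand-ins for the three feature families (not real Kcr data).
random.seed(0)
labels = [i % 2 for i in range(100)]
fused_view, sim_view = [], []
for t in labels:
    seq = [random.gauss(t, 1.0) for _ in range(4)]   # "sequence information"
    phys = [random.gauss(t, 1.0) for _ in range(3)]  # "physicochemical properties"
    sim = [random.gauss(t, 1.0) for _ in range(2)]   # "fragment similarity"
    fused_view.append(serial_fusion(seq, phys))
    sim_view.append(sim)

# Level 1: one base learner per feature view; level 2: meta-classifier on their outputs.
meta_inputs = list(zip(out_of_fold(CentroidScorer, fused_view, labels),
                       out_of_fold(CentroidScorer, sim_view, labels)))
meta = LogisticMeta().fit(meta_inputs, labels)
final_scores = meta.predict_proba(meta_inputs)
```

The meta-classifier is trained on out-of-fold predictions so that it does not simply learn to trust base learners that have memorized their own training samples. That is the usual reason for building the second level from cross-validated first-level outputs.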

Results: In five-fold cross-validation, the model achieved an accuracy of 0.828 and an AUC of 0.910, showing a clear advantage over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which are 1.7% and 0.8% higher, respectively, than those of other state-of-the-art tools. Additionally, we trained Stacking-Kcr on phosphorylation sites, and the results were superior to those of the current model.
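For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and AUC can be computed from labels and predicted scores. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data or settings from the study.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)


def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative toy values only.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
acc_value = accuracy(y_true, y_score)  # 4 of 6 correct at threshold 0.5
auc_value = auc(y_true, y_score)       # 8 of 9 positive-negative pairs ranked correctly
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank positives against negatives. That difference is why the two are commonly reported together.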

Conclusion: These results provide further evidence that Stacking-Kcr has strong application potential and generalization performance.

Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
Print ISSN: 1574-8936
Online ISSN: 2212-392X
DOI: https://dx.doi.org/10.2174/0115748936272040231117114252
