Abstract
Acetylation of lysine residues is considered one of the most potent protein post-translational modifications, owing to its crucial role in cellular metabolism and regulatory processes. Recent advances in experimental techniques have uncovered several lysine acetylation substrates and sites. However, because experimental identification is costly, cumbersome, time-consuming, and labor-intensive, considerable effort has been directed towards the development of computational tools. In particular, machine learning (ML)-based approaches hold great promise for the rapid, cost-effective discovery of lysine acetylation sites, as evidenced by the growing number of prediction tools. In this review, we present a comprehensive survey of the state-of-the-art ML predictors for lysine acetylation. We discuss key aspects of developing a successful predictor, including the underlying ML algorithms, feature selection methods, validation techniques, and software utility. We first review lysine acetylation site databases, current ML approaches, their working principles, and their performance. We then discuss the shortcomings and future directions of ML approaches to the prediction of lysine acetylation sites. This review may serve as a useful guide for experimentalists in choosing the right ML tool for their research. Moreover, it may help bioinformaticians develop more accurate and advanced ML-based predictors in protein research.
Keywords: Protein, post-translational modification, lysine, acetylation, machine learning, feature encoding, prediction model.