Abstract
Background: Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, the covalent attachment of a carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. To study its biological functions, it is essential to correctly identify the lysine sites susceptible to carboxylation.
Objective: Herein, we present a machine-learning-based computational model for the prediction of carboxylysine sites.
Methods: Various position- and composition-relative features are incorporated into PseAAC to construct the feature vectors, and a neural network is employed as the classifier. The model is validated by jackknife testing, cross-validation, self-consistency testing, and independent testing.
Results: The self-consistency test showed that the model achieves 99.76% Acc, 99.76% Sp, and 0.99 MCC. Validation by the jackknife method gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc.
Conclusion: Independent dataset testing gave a result of 94.3%, which indicates that the proposed model performs better than the existing model PreLysCar; however, the accuracy could be improved further in the future as the number of known carboxylysine sites in proteins increases.
Keywords: Carboxylation, carboxylysine, statistical moments, PseAAC, 5-step rule, lysine.
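The Results above are reported as Acc, Sp, and MCC. As a minimal sketch of how such figures are usually derived from a binary confusion matrix, the Python below computes accuracy, specificity, and the Matthews correlation coefficient, together with sensitivity (Sn), which is added here as the usual companion metric. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return Acc, Sn, Sp and MCC computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                    # fraction of all sites classified correctly
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity: true positives recovered
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity: true negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts for illustration only (not results from the paper).
example = binary_metrics(tp=90, tn=85, fp=15, fn=10)
print(example)
```

MCC is informative alongside accuracy because it stays near zero for a classifier that simply predicts the majority class, which matters when carboxylated and non-carboxylated lysine sites are unevenly represented.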