Abstract
Background: Amino acid physicochemical properties encoded in protein primary structure play a crucial role in protein folding. However, it is not yet clear which of the properties are the most suitable for protein fold classification.
Objective: To avoid exhaustively searching the entire property space, this study proposes an amino acid property selection method that rapidly obtains a suitable combination of properties for protein fold classification.
Methods: The proposed amino acid property selection method is based on the sequential floating forward selection strategy. Beginning with an empty set, a variable number of features is added at each iteration until the termination condition is met.
Results: The experimental results indicate that, with appropriately selected amino acid properties, the proposed method improved prediction accuracy by 0.26-5% on a widely used benchmark dataset.
Conclusion: The proposed property selection method can be extended to other classification problems in bioinformatics that depend on biomolecule properties.
Keywords: Protein fold classification, amino acid properties, sequential floating forward selection, auto-cross correlation, Naive Bayes, support vector machine.
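The sequential floating forward selection strategy named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `sffs`, the property names, the toy score table, and the use of a maximum subset size as the termination condition are all assumptions made for illustration. In practice the score would presumably be a classifier accuracy rather than a lookup table.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def sffs(features: Sequence[str],
         score: Callable[[Subset], float],
         max_size: int) -> Tuple[Subset, float]:
    """Sequential floating forward selection (illustrative sketch).

    Assumed termination condition: stop once `max_size` features are selected.
    """
    selected: Subset = frozenset()
    # Best (subset, score) found so far for each subset size.
    best: Dict[int, Tuple[Subset, float]] = {}

    while len(selected) < min(max_size, len(features)):
        # Forward step: add the single feature that gives the highest score.
        candidates = [selected | {f} for f in features if f not in selected]
        selected = max(candidates, key=score)
        k = len(selected)
        if k not in best or score(selected) > best[k][1]:
            best[k] = (selected, score(selected))

        # Floating step: drop a feature as long as the reduced subset
        # strictly beats the best subset of that size found so far.
        while len(selected) > 2:
            reduced = max((selected - {f} for f in sorted(selected)), key=score)
            if score(reduced) > best[len(reduced)][1]:
                selected = reduced
                best[len(reduced)] = (reduced, score(reduced))
            else:
                break

    # Return the best-scoring subset over all sizes visited.
    return max(best.values(), key=lambda item: item[1])


# Hypothetical score table standing in for a classification accuracy.
H, P, V = "hydrophobicity", "polarity", "volume"
toy_scores: Dict[Subset, float] = {
    frozenset({H}): 0.50, frozenset({P}): 0.40, frozenset({V}): 0.30,
    frozenset({H, P}): 0.60, frozenset({H, V}): 0.55, frozenset({P, V}): 0.70,
    frozenset({H, P, V}): 0.65,
}

subset, value = sffs([H, P, V], lambda s: toy_scores[s], max_size=3)
print(sorted(subset), value)
```

In this toy table the greedy forward steps first pick hydrophobicity and then polarity, but the floating step later discards hydrophobicity because the pair of polarity and volume scores higher. Plain forward selection could not make that correction.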