Abstract
Background: Over the last few decades, the search for a theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, open questions and challenges remain, such as understanding the rules by which the amino acid sequence determines protein secondary structure.
Objective: In this review, we trace the progress of prediction methods over the years and identify the sources of their improvement.
Methods: The protein secondary structure prediction problem is described, followed by a discussion of its theoretical limitations, a description of the commonly used data sets and features, and a review of three generations of methods with a focus on the most recent advances. Additionally, methods available as online servers are assessed on an independent data set.
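To make the notion of per-residue features concrete: many predictors describe each residue by the residues surrounding it in a sliding window. The minimal sketch below one-hot encodes a window centred on each position. The function name `window_features`, the window size of 7 and the zero-padding at the chain termini are illustrative assumptions; practical methods usually replace the one-hot vectors with profile columns such as PSSM or HHblits-derived values.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(sequence, window=7):
    """Hypothetical helper: return one feature vector per residue, made of the
    concatenated one-hot encodings of the residues in a window centred on it.
    Window slots outside the chain (and non-standard residues) stay all-zero."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred")
    half = window // 2
    features = []
    for centre in range(len(sequence)):
        vector = []
        for offset in range(-half, half + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            pos = centre + offset
            if 0 <= pos < len(sequence) and sequence[pos] in AA_INDEX:
                one_hot[AA_INDEX[sequence[pos]]] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


feats = window_features("MKTAYIAK", window=7)
print(len(feats), len(feats[0]))  # 8 140
```

Each of the 8 residues yields a 7 × 20 = 140-dimensional vector; a classifier then maps each vector to a secondary structure state. Swapping the one-hot block for a 20-column profile row keeps the same layout.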
Results: State-of-the-art methods currently reach an accuracy of almost 88% for 3-class prediction and 76.5% for 8-class prediction.
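The 3-class and 8-class figures are per-residue accuracies (often called Q3 and Q8): the fraction of residues whose predicted state matches the state assigned from the experimental structure, typically by DSSP. The sketch below computes both, assuming one commonly used mapping of the eight DSSP states onto three classes (H, G, I to helix; E, B to strand; everything else to coil). Other mappings appear in the literature and change the resulting numbers; the helper names and toy sequences are hypothetical.

```python
# One common DSSP 8-to-3 reduction (an assumption; conventions differ).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and isolated bridges
    "T": "C", "S": "C", "C": "C", "-": "C",  # turns, bends, coil
}


def per_residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("empty sequences")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def reduce_to_three(states):
    """Map an 8-state string onto the 3-state alphabet H/E/C."""
    return "".join(EIGHT_TO_THREE[s] for s in states)


observed8 = "CHHHHGGTEEEEBC"   # toy DSSP-style assignment
predicted8 = "CHHHHHHTEEEECC"  # toy prediction
q8 = per_residue_accuracy(predicted8, observed8)
q3 = per_residue_accuracy(reduce_to_three(predicted8), reduce_to_three(observed8))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")  # Q8 = 0.786, Q3 = 0.929
```

Because the reduction merges states that are easily confused (here G with H), Q3 computed on a prediction is never lower than Q8 on the same prediction, which is one reason the 3-class figures are consistently higher.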
Conclusion: This review summarizes recent advances and outlines further research directions.
Keywords: Protein secondary structure prediction, multiple sequence alignment, PSSM, HHblits, deep neural networks, machine learning, protein early-stage structure.