Abstract
Proteins are large molecules consisting of a linear sequence of amino acids. Proteins perform their biological functions by adopting specific 3D structures. The main factors that drive proteins to form these structures are constraints between residues. These constraints give rise to important inter-residue relationships, including short-range inter-residue contacts and long-range inter-residue distances. Thus, highly accurate prediction of inter-residue contact and distance information is of great significance for protein tertiary structure computation. Some methods have been proposed for inter-residue contact prediction, most of which focus on contact map prediction, and some reviews have summarized the progress. However, in recent years inter-residue distance prediction has been found to provide better guidance for protein structure prediction than contact map prediction. Methods for inter-residue distance prediction can be roughly divided into two types according to how they treat the distance value: one is based on multi-class classification with discrete values, and the other is based on regression with continuous values. Here, we summarize these algorithms and show that they have obtained good results. Compared to contact map prediction, distance map prediction is in its infancy. Much remains to be done, including improving the precision of distance map prediction and incorporating predicted distance maps into residue-residue distance-guided ab initio protein folding.
Keywords: Machine learning, deep learning, protein structure prediction, contact map, inter-residue distance, distance map.
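To make the distinction between the prediction targets concrete, the following minimal sketch derives all three from the same toy coordinates: a real-valued distance map (the regression target), a binary contact map, and a binned distance map (the multi-class target). The coordinates, the 8 Å contact threshold, the bin edges, and the function names are illustrative assumptions and are not taken from any particular method surveyed here; in practice the distances would be computed between representative atoms (for example Cβ) of a solved structure.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (regression target)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dmap, threshold=8.0):
    """Binary map: 1 where the distance is below the threshold (assumed 8 angstroms)."""
    return [[1 if d < threshold else 0 for d in row] for row in dmap]

def distance_bins(dmap, edges=(4.0, 8.0, 12.0, 16.0)):
    """Class label per residue pair: the number of bin edges the distance reaches
    or exceeds. The edges are hypothetical and chosen only for illustration."""
    return [[sum(d >= e for e in edges) for d in row] for row in dmap]

# Toy coordinates for four residues (made-up values, in angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 10.0, 0.0)]

dmap = distance_map(coords)   # continuous values
cmap = contact_map(dmap)      # two classes
bins = distance_bins(dmap)    # len(edges) + 1 classes

for row in bins:
    print(row)
```

The contact map is the coarsest of the three: it is the binned map collapsed to two classes, which is one way to see why a distance map can carry more information for structure computation than a contact map.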