Abstract
Understanding how proteins interact with nucleic acids is one of the most fundamental problems in genome editing with engineered nucleases. Because experimental investigation has practical limitations, computational methods have played an important role in characterizing protein-nucleic acid interactions. Over the past few years, dozens of computational tools have been used to identify the nucleic acid binding sites of site-specific proteins and to design site-specific nucleases, owing to the significant advantages of these nucleases in genome editing. Here, we review widely used computational tools for target prediction of site-specific proteins and for off-target prediction of site-specific nucleases. We list online prediction tools according to their features and then describe the computational methods they use, which range from sequence mapping algorithms (such as Bowtie, FetchGWI and BLAST) to machine learning methods (such as support vector machines, hidden Markov models, random forests, the elastic net and deep neural networks). We also suggest directions for improving the accuracy of prediction methods. This survey provides a reference guide for computational biologists working in the field of genome editing.
Keywords: Site-specific protein, engineered nuclease, target prediction, machine learning, genome editing, protein-nucleic acid interaction.