Abstract
Background: DNA N6-methyladenine (6mA) plays an important role in the restriction-modification system, which protects the host against invading foreign DNA. Experimental methods for detecting 6mA are time-consuming and costly, which has motivated the development of computational methods. Support vector machines have received extensive attention in bioinformatics because of their solid theoretical foundation and favorable properties.
Objective: General machine learning methods rely on a feature extraction step. This study omits that step and instead uses an easily obtained sequence distance matrix, with the aim of achieving better results.
Methods: First, sequence alignment was used to obtain a similarity matrix. Then, a novel transformation converted the similarity matrix into a distance matrix. Next, the similarity-distance matrix was made positive semi-definite so that it could serve as a kernel matrix. Finally, the LIBSVM software was used to solve the support vector machine.
Results: In five-fold cross-validation on the rice and mouse data, the model achieved accuracy rates of 92.04% and 96.51%, respectively, indicating that the DB-SVM method has clear advantages over traditional machine learning methods. In addition, the model achieved accuracies of 0.943, 0.982, and 0.818; Matthews correlation coefficients of 0.944, 0.982, and 0.838; and F1 scores of 0.942, 0.982, and 0.840 for the rice, M. musculus, and cross-species genome datasets, respectively.
Conclusion: These results show that the model outperforms iIM-CNN and csDMA, which represent the latest research on DNA 6mA prediction.
Keywords: DNA N6-methyladenine, sequence alignment, machine learning, support vector machine, distance matrix, kernel method.
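The pipeline described under Methods (alignment similarity, distance transform, positive semi-definite kernel) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions because the abstract does not specify them: the Needleman-Wunsch scoring parameters, the transform d_ij = sqrt(s_ii + s_jj - 2 s_ij), the Gaussian form of the kernel with a hypothetical `gamma`, and eigenvalue clipping as the way to enforce positive semi-definiteness. The function names and toy sequences are illustrative only.

```python
import numpy as np


def nw_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global (Needleman-Wunsch) alignment score; scoring values are assumed."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs):
    """Pairwise alignment scores as a symmetric matrix."""
    n = len(seqs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            S[i, j] = S[j, i] = nw_score(seqs[i], seqs[j])
    return S


def similarity_to_distance(S):
    """Assumed transform (not the paper's): d_ij = sqrt(s_ii + s_jj - 2 s_ij)."""
    diag = np.diag(S)
    squared = diag[:, None] + diag[None, :] - 2.0 * S
    return np.sqrt(np.clip(squared, 0.0, None))


def distance_to_psd_kernel(D, gamma=0.1):
    """Gaussian-style kernel from distances, then clip negative eigenvalues."""
    K = np.exp(-gamma * D ** 2)
    K = (K + K.T) / 2.0                 # enforce exact symmetry
    w, V = np.linalg.eigh(K)            # eigendecomposition of a symmetric matrix
    w = np.clip(w, 0.0, None)           # drop negative eigenvalues
    return (V * w) @ V.T                # V diag(w) V^T


# Toy sequences, purely illustrative.
seqs = ["ACGTAC", "ACGTTC", "TTGACA", "ACGAAC"]
S = similarity_matrix(seqs)
D = similarity_to_distance(S)
K = distance_to_psd_kernel(D)
```

A matrix such as `K` can then be passed to LIBSVM through its precomputed-kernel option (`-t 4`), or to scikit-learn's `SVC(kernel="precomputed")`, which wraps LIBSVM.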