Abstract
Background: A primary goal of molecular phylogenetics is to characterize the similarity and dissimilarity of DNA sequences. Existing sequence comparison methods, some of which are patented, are mostly alignment-based and remain computationally expensive.
Objective: In this patent study, we propose a novel alignment-free approach based on an earlier DNA curve representation that has no degeneracy.
Method: The method combines two geometric elements that describe the global and local features of the curve, respectively. This allows us to characterize a DNA sequence numerically by a 24-dimensional vector, called the characterization vector. We then measure the similarity/dissimilarity of DNA sequences by the Euclidean distance between their characterization vectors.
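The distance step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 24-dimensional characterization vectors have already been computed elsewhere, and the names `euclidean_distance`, `pairwise_distances`, and the placeholder vectors are hypothetical.

```python
import math
from itertools import combinations

DIM = 24  # length of a characterization vector, as stated in the abstract


def euclidean_distance(u, v):
    """Euclidean distance between two characterization vectors."""
    if len(u) != DIM or len(v) != DIM:
        raise ValueError(f"expected {DIM}-dimensional vectors")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(vectors):
    """Map each unordered pair of sequence names to its distance.

    `vectors` maps a sequence name to its characterization vector.
    Smaller distances indicate more similar sequences.
    """
    return {
        (m, n): euclidean_distance(vectors[m], vectors[n])
        for m, n in combinations(sorted(vectors), 2)
    }


if __name__ == "__main__":
    # Placeholder vectors for illustration only; these are NOT real data.
    demo = {
        "seqA": [0.0] * DIM,
        "seqB": [0.0] * (DIM - 1) + [3.0],
        "seqC": [1.0] * DIM,
    }
    for pair, dist in pairwise_distances(demo).items():
        print(pair, round(dist, 4))
```

The resulting pairwise distances can be arranged into a distance matrix and passed to a standard tree-building procedure; which procedure to use is left open here.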
Results: We compare our approach with existing algorithms on 4 data sets, including COVID-19, and find that our approach produces consistent results and is faster than the alignment-based methods.
Conclusion: The method presented in this study can assist in analyzing biological molecular sequences efficiently and will be helpful to molecular biologists.