Abstract
Background: The primary goal of molecular phylogenetics is to characterize the similarity/dissimilarity of DNA sequences. Existing sequence comparison methods, some of which are patented, are mostly alignment-based and remain computationally arduous.
Objective: In this patent study, we propose a novel alignment-free approach based on a previously proposed non-degenerate DNA curve representation.
Method: The method combines two important geometric elements that describe the global and local features of the curve, respectively. This allows us to numerically characterize a DNA sequence by a 24-dimensional vector, called a characterization vector. We then measure the dissimilarity/similarity of DNA sequences by the Euclidean distances between their characterization vectors.
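The comparison step can be illustrated with a minimal sketch, assuming the characterization vectors have already been computed. The construction of the vectors from the DNA curve is not reproduced here. The function name `distance_matrix`, the sequence labels, and the numeric values are hypothetical placeholders, not data from this study.

```python
import math


def distance_matrix(vectors):
    """Return pairwise Euclidean distances between characterization vectors.

    `vectors` maps a sequence label to its vector (24 numbers in this sketch).
    The result maps each ordered pair of labels to a distance.
    """
    names = list(vectors)
    return {
        (a, b): math.dist(vectors[a], vectors[b])
        for a in names
        for b in names
    }


# Placeholder 24-dimensional vectors (made-up numbers, not real sequence data).
example_vectors = {
    "seq_A": [0.0] * 24,
    "seq_B": [1.0] * 24,
    "seq_C": [0.0] * 23 + [3.0],
}

distances = distance_matrix(example_vectors)
```

A smaller distance indicates more similar sequences under this measure. The resulting matrix could then be passed to a standard tree-building procedure, although which one is used is not stated in this abstract.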
Results: We compare our approach with existing algorithms on four data sets, including a COVID-19 data set, and find that our approach produces consistent results and is faster than the alignment-based methods.
Conclusion: The method presented in this study can assist in analyzing biological molecular sequences efficiently and will be helpful to molecular biologists.