Numerical Characterization of DNA Sequences for Alignment-free Sequence Comparison – A Review

Natarajan Ramanathan; Jayalakshmi Ramamurthy; Ganapathy Natarajan
doi:10.2174/1386207324666210811101437
Abstract

Background: Biological macromolecules, namely DNA, RNA, and proteins, have their building blocks organized in a particular sequence, and this sequential arrangement encodes the evolutionary history of the organism (species). Biological sequences have therefore been used to study evolutionary relationships among species, usually by means of Multiple Sequence Alignment (MSA). Because of limitations of MSA, such as its computational cost for long or numerous sequences and the dependence of its results on alignment parameters, alignment-free sequence comparison methods were developed. This review covers alignment-free sequence comparison methods based on the numerical characterization of DNA sequences.
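As a minimal illustration of what numerical characterization means for alignment-free comparison, the Python sketch below maps each DNA sequence to a fixed-length vector of k-mer (word) frequencies and compares two sequences by the Euclidean distance between their vectors. k-mer composition is only one common alignment-free approach and is used here as an illustrative assumption; it is not the specific set of DNA invariants discussed in this review, and the function names kmer_vector and euclidean are hypothetical.

```python
from collections import Counter
from itertools import product
import math

VALID = set("ACGT")


def kmer_vector(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order (AA, AC, ..., TT for k=2).

    Windows containing any character other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID
    )
    total = sum(counts.values()) or 1  # guard against sequences shorter than k
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.dist(u, v)


if __name__ == "__main__":
    # Two toy sequences differing at a single position (illustrative only).
    s1 = "ATGGTGCACCTGACTCCTGAGGAG"
    s2 = "ATGGTGCATCTGACTCCTGAGGAG"
    print(round(euclidean(kmer_vector(s1), kmer_vector(s2)), 4))
```

Because every sequence yields a vector of the same length (4^k entries), sequences of different lengths can be compared directly, without first aligning them.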
Discussion: Graphical representations of DNA sequences, including chaos game representation and other two- and three-dimensional methods, are discussed. The review traces how numerical characterization evolved from these graphical representations and how the resulting DNA invariants have been applied in phylogenetic analysis. It also covers the extension of molecular-descriptor computation from chemometrics to the calculation of a new set of DNA invariants, and the use of these invariants for alignment-free sequence comparison in an N-dimensional space and for the construction of phylogenetic trees.
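Chaos game representation (CGR) can be sketched in a few lines of Python. Each nucleotide is assigned a corner of the unit square, and every successive base moves the current point halfway toward its corner, so the position of a point encodes the bases that precede it. The corner assignment used here (A bottom-left, C top-left, G top-right, T bottom-right) is assumed to follow the convention of the original CGR work, and the function names cgr_points and fcgr are hypothetical. The fcgr helper bins the points into a 2^k × 2^k grid; after the first k - 1 points are dropped, each cell count equals the number of occurrences of one k-mer, which is how CGR images relate to k-mer frequencies.

```python
# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0)
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Chaos game representation: one point per nucleotide, starting from the square's centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # this sketch simply skips ambiguous bases such as N
        # Move halfway from the current point towards the base's corner.
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2
        points.append((x, y))
    return points


def fcgr(seq: str, k: int) -> list[list[int]]:
    """Frequency CGR: count points in a 2**k x 2**k grid (row = y cell, column = x cell).

    The first k - 1 points are dropped, so each cell count equals the count of one k-mer.
    """
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in cgr_points(seq)[k - 1:]:
        # min() guards against floating-point rounding to exactly 1.0 after long runs of G/C/T.
        row = min(int(y * n), n - 1)
        col = min(int(x * n), n - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Print rows with y increasing upwards, as in a CGR image.
    for row in reversed(fcgr("ATGGTGCACCTGACTCCTGAGGAG", 1)):
        print(row)
```

Ambiguous bases are simply skipped in this sketch, which joins the fragments on either side of them; a real analysis would need a deliberate policy for such positions.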
Conclusion: Phylogenetic trees constructed by alignment-free sequence comparison using DNA invariants were found to be better than those constructed with alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods has since been extended to viral sequences of infectious diseases: by combining numerical characterization with graphical representation, it identifies conserved regions that can be used to design peptide-based vaccines.
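The abstract does not say which clustering method turns the DNA invariants into a tree, so the sketch below uses UPGMA (average-linkage clustering) purely as a stand-in. Each sequence is treated as a point in an N-dimensional space of invariants, pairwise Euclidean distances are computed, and the two closest clusters are merged repeatedly until one remains; the result is written as a Newick topology. The function name upgma_newick and the invariant values in the demo are hypothetical and made up for illustration; neighbor-joining or another method could be substituted.

```python
import math
from itertools import combinations


def upgma_newick(vectors: dict[str, list[float]]) -> str:
    """UPGMA clustering of descriptor vectors; returns a Newick topology without branch lengths.

    Leaf names must be unique and must not contain parentheses or commas.
    """
    # Pairwise Euclidean distances in the N-dimensional descriptor space.
    dist = {frozenset(p): math.dist(vectors[p[0]], vectors[p[1]])
            for p in combinations(vectors, 2)}
    size = {name: 1 for name in vectors}  # number of leaves in each cluster
    clusters = list(vectors)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        rest = [c for c in clusters if c not in (a, b)]
        for c in rest:
            # Average linkage: size-weighted mean of the distances from a and b.
            dist[frozenset((merged, c))] = (
                size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        clusters = rest + [merged]
    return clusters[0] + ";"


if __name__ == "__main__":
    # Made-up three-dimensional invariant vectors, for illustration only.
    demo = {
        "seq1": [0.12, 1.40, 3.1],
        "seq2": [0.10, 1.35, 3.0],
        "seq3": [0.90, 2.10, 5.2],
        "seq4": [0.95, 2.20, 5.0],
    }
    print(upgma_newick(demo))
```

UPGMA assumes a roughly constant rate of change across lineages, which is one reason other tree-building methods are often preferred for real phylogenetic data.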
Keywords: Numerical characterization, DNA sequences, alignment-free, sequence comparison, phylogenetic analysis, peptide-based vaccines.

Journal: Combinatorial Chemistry & High Throughput Screening
Publisher: Bentham Science Publishers
Print ISSN 1386-2073; Online ISSN 1875-5402
