[1]
Marçais, G.; Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 2011, 27(6), 764-770.
[2]
Miller, J.R.; Delcher, A.L.; Koren, S.; Venter, E.; Walenz, B.P.; Brownley, A.; Johnson, J.; Li, K.; Mobarry, C.; Sutton, G. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics, 2003, 24(24), 2818-2824.
[3]
Jaffe, D.B.; Butler, J.; Gnerre, S.; Mauceli, E.; Lindblad-Toh, K.; Mesirov, J.P.; Zody, M.C.; Lander, E.S. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res., 2003, 13(1), 91-96.
[4]
Miller, J.R.; Koren, S.; Sutton, G. Assembly algorithms for next-generation sequencing data. Genomics, 2010, 95(6), 315-327.
[5]
Pevzner, P.A.; Tang, H.; Waterman, M.S. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA, 2001, 98(17), 9748-9753.
[6]
Zerbino, D.; Birney, E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 2008, 18(5), 821-829.
[7]
Simpson, J.T.; Wong, K.; Jackman, S.D.; Schein, J.E.; Jones, S.J.; Birol, I. ABySS: A parallel assembler for short read sequence data. Genome Res., 2009, 19(6), 1117-1123.
[8]
Kelley, D.R.; Schatz, M.C.; Salzberg, S.L. Quake: Quality-aware detection and correction of sequencing errors. Genome Biol., 2010, 11(11), R116.
[9]
Shi, H.; Schmidt, B.; Liu, W.; Müller-Wittig, W. A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware. J. Comput. Biol., 2010, 17(4), 603-615.
[10]
Liu, Y.; Schröder, J.; Schmidt, B. Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics, 2012, 29(3), 308-315.
[11]
Medvedev, P.; Scott, E.; Kakaradov, B.; Pevzner, P. Error correction of high-throughput sequencing datasets with non-uniform coverage. Bioinformatics, 2011, 27(13), 137-141.
[12]
Salmela, L.; Schröder, J. Correcting errors in short reads by multiple alignments. Bioinformatics, 2011, 27(11), 1455-1461.
[13]
Li, R.; Ye, J.; Li, S.; Wang, J.; Han, Y.; Ye, C.; Wang, J.; Yang, H.; Yu, J.; Wong, G.K.S.; Wang, J. ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun. PLoSComputat. Biol., 2005, 1(4), e43.
[14]
Price, A.L.; Jones, N.C.; Pevzner, P.A. De novo identification of repeat families in large genomes. Bioinformatics, 2005, 21(Suppl. 1), 351-358.
[15]
Campagna, D.; Romualdi, C.; Vitulo, N.; Del Favero, M.; Lexa, M.; Cannata, N.; Valle, G. RAP: A new computer program for de novo identification of repeated sequences in whole genomes. Bioinformatics, 2004, 21(5), 582-588.
[16]
Lefebvre, A.; Lecroq, T.; Dauchel, H.; Alexandre, J. FORRepeats: Detects repeats on entire chromosomes and between genomes. Bioinformatics, 2003, 19(3), S319-S326.
[17]
Healy, J.; Thomas, E.E.; Schwartz, J.T.; Wigler, M. Annotating large genomes with exact word matches. Genome Res., 2003, 13(10), 2306-2315.
[18]
Kurtz, S.; Narechania, A.; Stein, J.C.; Ware, D. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics, 2008, 9(1), 517.
[19]
Kokot, M.; Długosz, M.; Deorowicz, S. KMC 3: Counting and manipulating k-mer statistics. Bioinformatics, 2017, 33(17), 2759-2761.
[20]
Erbert, M.; Rechner, S.; Müller-Hannemann, M. Gerbil: A fast and memory-efficient k-mer counter with GPU-support. Algor. Mol. Biol., 2017, 12(1), 9.
[21]
Rizk, G.; Lavenier, D.; Chikhi, R. DSK: k-mer counting with very low memory usage. Bioinformatics, 2013, 29(5), 652-653.
[22]
Conway, T.C.; Bromage, A.J. Succinct data structures for assembling large genomes. Bioinformatics, 2011, 27(4), 479-486.
[23]
Stephens, Z.D.; Lee, S.Y.; Faghri, F.; Campbell, R.H.; Zhai, C.; Efron, M.J.; Iyer, R.; Schatz, M.C.; Sinha, S.; Robinson, G.E. Big data: Astronomical or genomical? PLoS Biol., 2015, 13(7), e1002195.
[24]
Brown, T.C.; Howe, A.; Zhang, Q.; Pyrkosz, A.B.; Brom, T.M. A reference-free algorithm for computational normalization of shotgun sequencing data. arXiv:1203.4802, 2012.
[25]
Pell, J.; Hintze, A.; Canino-Koning, R.; Howe, A.; Tiedje, J.M.; Brown, C.T. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc. Natl. Acad. Sci.USA, 2012, 9(33), 13272-13277.
[26]
Junior, L.C.I.; Brown, C.T. Efficient cardinality estimation for k-mers in large DNA sequencing data sets. F1000. Res., 2016, 1-5.
[27]
Zhang, Q.; Pell, J.; Canino-Koning, R.; Howe, A.C.; Brown, C.T. These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. PLoS One, 2014, 9(7), e101271.
[28]
Mohamadi, H.; Khan, H.; Birol, I. ntCard: A streaming algorithm for cardinality estimation in genomics data. Bioinformatics, 2017, 33(9), 1324-1330.
[29]
Melsted, P.; Halldorsson, B.V. KmerStream: Streaming algorithms for k-mer abundance estimation. Bioinformatics, 2014, 30(24), 3541-3547.
[30]
Chikhi, R.; Medvedev, P. Sequence analysis informed and automated k-mer size selection for genome assembly. Bioinformatics, 2014, 30(1), 31-37.
[31]
Alon, N.; Matias, Y.; Szegedy, M. The space complexity of approximating the frequency moments. Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC), Philadelphia, Pennsylvania, U.S1996, pp. 20-29.
[32]
Bar-Yossef, Z.; Jayram, T.S.; Kumar, R.; Sivakumar, D.; Trevisan, L. Counting distinct elements in a data stream. In: International Workshop on Randomization and Approximation Techniques in Computer Science; Springer: Berlin, Heidelberg, Germany, 2002; pp. 1-10.
[33]
Flajolet, P.; Martin, G.N. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 1985, 31(2), 182-209.
[34]
Cormode, G.; Muthukrishnan, S. An improved data stream summary: The count-min sketch and its applications. J. Algorithms, 2005, 55(1), 58-75.
[35]
Simpson, J.T. Exploring genome characteristics and sequence quality without a reference. Bioinformatics, 2014, 30(9), 1228-1235.
[36]
Chu, J.; Sadeghi, S.; Raymond, A.; Jackman, S.D.; Nip, K.M.; Mar, R.; Mohamadi, H.; Butterfield, Y.S.; Robertson, A.G.; Birol, I. BioBloom tools: Fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics, 2014, 30(23), 3402-3404.
[37]
Pérez, N.; Gutierrez, M.; Vera, N. Computational performance assessment of k-mer counting algorithms. J. Comput. Biol., 2016, 23(4), 248-255.
[38]
Mohamadi, H.; Chu, J.; Vandervalk, B.P.; Birol, I. ntHash: Recursive nucleotide hashing. Bioinformatics, 2016, 32(22), 3492-3494.
[39]
Bloom, B.H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 1970, 13(7), 422-426.
[40]
Crusoe, M.R.; Alameldin, S.; Awad, E.; Boucher, A.; Caldwell, R.; Cartwright, A.; Charbonneau, B.; Constantinides, G.; Edvenson, S.; Fay, J.; Fenton, T.; Fenzl, J.; Fish, L.; Garcia-Gutierrez, P.; Garland, J.; Gluck, I.; González, S.; Guermond, J.; Guo, A.; Gupta, J.R.; Herr, A.; Howe, A.; Hyer, A.; Härpfer, L.; Irber, R.; Kidd, D.; Lin, J.; Lippi, T.; Mansour, P.; McA’Nulty, E.; McDonald, J.; Mizzi, K.D.; Murray, J.R.; Nahum, K.; Nanlohy, A.J.; Nederbragt, H.; Ortiz-Zuazaga, J.; Ory, J.; Pell, C.; Pepe-Ranney, Z.N.; Russ, E.; Schwarz, C.; Scott, J.; Seaman, S.; Sievert, J.; Simpson, C.T.; Skennerton, J.; Spencer, R.; Srinivasan, D.; Standage, J.A.; Stapleton, S.R.; Steinman, J.; Stein, B.; Taylor, W.; Trimble, H.L.; Wiencko, M.; Wright, B.; Wyss, Q.; Zhang, E. Zyme; C.T. Brown. The khmer software package: Enabling efficient nucleotide sequence analysis. F1000 Res., 2015, 4(115), 900.
[41]
Flajolet, P.; Fusy, É.; Gandouet, O.; Meunier, F. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm In: DiscretMath. Theor. Comput. Sci; , 2007, pp. 137-156.
[42]
Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Numerical recipes: The Art of Scientific Computing, 3rd ed; Cambridge University Press: New York, NY, USA, 2007.