Abstract
Next-generation high-throughput sequencing technologies have opened up new and challenging research opportunities. In particular, Next-generation sequencers produce a massive amount of short-reads data in a single run. However, the large amount of short-reads data produced is highly susceptible to errors, as compared to shotgun sequencing. Therefore, there is a peremptory demand to design fast and more accurate statistical and computational tools to analyze this data. We present HaShRECA, a new short-reads error correction algorithm based on probabilistic analysis of potential read errors that utilizes the Hadoop MapReduce framework. Experimental results show that HaShRECA is more accurate, as well as time and space efficient as compared to previous algorithms.
Keywords: Algorithm, genome, mapreduce, next generation sequencing, short read errors.
Graphical Abstract