Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Efficient and Error-Tolerant Sequencing Read Mapping

Author(s): Piotr Jaroszyński and Norbert Dojer

Volume 10, Issue 2, 2015

Page: [191 - 198] Pages: 8

DOI: 10.2174/157489361002150518125716

Abstract

Most efficient read mappers build a Ferragina-Manzini index of a genome sequence and then process reads against it. To handle differences between reads and the corresponding genome fragments, the index is searched for approximate read occurrences. This technique is particularly efficient for mapping reads of length ∼30 bp with up to 2-3 errors, as required by the first massively parallel sequencers.
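As an illustration of the kind of query such an index supports, the following is a minimal sketch of exact backward search over a toy Ferragina-Manzini index. It is not the code of any particular mapper: the function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting, and the full occurrence table is stored uncompressed, which is only practical for very short texts.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and Occ table.

    Assumes '$' does not occur in `text` and sorts before every letter.
    """
    text += "$"  # sentinel marking the end of the text
    # Naive suffix array construction; acceptable only for a toy example.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around to '$').
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in the text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of `pattern`."""
    lo, hi = 0, len(sa)  # half-open interval of matching suffix array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGTTACG")
print(backward_search("ACG", *index))
```

Each pattern character costs one interval update, so an exact query takes time proportional to the pattern length. Approximate search is typically layered on top of this step by also trying alternative characters at each position.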

However, within the last few years, read lengths in the most popular sequencing technologies have increased to 75-200 bp. Since the number of required index queries grows exponentially with the number of errors, it is hard to maintain the same allowed error rate for longer reads with this method.
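A rough way to see this growth is to count how many distinct sequences lie within a given number of substitutions of a read. The sketch below is only an illustration under simplifying assumptions: it counts substitutions only (no insertions or deletions), assumes a 4-letter alphabet, and gives the size of the candidate space rather than the exact number of index queries any specific mapper performs, since real mappers prune the search. The function name `hamming_neighbourhood_size` is hypothetical.

```python
from math import comb


def hamming_neighbourhood_size(read_length, max_errors, alphabet_size=4):
    """Count sequences within `max_errors` substitutions of a fixed read.

    For each error count i, choose the i mismatching positions and one of
    the (alphabet_size - 1) alternative letters at each of them.
    """
    return sum(
        comb(read_length, i) * (alphabet_size - 1) ** i
        for i in range(max_errors + 1)
    )


print(hamming_neighbourhood_size(30, 2))
print(hamming_neighbourhood_size(100, 2))
```

With 2 allowed substitutions, the count is 4006 for a 30 bp read and 44851 for a 100 bp read. Keeping the error rate fixed means allowing proportionally more errors in longer reads, and each additional error multiplies the candidate space further.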

We propose a new approach that overcomes this problem. The main idea is to use the Ferragina-Manzini index to filter potential approximate read occurrences. Filtering is based on the intermediate partitioning concept, i.e. reads are split into parts, which are searched in the index with a reduced number of errors.
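The filtering idea can be sketched as follows. This is a minimal illustration, not the Bmap implementation: it assumes substitution errors only, uses a naive scan (`find_with_errors`) as a stand-in for the approximate index query, and all function names are hypothetical. It relies on a pigeonhole argument: if a read matches with at most k errors and is split into j parts, at least one part matches with at most ⌊k/j⌋ errors.

```python
def split_read(read, parts):
    """Split `read` into `parts` nearly equal pieces as (offset, piece) pairs."""
    base, extra = divmod(len(read), parts)
    pieces, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        pieces.append((start, read[start:start + size]))
        start += size
    return pieces


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_with_errors(genome, piece, max_errors):
    """Stand-in for an approximate index query (assumption: naive scan)."""
    n = len(piece)
    return [
        i for i in range(len(genome) - n + 1)
        if hamming(genome[i:i + n], piece) <= max_errors
    ]


def map_read(genome, read, max_errors, parts):
    """Filter with per-part searches, then verify each candidate position."""
    per_part = max_errors // parts  # reduced error budget for each part
    candidates = set()
    for offset, piece in split_read(read, parts):
        for hit in find_with_errors(genome, piece, per_part):
            start = hit - offset  # implied start of the whole read
            if 0 <= start <= len(genome) - len(read):
                candidates.add(start)
    # Verification: keep candidates where the full read fits the error budget.
    return sorted(
        s for s in candidates
        if hamming(genome[s:s + len(read)], read) <= max_errors
    )


genome = "ACGTTGCAATCGGATCCGTA"
read = "GTAATCGGAGCC"  # genome[5:17] with two substitutions
print(map_read(genome, read, max_errors=2, parts=3))
```

The choice of the number of parts is a trade-off: more parts lower the per-part error budget and make each query cheaper, but shorter parts match more often by chance and produce more candidates to verify.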

We implemented this method in the Bmap program. Our experiments show that Bmap outperforms current methods in efficiency without sacrificing mapping accuracy.

Keywords: Approximate string matching, Ferragina-Manzini index, intermediate partitioning, next generation sequencing, read mapping, succinct suffix array.