Efficient and Error-Tolerant Sequencing Read Mapping

Piotr Jaroszyński; Norbert Dojer

doi:10.2174/157489361002150518125716

Abstract

Most efficient read mappers build a Ferragina-Manzini index of a genome sequence and then process reads against it. To handle differences between reads and the corresponding genome fragments, they search the index for approximate read occurrences. This technique is particularly efficient for mapping reads of length ∼30 bp with up to 2-3 errors, as the first massively parallel sequencers required.
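As an illustration of what an index query looks like, the following is a minimal sketch of exact backward search over a Ferragina-Manzini index. It is not taken from Bmap or any other mapper: the function names `build_fm_index` and `backward_search` are hypothetical, the index is built naively by sorting suffixes, and the full suffix array is kept in memory, which a real succinct index would avoid.

```python
from collections import Counter

def build_fm_index(text):
    """Naive FM-index construction (illustrative only, not memory-efficient)."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # C[c]: number of characters in text strictly smaller than c
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ

def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix array interval
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

sa, C, occ = build_fm_index("ACGTACGA")
hits = backward_search("ACG", sa, C, occ)
```

Each pattern character costs one interval update, so an exact query takes time proportional to the pattern length. Approximate search has to branch over alternative characters at each step, which is where the cost grows.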

However, within the last few years, read lengths in the most popular sequencing technologies have increased to 75-200 bp. Since the number of required index queries is exponential in the number of errors, it is hard to maintain the same allowed error rate with this method.
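To give a rough sense of this growth, the sketch below counts the strings within a given number of mismatches of a read over the four-letter DNA alphabet. This count is only a crude proxy chosen for illustration: it ignores insertions and deletions, and it is not the exact number of queries issued by any particular mapper, since real searches prune branches that do not occur in the index. The function name `mismatch_neighborhood_size` is hypothetical.

```python
from math import comb

def mismatch_neighborhood_size(read_length, max_mismatches, alphabet_size=4):
    """Number of strings within max_mismatches substitutions of a read.

    Choose i positions to change, then one of (alphabet_size - 1)
    alternative characters at each chosen position.
    """
    return sum(
        comb(read_length, i) * (alphabet_size - 1) ** i
        for i in range(max_mismatches + 1)
    )

# A 30 bp read with up to 2 mismatches.
short_read = mismatch_neighborhood_size(30, 2)  # 4006

# A 100 bp read at a comparable error rate (about 7 mismatches, an
# illustrative choice) has a vastly larger neighborhood.
long_read = mismatch_neighborhood_size(100, 7)
```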

We propose a new approach that overcomes this problem. The main idea is to use the Ferragina-Manzini index to filter potential approximate read occurrences. Filtering is based on the intermediate partitioning concept: reads are split into parts, which are searched in the index with a reduced number of errors.
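The filtering idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the Bmap implementation: errors are restricted to mismatches (no insertions or deletions), a naive scan of the genome stands in for the index query, and the names `split_read`, `hamming` and `map_read` are hypothetical. The sketch relies on the standard partitioning argument: if a read matches with at most k errors and is split into j parts, at least one part matches with at most floor(k / j) errors.

```python
def split_read(read, parts):
    """Split read into `parts` nearly equal pieces as (offset, piece) pairs."""
    base, extra = divmod(len(read), parts)
    pieces, start = [], 0
    for p in range(parts):
        length = base + (1 if p < extra else 0)
        pieces.append((start, read[start:start + length]))
        start += length
    return pieces

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(genome, read, max_errors, parts):
    """Return genome positions where read matches with <= max_errors mismatches."""
    per_part = max_errors // parts  # reduced error budget for each piece
    candidates = set()
    for offset, piece in split_read(read, parts):
        # Filtering step: a naive scan stands in for the index query here.
        for pos in range(len(genome) - len(piece) + 1):
            if hamming(genome[pos:pos + len(piece)], piece) <= per_part:
                start = pos - offset
                if 0 <= start <= len(genome) - len(read):
                    candidates.add(start)
    # Verification step: check each candidate against the whole read.
    return sorted(
        s for s in candidates
        if hamming(genome[s:s + len(read)], read) <= max_errors
    )

genome = "TTGACCGTAAGCTAGGCTTACGATCGA"
read = "CTTAAGCTAAGC"  # genome[5:17] with two substitutions
positions = map_read(genome, read, max_errors=2, parts=3)
```

With `parts=3` and `max_errors=2`, each piece is searched exactly, which is cheap; the two substitutions fall in the first and third pieces, so the middle piece still locates the true position and the final check confirms it. Choosing the number of parts trades the cost of each piece query against the number of candidates that must be verified.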

We implemented this method in the Bmap program. Our experiments show that Bmap outperforms current methods in efficiency without sacrificing mapping accuracy.

Keywords: Approximate string matching, Ferragina-Manzini index, intermediate partitioning, next generation sequencing, read mapping, succinct suffix array.

DOI: https://dx.doi.org/10.2174/157489361002150518125716
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher: Bentham Science Publisher

Current Bioinformatics
