Abstract
Background: Genomes contain a variety of repeated structures of various lengths and types, either interspersed or tandem. Tandem repeats play an important role in molecular biology because they are related to the genetic background of inherited diseases, and they can also serve as markers for DNA mapping and DNA fingerprinting. Improving the efficiency of algorithms that search for tandem repeats in DNA sequences can lead to many useful applications in genomics.
Objective: We introduce an efficient O(n) algorithm for finding maximum-length exact tandem repeats in genomes.
Method: The algorithm is based on the Enhanced Suffix Array (ESA), which consists of a Suffix Array (SA) and a Longest Common Prefix (LCP) array. The SA holds the starting positions of all suffixes of a string in lexicographically sorted order, and the LCP array stores the length of the longest common prefix of each pair of consecutive suffixes in the SA.
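As an illustration of the two arrays named in the Method, the following minimal Python sketch builds an SA and an LCP array for a short DNA string. It is not the TR-ESA implementation: the function names `suffix_array` and `lcp_array` are hypothetical, and the naive construction used here is chosen for readability and does not run in O(n).

```python
def suffix_array(text):
    # Naive construction (assumption: clarity over speed):
    # sort the suffix start positions by the suffix each one begins.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[k] is the length of the longest common prefix of the suffixes
    # starting at sa[k - 1] and sa[k]; lcp[0] is defined as 0.
    def common_prefix_len(i, j):
        n = 0
        while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
            n += 1
        return n

    return [0] + [common_prefix_len(sa[k - 1], sa[k]) for k in range(1, len(sa))]


text = "ACGACGT"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(sa)   # [0, 3, 1, 4, 2, 5, 6]
print(lcp)  # [0, 3, 0, 2, 0, 1, 0]
```

In this example the suffixes starting at positions 0 and 3 are adjacent in the SA and share a prefix of length 3, which equals the distance between them, so ACG occurs twice in a row starting at position 0. This only shows why SA and LCP values are informative for exact tandem repeats; the actual search strategy of the algorithm may differ.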
Results: We compare the results of our computation with those of an existing application for finding exact tandem repeats, the Burrows Wheeler Tandem Repeat Searcher (BWtrs). We provide an open-source standalone application called TR-ESA (available at: www.algorithms-akgec-shivika.in/tandem), which implements the search for exact maximum-length tandem repeats.
Conclusion: The tool is efficient and powerful, and it allows complete genomes to be analysed for exact tandem repeats.
Keywords: DNA, tandem repeats, exact tandem repeats, microsatellites, enhanced suffix array.
Graphical Abstract