Abstract
This paper presents an exhaustive algorithm for extracting structured motifs. A structured motif is defined as an ordered set of highly conserved, over-represented patterns that occur near each other in a set of DNA sequences. The algorithm is based on a novel data structure called the l-mer trie. Unlike existing motif finders, it allows the user to specify ranges for the lengths of the individual patterns, for their substitution rates, and for the spacing between them. The ability to set a lower bound on the substitution rate saves considerable time and space when searching for weak motifs that occur with many mutations.
The efficiency of the algorithm has been verified on artificial sequences as well as on real DNA sequences of several plant viruses. The results are compared with those obtained by RISO, another tree-based algorithm, which is claimed to achieve notable time and space gains over the best known exact algorithms.
Keywords: Exhaustive algorithm, e-mutated set, Geminiviridae, l-mer trie, structured motif, Tymovirus.