Abstract
Background: There is an increasing need to routinely and quickly compare multiple sequences of, for example, bird flu virus genomes to infer their evolutionary relationship. This entails a fast simultaneous inference of both sequence alignment and phylogeny. Current methods cannot meet the speed requirement though a high phylogeny accuracy is maintained in such scenarios.
Objective: We propose a Fast Algorithm for constructing Multiple sequence Alignment and Phylogeny (FAMAP) from closely related DNA sequences.
Method: FAMAP is essentially a sequentially-inputting algorithm and can be implemented in a progressive fashion, i.e., adding a new sequence into an existing tree or multiple sequence alignment. Its time complexity is O[NP(L)] + O(NG) and its space complexity is O(N) + O(G) + O[Q(L)] , where N is the number of sequences, N is the number of mutations on the phylogeny, L is the maximum length of the sequences, and P(L) and Q(L) are the time and space complexity of aligning a pair of sequences of length L, depending on the pairwise alignment algorithm employed.
Results: Intensive simulation studies shows that our method is superior in terms of speed over other popular methods and has comparable accuracy of both multiple sequence alignment and the phylogeny.
Conclusion: Our new algorithm might be one of the best choices when the user wants to quickly obtain a reliable phylogeny estimation from dozens of closely related long sequences
Keywords: Bioinformatics software, fast algorithm, multiple sequence alignment, phylogenetic tree, simultaneous reconstruction, tree topology
Graphical Abstract