Abstract
As more protein sequences become available, homolog identification becomes increasingly important for functional, structural, and evolutionary studies of proteins. Many homologous proteins diverged very early in their evolutionary history, and their sequences therefore share low sequence identity. These remote homologs have become a research focus in bioinformatics over the past decade, and significant advances have been achieved. In this paper, we provide a comprehensive review of the computational techniques used in remote homolog identification, organized by method: sequence-sequence comparison, sequence-structure comparison, and structure-structure comparison. Other miscellaneous approaches are also summarized. Pointers to the online resources for these methods and their related databases are provided, followed by a comparison of the methods in terms of their technical approaches, strengths, and limitations. Studies of SARS-CoV proteins are presented as an example application of remote homolog identification.
Keywords: Remote homolog, homolog identification, evolution, sequence analysis, sequence profile, threading