Abstract
A major breakthrough in classifying proteins from different microbial genomes by sequence similarity was the development of the COG concept by Tatusov et al. in 1997. The authors defined clusters of orthologous groups of proteins (COGs) by strictly applying all-against-all BLAST alignments to the protein sequences of completely sequenced microbial genomes. The latest update of the COG database covers 66 microbial genomes and additionally includes the KOG database, an equivalent built from seven eukaryotic genomes. Although the authors initially provided excellent web-based software tools for analyzing this large body of data, many other groups have independently developed more specialized or extended programs that use COG data for diverse purposes. Here, a brief introduction to the concept behind COGs is given, and their potential in comparative and functional genomics is discussed. The review then focuses on the many recently developed web services for mining the COG database and addresses their capacity to solve diverse problems in biochemistry. To illustrate the breadth of possible applications, a compilation of recently published findings is given that draw on information from comparative genomics, with emphasis on data retrieved from the COG database.
Keywords: COG database, cluster of orthologous groups, orthologs, comparative genomics