Prediction of Protein Function in the Absence of Significant Sequence Similarity

Paul D. Dobson; Yu-Dong Cai; Benjamin J. Stapley; Andrew J. Doig

doi:10.2174/0929867043364702

Abstract

Tremendous progress in DNA sequencing has yielded the genomes of a host of important organisms. Utilising these resources requires an understanding of the function of each gene. Standard methods of functional assignment involve sequence alignment to a gene of known function; however, such methods often fail to find any significant matches. Here we discuss a number of recent alternative methods that may be of use when sequence alignment fails. Function can be defined in a number of ways, including Enzyme Commission (EC) number and MIPS and KEGG functional classes. Phylogenetic profiles record the pattern of presence or absence of a protein across genomes. Protein-protein interactions can be identified by searching for pairs of proteins that are fused into a single protein chain in another organism. The gene neighbour method uses the observation that if the genes encoding two proteins are close together on a chromosome, the proteins tend to be functionally related. More general methods use sequence properties such as amino acid composition, mean hydrophobicity, predicted secondary structure and post-translational modification sites. Data mining methods devise rules in the form of IF... THEN statements that predict function from sequence-based attributes, predicted secondary structure and sequence similarity. Finally, structural features can be used, either after modelling the structure of a protein from its sequence or after solving its structure experimentally. Protein fold class can be strongly indicative of function, while other structural features, such as secondary structure content, cleft size and 3D structural motifs, are also useful.

Keywords: protein function, protein structure, alignment, structural genomics, homology, genome, protein fold, sequence
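Two of the alignment-free ideas summarised in the abstract, sequence-derived properties (amino acid composition, mean hydrophobicity) and phylogenetic profiles, can be illustrated with a minimal Python sketch. It assumes the Kyte-Doolittle hydropathy scale for mean hydrophobicity and a simple Hamming distance between binary presence/absence profiles; both choices, and the helper names `sequence_features`, `phylogenetic_profile` and `profile_distance`, are illustrative assumptions rather than the specific methods reviewed in this paper.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale. This is an assumed choice of scale;
# the abstract does not specify which one is used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(seq: str) -> dict[str, float]:
    """Hypothetical helper: composition fractions, mean hydrophobicity, length.

    Non-standard residue letters (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    n = len(residues)
    # Fraction of each of the 20 standard amino acids.
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    # Mean hydropathy over all standard residues.
    features["mean_hydrophobicity"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    features["length"] = float(n)
    return features


def phylogenetic_profile(genomes_with_homologue: set[str], genomes: list[str]) -> list[int]:
    """Hypothetical helper: 1 if the protein has a homologue in a genome, else 0."""
    return [1 if g in genomes_with_homologue else 0 for g in genomes]


def profile_distance(p: list[int], q: list[int]) -> int:
    """Hamming distance: number of genomes where presence/absence differs.

    Proteins with identical or near-identical profiles are candidates
    for being functionally related.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


if __name__ == "__main__":
    genomes = ["genome1", "genome2", "genome3", "genome4"]  # placeholder names
    p = phylogenetic_profile({"genome1", "genome3"}, genomes)
    q = phylogenetic_profile({"genome1", "genome2", "genome3"}, genomes)
    print(p, q, profile_distance(p, q))
    print(sequence_features("ACDA")["mean_hydrophobicity"])
```

In practice the feature dictionary would be one input to a classifier or rule learner, and the presence/absence calls would come from homology searches against each genome; both steps are outside the scope of this sketch.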