Abstract
Deciphering the human genome and other model genomes presents the challenge of identifying the gene products these genomes express and the functional roles of the expressed proteins. Rapid determination of the three-dimensional (3D) structures of these gene products is vital to this effort and is the focus of various Structural Genomics initiatives around the world. However, determining the structures of thousands of new proteins by experimental methods such as X-ray crystallography and NMR remains a formidable task. Here we review a novel approach, an Augmented Homology Modeling technology termed "ProMax", that derives high-quality 3D protein structures from protein sequence data. Model quality is rigorously assessed by the Root Mean Square Deviation (RMSD) of Cα atoms between modeled and subsequently determined X-ray structures, by normalized residue energies, and by Ramachandran analysis. The modeling procedure starts by identifying the appropriate template 3D structures and proceeds from initial structure generation through multiple iterative, energy-based structure refinements, each assessed against an elaborate panel of 3D structure quality tests. The method is general, applicable to a broad cross-section of protein families, and reasonably accurate even for protein families with low sequence homology to available template structures. The Augmented Homology Modeling approach was tested on various target sequences proposed for the Critical Assessment of techniques for protein Structure Prediction (CASP) competitions CASP3, CASP4 and CASP5. In addition, more than 40 models derived with the "ProMax" approach were compared with X-ray and NMR structures released in the Protein Data Bank after the models were built. The comparison showed good agreement between our models and the experimental structures, with a core Cα RMSD below 2.0 Å and a stereochemical quality approaching that of experimental structures.
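The Cα RMSD used to compare models with experimental structures is conventionally computed after an optimal rigid-body superposition of the two coordinate sets. The following is a minimal Python/NumPy sketch of that calculation using the Kabsch algorithm. It assumes the model and experimental Cα coordinates have already been paired residue by residue and restricted to the core region; the abstract does not describe how ProMax selects or aligns core residues. The function name `kabsch_rmsd` is hypothetical and not part of ProMax.

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two paired (N, 3) C-alpha coordinate arrays after
    optimal rigid-body superposition (Kabsch algorithm).

    Hypothetical helper for illustration; assumes row i of `model` and
    row i of `reference` are the same residue.
    """
    if model.shape != reference.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired coordinates")

    # Remove translation by centring both sets on their centroids.
    p = model - model.mean(axis=0)
    q = reference - reference.mean(axis=0)

    # SVD of the 3x3 covariance matrix H = P^T Q.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the centred model onto the centred reference.
    p_rot = p @ rotation.T

    # Root of the mean squared per-atom distance.
    return float(np.sqrt(np.mean(np.sum((p_rot - q) ** 2, axis=1))))
```

A model whose core Cα atoms give `kabsch_rmsd(model_core, xray_core) < 2.0` would fall within the agreement range reported above. Superposing before measuring matters because a plain coordinate RMSD would also penalize arbitrary differences in placement and orientation between the two files.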
While this method is not a replacement for experimental methods such as X-ray crystallography, it is highly useful for deriving 3D structures of protein homologues within or across genomes as a first approximation, with accuracy and quality sufficient for these models to be used in rational experimental projects involving protein engineering, mutagenesis design, virtual screening and docking simulations.
Keywords: Homology modeling, protein structure prediction, refinement, protein modeling, CASP (Critical Assessment of techniques for protein Structure Prediction), structure-based drug design