Abstract
As more and more genomes are sequenced, the focus of biological science is shifting from the study of nucleic acids to that of proteins. One of the most representative developments of this era is the launch of structural genomics projects, which aim at high-throughput determination of three-dimensional protein structures on a genome scale. The common objective of the pilot projects is to build a platform that clones hundreds to thousands of targets, purifies, in the first stage, the highly expressed and soluble proteins, and then solves dozens to hundreds of structures by X-ray crystallography or NMR. The first bottleneck in this pipeline is obtaining workable quantities (milligram amounts, as required by current crystallographic and NMR technology) of soluble, properly folded protein. A series of methodologies has been established for this purpose. The His-tag makes it possible to purify the desired protein from the crude extract of host cells, or from an in vitro expression mixture, in a single affinity purification step. Beyond the His-tag and other short affinity tags, a series of fusion tags has been developed to enhance solubility. Many expression hosts based on the bacterium Escherichia coli or the yeast Pichia pastoris have been constructed to express heterologous proteins. The influences of induction temperature and of co-expression with chaperones have been systematically investigated. The effects of N-terminal tags, both small and large, have been examined and compared with those of C-terminal tags. Some in vitro evolution techniques have been adapted to increase the expression level and solubility of targets. These efforts help, at least in part, to speed up the production of the folded, concentrated protein samples that crystallographers and NMR spectroscopists need.
Furthermore, three structures newly determined by pilot structural genomics projects are presented to demonstrate how biological function can be deciphered from the 3-D structure of a novel protein, how the structure-function relationship can be interpreted, and how it can be applied to drug design. In this paper, we review the popular techniques applied in current structural genomics projects, their effects, and possible future improvements, especially those for protein preparation and function interpretation.
Keywords: structural genomics, overexpression, high throughput, purification, affinity tag, solubility enhancement tag, directed evolution