Abstract
Due to advances in structural biology, an increasing number of protein structures of unknown function have been deposited in the Protein Data Bank (PDB). These proteins are usually characterized by novel structures and sequences, so conventional comparative methods (such as sequence alignment, structure comparison, or template search) are unable to determine their function. It is therefore important to identify a protein's function directly from its structure, but this is not an easy task. One strategy is to analyze whether distinctive structure-derived features are associated with functional residues. If so, one may be able to identify the functional residues directly from a single structure. Recently, we have shown that the protein weighted contact number is related to atomic thermal fluctuations and can be used to derive motional correlations in proteins. In this report, we analyze the weighted contact-number profiles of both catalytic and non-catalytic residues for a dataset of 760 structures. We found that the weighted contact numbers of catalytic residues are distributed differently from those of non-catalytic residues. Using this feature, we are able to effectively differentiate catalytic residues from other residues with a single optimized threshold value. Our method is simple to implement and compares favourably with other, more sophisticated methods. In addition, we discuss the physics behind the relationship between catalytic residues and their contact numbers, as well as other features (such as residue centrality or B-factors) associated with catalytic residues.
Keywords: protein active sites, catalytic residue, weighted contact number, crystallization techniques, instrumentation techniques, PDB, novel structures, homologous structures, B-factors
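
The thresholding idea summarized in the abstract can be illustrated with a minimal sketch. Several details here are assumptions that the abstract does not state: the weighted contact number is taken as the inverse-square-distance sum over all other residues (the definition commonly used in earlier work on this quantity), one coordinate (for example the C-alpha atom) represents each residue, the profile is normalized to z-scores, and residues at or above the threshold are flagged. The function names and the threshold values are hypothetical; the optimized threshold from the study is not reproduced.

```python
import math


def weighted_contact_numbers(coords):
    """Assumed definition: w_i = sum over j != i of 1 / r_ij**2."""
    n = len(coords)
    wcn = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Squared distance between residues i and j
            # (coordinates are assumed to be distinct).
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            wcn[i] += 1.0 / d2
            wcn[j] += 1.0 / d2
    return wcn


def z_scores(values):
    """Normalize a profile to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def predict_catalytic(coords, threshold):
    """Return indices of residues whose normalized WCN is >= threshold.

    The direction of the comparison and the threshold are illustrative
    assumptions, not values taken from the study.
    """
    z = z_scores(weighted_contact_numbers(coords))
    return [i for i, score in enumerate(z) if score >= threshold]


# Toy example: five "residues" on a line, 1 unit apart.
toy_coords = [(float(x), 0.0, 0.0) for x in range(5)]
toy_wcn = weighted_contact_numbers(toy_coords)
flagged = predict_catalytic(toy_coords, threshold=0.9)
```

In this toy chain the central point has the largest weighted contact number, so it is the only one flagged at the placeholder threshold of 0.9. For a real structure, the coordinates would come from a PDB file and the threshold would have to be tuned on annotated catalytic residues.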