Abstract
Background: Residue-residue interactions play important roles in the functional and spatial relationships of proteins. These interactions are usually reflected in the sequence, while the residues involved lie in close proximity within the three-dimensional structure. In the past few years, identifying residue-residue contacts in proteins has become an important prediction problem.
Objective: Many methods extract contact information from multiple sequence alignments (MSAs) of homologous protein sequences. However, these methods require a large number of homologous sequences, on average several thousand, for residue-residue contact prediction.
Method: In this article, we use both phylogenetic information and amino acid frequency to predict residue-residue contacts from small MSAs. To better reflect evolutionary information, we combine the evolutionary distance matrix with the similarity matrix and, based on amino acid frequency, produce a novel score that filters out some of the noise. We use this information to estimate the correlation coefficient between each pair of sites in a target protein family, and extract the binding sites with high values of the final correlation score.
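As a rough illustration of the site-pair correlation idea, the sketch below scores two alignment columns by correlating their per-sequence-pair profiles. It is a toy under stated assumptions and not the scoring defined in this article: the p-distance between sequences stands in for a phylogenetic distance, residue identity stands in for the similarity matrix, the two are combined by a simple product, and the amino acid frequency filter is omitted. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ
    (assumed stand-in for an evolutionary distance from a tree)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def site_profile(msa, col):
    """One value per sequence pair: the pair's distance if both sequences
    carry the same residue at `col`, otherwise 0 (assumed combination rule)."""
    return [
        p_distance(s, t) if s[col] == t[col] else 0.0
        for s, t in combinations(msa, 2)
    ]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def site_correlation(msa, i, j):
    """Correlation between the pair profiles of columns i and j."""
    return pearson(site_profile(msa, i), site_profile(msa, j))

# Toy alignment: columns 0 and 1 change together, column 2 varies independently.
toy_msa = ["AKG", "AKC", "DEG", "DEC"]
print(site_correlation(toy_msa, 0, 1))
print(site_correlation(toy_msa, 0, 2))
```

In this toy alignment the co-varying columns 0 and 1 receive a higher correlation than the pair of columns 0 and 2, which is the kind of signal a ranking of site pairs would rely on.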
Results: First, we present a statistical analysis of the correlation underlying residue-residue contacts. Second, we evaluate our method on 150 benchmark proteins for residue-residue contact prediction. Third, we identify protein-protein interactions in bacterial signal transduction. The experiments indicate that our method is effective in these applications.
Conclusion: When only a few protein sequences are available, experimental results confirm that our method performs better than some currently popular methods. Because our method requires fewer homologous proteins, the computing time needed to construct phylogenetic trees decreases significantly. On 150 benchmark proteins, our method achieves overall precisions of 68%, 64%, 54% and 45% for the top L/10, L/5, L/2 and L ranked predictions, respectively. It outperforms both normalized Mutual Information scoring with sequence weighting and the Bayesian approach of Burger & van Nimwegen (B&vN).
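The top L/10, L/5, L/2 and L precisions quoted above can be computed along the following lines. This is a minimal sketch: it assumes L denotes the length of the target sequence (the usual convention, not stated explicitly here), and the function name, the dictionary of scores and the set of true contacts are hypothetical inputs chosen for illustration.

```python
def top_k_precision(scores, true_contacts, seq_len, divisor):
    """Precision among the top seq_len // divisor ranked site pairs.

    scores:        dict mapping a site pair (i, j) to its predicted score
    true_contacts: set of site pairs (i, j) that are in contact
    seq_len:       sequence length L (assumed meaning of L)
    divisor:       10, 5, 2 or 1 for top L/10, L/5, L/2 or L
    """
    k = max(1, seq_len // divisor)
    # Rank pairs from highest to lowest score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    # Divide by the number of predictions actually made, which can be
    # smaller than k when fewer than k pairs were scored.
    return hits / len(ranked)

# Made-up scores for a hypothetical protein of length 10.
example_scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.4, (3, 8): 0.1}
example_contacts = {(0, 5), (2, 9)}
for d in (10, 5, 2, 1):
    print(d, top_k_precision(example_scores, example_contacts, 10, d))
```

Averaging this value over all benchmark proteins for each divisor would give overall figures of the kind reported above.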
Keywords: residue-residue contact, phylogenetic tree, amino acid frequency, multiple sequence alignment, correlation coefficient, co-evolution.