Abstract
In the multiple sequence alignment (MSA) of the members of a protein family, some positions are highly conserved while others vary. The conserved positions are clearly important, but the non-conserved positions are also important, because the destabilizing effect of a given amino acid at one position can be compensated by the stabilizing effect of a certain amino acid at another position. In other words, two (or more) positions in a protein family can coevolve. Information about coevolving positions is valuable for understanding protein mechanisms and dynamic properties, and for designing mutagenesis studies. Several methods are available for identifying coevolving positions from the analysis of MSAs. If an MSA contains a large number of sequences, the information about residue proximities derived from coevolution maps can be sufficient to predict a protein fold. Conversely, if the structure of at least one representative member of a protein family is known, coevolution maps obtained by different methods can be validated against the distance map derived from that structure. In the absence of a reference structure, the results obtained with the experimental MSA can be validated by evaluating the performance of different methods on synthetic MSAs that mimic the features of the experimental one and in which the covarying positions are known. Using a single protein family as an example, we review here the steps involved in deriving and validating coevolution maps from MSAs.
Keywords: Bioinformatics, coevolution, covariation, evolution, methods, multiple sequence alignments, sequence analysis, simulation.
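A minimal sketch of two of the steps summarized above, under stated assumptions. First, every pair of MSA columns is scored with plain mutual information (MI). This is one of the simplest covariation measures and is not necessarily one of the methods reviewed here; it also omits the sequence weighting and background corrections (such as APC) that practical methods apply. Second, the top-ranked pairs are validated against a structure-derived distance map using an 8 Å contact cutoff, which is a common convention and an assumption here. The toy alignment plays the role of a synthetic MSA in which columns 0 and 2 covary by construction. The function names, the toy data, and the distances are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues at position i of every aligned sequence (gaps count as a symbol)."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Plain, uncorrected mutual information (in nats) between MSA columns i and j.

    Assumption: no sequence weighting, pseudocounts, or APC correction.
    """
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def rank_pairs(msa, min_separation=1):
    """Score all column pairs with |i - j| >= min_separation, highest score first."""
    length = len(msa[0])
    scored = [
        ((i, j), mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    # sorted() is stable, so tied pairs keep their (i, j) enumeration order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def top_k_precision(ranked, distance_map, k, cutoff=8.0):
    """Fraction of the k top-ranked pairs that are in contact (distance < cutoff, in Å)."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for (i, j), _ in top if distance_map[i][j] < cutoff)
    return hits / len(top)


# Hypothetical synthetic MSA: columns 0 and 2 covary perfectly (A<->L, G<->V),
# while columns 1 and 3 vary independently of everything else.
TOY_MSA = ["AKLD", "AKLE", "ARLD", "ARLE", "GKVD", "GKVE", "GRVD", "GRVE"]

# Hypothetical symmetric distance map (Å): only positions 0 and 2 are in contact.
TOY_DISTANCES = [
    [0.0, 12.0, 6.0, 12.0],
    [12.0, 0.0, 12.0, 12.0],
    [6.0, 12.0, 0.0, 12.0],
    [12.0, 12.0, 12.0, 0.0],
]

if __name__ == "__main__":
    ranked = rank_pairs(TOY_MSA)
    for pair, score in ranked:
        print(pair, round(score, 3))
    print("top-1 precision:", top_k_precision(ranked, TOY_DISTANCES, k=1))
```

On the toy alignment, the pair (0, 2) ranks first with a score of ln 2 and every other pair scores zero, so the top-1 precision is 1.0 and the top-2 precision is 0.5. With real alignments, a larger `min_separation` is typically used so that trivially close neighbors along the chain do not dominate the validation.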