Abstract
Macromolecular events like protein aggregation are complex processes that depend on the physico-chemical properties of the constituent residues. In this study, we used 5-dimensional physico-chemical property descriptors (PCP-descriptors) of amino acids, derived from 237 physico-chemical properties, to develop linear (LM) and neural network (NM) regression models. We evaluate how well these models predict the log values of aggregation rates (ψ) for 15 human muscle acyl-phosphatase (AcP) mutants. The correlation coefficients between the predicted and observed ψ-values of the point mutations were 0.81 (p-value < 0.001) for LM and 0.71 (p-value < 0.002) for NM. Using LM, we calculated ψ-values for all possible mutations and performed an average linkage cluster analysis. We identified three groups of amino acids that differ in tolerance to mutations, resulting in increased or decreased aggregation rates. We suggest that our linear regression model can be applied to predict the aggregation propensity of point mutants where only sequence information is known. We also show that sequences containing beta-sheet classes of the Structural Classification of Proteins (SCOP) have a higher propensity for aggregation.
Keywords: Protein aggregation, Linear regression, Neural network, PCP-descriptors, Cluster analysis, SCOP
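As an illustration of the linear-model workflow summarized in the abstract, the following is a minimal sketch, not the implementation used in this study. It assumes that each point mutation is encoded as the difference between the 5-dimensional descriptor vectors of the mutant and wild-type residues, which the abstract does not specify. The descriptor table and the ψ values are random placeholders, and the names `mutation_features`, `fit_linear_model`, and `predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder 5-dimensional descriptors per residue.
# These are random numbers, NOT the published PCP-descriptor values.
descriptors = {aa: rng.normal(size=5) for aa in AMINO_ACIDS}


def mutation_features(wild_type, mutant):
    """Assumed encoding: descriptor change, mutant minus wild type."""
    return descriptors[mutant] - descriptors[wild_type]


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept term."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to feature rows."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


# Hypothetical data set: 15 point mutations with synthetic psi values.
pairs = [(AMINO_ACIDS[i], AMINO_ACIDS[(i + 3) % 20]) for i in range(15)]
X = np.array([mutation_features(wt, mt) for wt, mt in pairs])
true_coef = rng.normal(size=6)  # synthetic "ground truth" for the demo
psi = predict(true_coef, X) + rng.normal(scale=0.1, size=15)

coef = fit_linear_model(X, psi)

# Pearson correlation between predicted and (synthetic) observed psi.
r = np.corrcoef(predict(coef, X), psi)[0, 1]
print(f"in-sample correlation: {r:.2f}")
```

Because the correlation here is computed on the same synthetic data used for fitting, it says nothing about the performance reported above. A real evaluation would use the measured ψ values and held-out mutants.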