Abstract
In this study, we used two categories of molecular descriptors, CODESSA and DPPS (divided physicochemical property scores of amino acids), to parameterize the structural characteristics of 2015 human amphiphysin SH3 domain-binding decapeptides at the atom and residue levels, respectively. Based on these descriptors, several robust quantitative structure-affinity relationship (QSAR) models were constructed using partial least squares regression (PLS) and least squares-support vector machine (LSSVM), coupled with genetic algorithm (GA)-based variable selection. The results show that (1) GA is a powerful variable selection tool that efficiently identifies the most informative variable combinations for PLS and LSSVM modeling; (2) regression models built with the nonlinear LSSVM approach are more robust and predictive than those built with the linear PLS method; and (3) the residue-level descriptor (DPPS) captures peptide structural characteristics better than the atom-level descriptor (CODESSA). Analysis of the optimal DPPS-based GA-LSSVM model indicates that the core motif of SH3 domain-binding peptides contributes significantly to binding affinity, whereas the two terminal residues, especially the N-terminal residue, have little effect on the binding process.
Keywords: Human amphiphysin SH3 domain, peptide, quantitative structure-affinity relationship, genetic algorithm, partial least squares regression, least squares-support vector machine, peptide descriptor
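For readers less familiar with LSSVM regression, the following is a minimal sketch of the standard formulation, not the authors' implementation. It assumes an RBF kernel of the form exp(-||x - z||^2 / sigma2). The function names and the hyperparameter values `gamma` (regularization) and `sigma2` (kernel width) are hypothetical placeholders; the tuned values used in the study are not given here. Training reduces to a single linear system in the bias b and the dual coefficients alpha, with the constraint that the alphas sum to zero.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    # Squared Euclidean distance between every row of A and every row of B
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

def lssvm_fit(X, y, gamma=10.0, sigma2=1.0):
    """Solve the LSSVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].

    gamma and sigma2 are illustrative defaults (assumption), not tuned values.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                       # enforces sum(alpha) == 0
    A[1:, 0] = 1.0                       # bias term enters every equation
    A[1:, 1:] = K + np.eye(n) / gamma    # kernel matrix plus ridge-like regularizer
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # bias b, dual coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma2=1.0):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

In a QSAR setting such as this one, `X` would presumably hold the GA-selected descriptor columns (DPPS or CODESSA) for each decapeptide and `y` the measured binding affinities, with `gamma` and `sigma2` chosen by cross-validation. Because fitting is one linear solve, the model is cheap enough to refit for every candidate descriptor subset a GA proposes.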