Abstract
MHC-epitope binding plays a key role in the cellular immune response. Accurate prediction of MHC-epitope binding affinity can greatly expedite epitope screening by reducing costs and experimental effort. In this paper, 13 T descriptors, which were derived from 544 physicochemical properties of the natural amino acids, were used to characterize the epitope peptide sequences of four MHC class I alleles, and optimal QSAR models were constructed using stepwise regression combined with multiple linear regression (STR-MLR). For the HLA-A*0201, HLA-A*0203, HLA-A*0206 and HLA-A*1101 alleles, the leave-one-out cross-validated values (Q²train) were 0.581, 0.553, 0.525 and 0.588; the squared correlation coefficients of the training sets (R²train) were 0.607, 0.582, 0.556 and 0.606; and the squared correlation coefficients of the test sets (R²test) were 0.533, 0.506, 0.501 and 0.502, respectively. The results show that all four models achieve good predictive performance and help explain the mechanism of interaction between MHC molecules and epitopes. The T descriptors should also be useful for the structural characterization and activity prediction of peptide sequences.
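The modelling workflow summarized above (descriptor matrix, stepwise variable selection, MLR fit, leave-one-out validation) can be sketched in Python with NumPy. This is a minimal, simplified sketch under stated assumptions, not the authors' implementation: classical stepwise regression adds and removes variables using F-test thresholds, whereas this greedy forward-only variant adds the descriptor column that most improves the leave-one-out Q², defined here as Q² = 1 − PRESS/SS_tot. The function names, the stopping rule (`max_vars`, `min_gain`) and the synthetic data are hypothetical. With real data, each column of `X` would hold one T descriptor at one peptide position (for example, 9 × 13 = 117 columns if the peptides are nonamers) and `y` would hold the binding affinities.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def r2(y, y_hat):
    """Coefficient of determination: 1 - RSS / total sum of squares."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum((y - y_hat) ** 2) / ss_tot


def q2_loo(X, y):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # drop sample i, refit, predict it
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def forward_stepwise(X, y, max_vars=10, min_gain=1e-3):
    """Hypothetical greedy forward selection (assumption: not the paper's
    F-test stepwise procedure). Adds the column that most raises LOO Q^2
    and stops when the gain falls below min_gain."""
    selected, best_q2 = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        q2, j = max((q2_loo(X[:, selected + [k]], y), k) for k in remaining)
        if q2 - best_q2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_q2 = q2
    return selected, best_q2


# Usage on synthetic data: 60 hypothetical peptides x 20 hypothetical descriptor columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 1.5 * X[:, 3] - 0.8 * X[:, 7] + rng.normal(scale=0.3, size=60)
cols, q2 = forward_stepwise(X, y)
coef = fit_mlr(X[:, cols], y)
print("selected columns:", cols)
print("Q2 (LOO):", round(q2, 3), " R2 (train):", round(r2(y, predict_mlr(coef, X[:, cols])), 3))
```

Because the leave-one-out residual of an OLS fit is never smaller in magnitude than the ordinary residual, Q² computed this way cannot exceed the training R², which matches the ordering of Q²train and R²train reported for all four alleles.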
Keywords: Major Histocompatibility Complex (MHC), MHC class I allele, quantitative structure-activity relationship (QSAR), amino acid descriptors, stepwise regression-multiple linear regression (STR-MLR), CTL, MS-WHIM, HESH, MLR, STR, jackknife test, hydrophobic, stereo and physicochemical properties, benchmark dataset, hypophamine