Abstract
Background: Predicting protein-peptide binding affinity is a leading research topic in peptide drug design and repositioning. Models built in previous studies relied only on features of peptide structures. These features carried limited information and could not describe the protein-peptide interaction mode. As a result, the models and their predictions lacked pharmacological and biological interpretability and did not reflect how the protein and peptide interact, which limited their value for peptide drug design.
Objective: Taking the protein-peptide interaction mode into account, we extracted characteristics of the protein-peptide interaction interface and used them to build machine learning models, with the aim of improving both predictive performance and model interpretability.
Methods: Using the MHC-I protein and its binding peptides as the research object, we obtained protein-peptide complexes by molecular docking and calculated 94 protein-peptide interaction interface characteristics. Recursive feature elimination was then used to select the ten most important features, which served as inputs to support vector regression (SVR), random forest (RF), and multilayer perceptron (MLP) models for predicting protein-peptide binding affinity.
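Recursive feature elimination repeatedly fits a model, ranks the current features by importance, and discards the weakest until the requested number remain. The abstract does not say which estimator drove the elimination, so the sketch below assumes a ridge-regularized linear model on standardized features and removes the single feature with the smallest absolute coefficient per round. The names `standardize`, `ridge_coefs`, and `rfe` are hypothetical and are not the authors' code.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def standardize(X: Sequence[Sequence[float]]) -> Matrix:
    """Z-score each column so coefficient magnitudes are comparable across features."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 or 1.0)  # guard against constant columns
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def ridge_coefs(X: Matrix, y: Sequence[float], lam: float = 1e-3) -> List[float]:
    """Solve (X^T X + lam*I) w = X^T (y - mean(y)) by Gaussian elimination.

    X is assumed to be column-centred (e.g. standardized), so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Augmented normal-equation matrix [X^T X + lam*I | X^T yc]
    M = [
        [sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0) for b in range(p)]
        + [sum(X[i][a] * yc[i] for i in range(n))]
        for a in range(p)
    ]
    for col in range(p):  # forward elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        w[r] = (M[r][p] - sum(M[r][c] * w[c] for c in range(r + 1, p))) / M[r][r]
    return w


def rfe(X: Sequence[Sequence[float]], y: Sequence[float], names: Sequence[str], k: int) -> List[str]:
    """Drop the feature with the smallest |coefficient| each round until k remain."""
    Xs = standardize(X)
    keep = list(range(len(names)))
    while len(keep) > k:
        sub = [[row[j] for j in keep] for row in Xs]
        w = ridge_coefs(sub, y)
        weakest = min(range(len(keep)), key=lambda i: abs(w[i]))
        del keep[weakest]  # eliminate one feature per round
    return [names[j] for j in keep]
```

In the study's setting this would be called as `rfe(X, y, names, k=10)` on the 94-column descriptor matrix. scikit-learn provides the same procedure as `sklearn.feature_selection.RFE(estimator, n_features_to_select=10)`, which accepts any estimator exposing `coef_` or `feature_importances_` and can drop several features per round via `step`.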
Results: Using protein-peptide interaction interface characteristics, the SVR, RF, and MLP models achieved mean absolute errors (MAE) of 0.2279, 0.2939, and 0.2041, mean squared errors (MSE) of 0.1289, 0.1308, and 0.0780, and coefficients of determination (R²) of 0.8711, 0.8692, and 0.9220, respectively. The MLP model performed best on all three metrics.
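The three reported metrics follow their standard definitions, sketched below in plain Python (function names are illustrative). One arithmetic property is worth noting: with R² = 1 - SS_res/SS_tot and SS_tot taken over the evaluated targets, R² = 1 - MSE/Var(y) where Var(y) is the population variance, so MSE + R² = 1 exactly when the targets have unit variance. Every reported pair sums to 1.0000 (0.1289 + 0.8711, 0.1308 + 0.8692, 0.0780 + 0.9220), which is consistent with affinities scaled to unit variance; the abstract does not state whether this was done.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, `y_true = [1, 2, 3, 4]` and `y_pred = [1.5, 2, 2.5, 4]` give MAE 0.25, MSE 0.125, and R² 0.9.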
Conclusion: Models constructed using protein-peptide interaction interface characteristics showed better predictive performance. The key features for predicting protein-peptide binding affinity are the buried solvent-accessible surface area (bSASA) of negatively charged species, hydrogen bond acceptors, hydrophobic groups, planar groups, and aromatic rings.
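Buried SASA is commonly computed per atom as the solvent-accessible surface area the atom loses on complex formation, SASA(unbound) - SASA(bound), and then summed over all atoms of a given chemical type. The sketch below shows only that aggregation step. It assumes per-atom SASA values and type labels were already produced by some external tool; the record layout, type labels, and `buried_sasa_by_type` are hypothetical, and the study's exact atom typing is not given in the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (atom_id, feature_types, sasa_unbound, sasa_bound), areas in A^2.
AtomRecord = Tuple[str, Tuple[str, ...], float, float]


def buried_sasa_by_type(atoms: Iterable[AtomRecord]) -> Dict[str, float]:
    """Sum the surface area each atom loses on binding, per feature type.

    An atom may carry several types (e.g. aromatic and hydrophobic) and then
    contributes its buried area to each of them.
    """
    totals: Dict[str, float] = defaultdict(float)
    for _atom_id, types, sasa_free, sasa_complex in atoms:
        buried = max(sasa_free - sasa_complex, 0.0)  # clamp small negative numerical noise
        for ftype in types:
            totals[ftype] += buried
    return dict(totals)


atoms = [
    ("A1", ("hydrophobic", "aromatic"), 30.0, 5.0),
    ("A2", ("acceptor",), 20.0, 12.0),
    ("B1", ("negative", "acceptor"), 25.0, 10.0),
    ("B2", ("hydrophobic",), 15.0, 15.0),
]
print(buried_sasa_by_type(atoms))
```

Atom B2 is fully exposed in both states, so it adds nothing to the hydrophobic total.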
Keywords: MHC-I protein, binding affinity, protein-peptide interaction, molecular docking, recursive feature elimination, machine learning