CFCN: An HLA-peptide Prediction Model based on Taylor Extension Theory and Multi-view Learning

Bing Rao; Bing Han; Leyi Wei; Zeyu Zhang; Xinbo Jiang; Balachandran Manavalan

doi:10.2174/0115748936299044240202100019

Abstract

Background: Advances in biotechnology have led to many proposed approaches to cancer treatment. In recent years, neo-peptide-based methods have made significant contributions; an essential prerequisite for them is the binding between peptides and HLA molecules. However, this binding is hard to predict, and prediction accuracy still needs to improve.

Methods: We propose the Crossed Feature Correction Network (CFCN), a deep learning method that automatically extracts and adaptively learns the discriminative features in HLA-peptide binding, in order to make more accurate predictions on HLA-peptide binding tasks. Its design combines a dedicated encoding and feature extraction process for peptides with a feature fusion process between the fine-grained and coarse-grained levels, which offers advantages on these tasks.

Results: Experiments show that CFCN achieves better overall performance than other advanced models in many aspects.

Conclusion: We also consider using multi-view learning methods for the feature fusion process, in order to uncover further relations among binding features. Finally, we package our model as a tool for further research on binding tasks.




DOI: https://dx.doi.org/10.2174/0115748936299044240202100019
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
