Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

A Sequence-segment Neighbor Encoding Schema for Protein Hotspot Residue Prediction

Author(s): Peng Chen, Tong Shen, Youzhi Zhang* and Bing Wang*

Volume 15, Issue 5, 2020

Page: [445 - 454] Pages: 10

DOI: 10.2174/1574893615666200106115421

Abstract

Background: Hotspots are the residues that contribute most of the binding free energy in protein-protein interactions, and protein function frequently depends on them. At present, hotspot residues are usually identified by alanine scanning mutagenesis, which is costly, time-consuming and laborious.

Objective: More accurate and efficient methods are therefore needed to identify protein hotspot residues.

Methods: This paper proposed a novel sequence-segment neighbor encoding schema and constructed a random forest-based model to identify hotspots in protein interaction interfaces. First, 10 amino acid physicochemical properties, 16 features related to the protrusion index (PI) and depth index (DI), and 25 features related to accessible surface area (ASA) were extracted. Unlike previous residue encoding schemas, such as the autocorrelation descriptor or triplet combination information, the proposed schema captures the influence on a candidate hotspot residue of both its neighboring amino acids and the amino acids at a certain sequence distance from it.
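The general idea of encoding a residue together with its sequence neighbors and residues at a fixed sequence distance can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `encode_segment_neighbors`, the `window` and `distance` parameters, the zero-padding at chain ends, and the toy feature values are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def encode_segment_neighbors(
    features: Sequence[Sequence[float]],
    index: int,
    window: int = 1,
    distance: int = 3,
) -> List[float]:
    """Concatenate per-residue features for a target residue, its immediate
    sequence neighbors, and the two residues `distance` positions away.

    The parameters and the zero-padding rule are illustrative assumptions,
    not values taken from the paper.
    """
    n_feat = len(features[0])

    # Offsets covering the target residue and neighbors within +/- window.
    offsets = list(range(-window, window + 1))
    # Add the residues exactly `distance` positions away, if not yet covered.
    if distance > window:
        offsets = [-distance] + offsets + [distance]

    encoded: List[float] = []
    for off in offsets:
        pos = index + off
        if 0 <= pos < len(features):
            encoded.extend(features[pos])
        else:
            # Zero-pad positions that fall off either end of the chain.
            encoded.extend([0.0] * n_feat)
    return encoded


# Toy chain of 5 residues with 2 made-up features per residue.
toy = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [4.0, 0.4], [5.0, 0.5]]
vec_mid = encode_segment_neighbors(toy, index=2, window=1, distance=2)
vec_edge = encode_segment_neighbors(toy, index=0, window=1, distance=2)
```

Each encoded vector has a fixed length regardless of where the residue sits in the chain, so vectors of this kind could be stacked into a matrix and passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`), with the per-residue inputs replaced by real physicochemical, PI/DI, and ASA features.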

Results: The proposed model was compared with other hotspot prediction methods, including APIS, Robetta, FOLDEF, KFC, and MINERVA.

Conclusion: The experimental results showed that the proposed model improved the prediction of protein hotspot residues relative to the compared methods on the same test set.

Keywords: Protein interaction, hotspots, encoding of sequence-segment neighbors, sliding window, random forest, schema.


© 2024 Bentham Science Publishers