Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Improving Multi-type Gram-negative Bacterial Secreted Protein Prediction via Protein Evolutionary Information and Feature Ranking

Author(s): Liang Kong*, Lichao Zhang and Shiqian He

Volume 15, Issue 6, 2020

Page: [538 - 546] Pages: 9

DOI: 10.2174/1574893614666190730105629

Abstract

Background: Gram-negative bacteria interact with their environment by secreting a wide range of particular substrates (such as proteins) across two lipid bilayers from the cytoplasm to the extracellular space. Determining the types of secreted proteins is beneficial for further research on secreted proteins and secretion systems.

Objective: As an essential alternative to experimental methods, this study proposes an accurate machine learning-based method for predicting multiple types of Gram-negative bacterial secreted proteins.

Methods: The main contribution is the combination of auto-cross-correlation analysis and feature ranking to build an effective support vector machine-based predictor of multi-type Gram-negative bacterial secreted proteins. The specifically designed auto-cross-correlation descriptor captures evolutionary correlation information between amino acid pairs along the protein sequence from position-specific scoring matrices. A feature ranking technique was used to analyze and select the most informative features for building the prediction model.
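
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the lag range, the normalization, or the ranking criterion, so the function names `acc_descriptor` and `f_score_ranking`, the `max_lag` parameter, and the use of a multi-class F-score as the ranking criterion are all assumptions made for illustration. The first function applies a standard auto-cross-covariance transformation to a position-specific scoring matrix (one row per residue, one column per amino acid type); the second ranks features by the ratio of between-class to within-class scatter.

```python
from typing import List, Sequence


def acc_descriptor(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Auto-cross-covariance features of an L x N score matrix for lags 1..max_lag.

    Hypothetical helper: for each lag and each ordered column pair (j1, j2) it
    averages the product of mean-centred scores at positions i and i + lag.
    j1 == j2 gives the auto terms, j1 != j2 the cross terms.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be at least 1 and below the sequence length")
    # Column means over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                total = sum(
                    (pssm[i][j1] - means[j1]) * (pssm[i + lag][j2] - means[j2])
                    for i in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


def f_score_ranking(samples: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Return feature indices sorted from most to least discriminative.

    Assumed criterion (not stated in the abstract): between-class scatter
    divided by within-class scatter, computed per feature.
    """
    n_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, lab in zip(column, labels) if lab == c]
            mean_c = sum(values) / len(values)
            between += len(values) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # Perfectly separated or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy 4-residue, 3-column matrix standing in for a real L x 20 PSSM.
toy_pssm = [[1, 0, 2], [3, 1, 0], [5, 0, 2], [7, 1, 0]]
feats = acc_descriptor(toy_pssm, max_lag=2)
print(len(feats))  # max_lag * n_cols**2 values
```

For a real 20-column PSSM this yields 400 values per lag, so the feature vector grows quickly with `max_lag`; that growth is one reason a ranking step is useful before training a support vector machine on the top-ranked features.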

Results: Several kinds of prediction accuracy, obtained by independent dataset tests, are reported on two benchmark datasets. Compared with state-of-the-art prediction methods, the proposed method improves overall accuracy by 2.91% and 2.25% on the two datasets, respectively.

Conclusion: Our study will provide an important guide for utilizing protein evolutionary information in further research on bacterial secreted proteins.

Keywords: Gram-negative bacteria, secreted proteins, position specific scoring matrix, auto-cross correlation, feature ranking, support vector machine.

