Abstract
Background: Gram-negative bacteria interact with their environment by secreting a wide range of substrates (such as proteins) across two lipid bilayers, from the cytoplasm to the extracellular space. Determining the types of secreted proteins supports further research on secreted proteins and secretion systems.
Objective: As an alternative to experimental methods, this study proposes an accurate machine learning-based method for predicting multiple types of Gram-negative bacterial secreted proteins.
Methods: The main contribution is the combination of auto-cross-correlation analysis and feature ranking to build an effective support vector machine-based predictor of multi-type Gram-negative bacterial secreted proteins. The specifically designed auto-cross-correlation descriptor captures evolutionary correlation information between amino acid pairs along the protein sequence from position-specific scoring matrices. A feature ranking technique was used to analyze the features and select the most informative ones for building the prediction model.
Results: Prediction accuracies obtained by independent dataset tests are reported on two benchmark datasets. Compared with state-of-the-art prediction methods, the proposed method improves overall accuracy by 2.91% and 2.25%, respectively.
Conclusion: This study provides guidance on using protein evolutionary information in further research on bacterial secreted proteins.
Keywords: Gram-negative bacteria, secreted proteins, position-specific scoring matrix, auto-cross-correlation, feature ranking, support vector machine.
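As an illustration of the descriptor named in the Methods, the following is a minimal sketch of one common way to compute auto- and cross-correlation features from a position-specific scoring matrix. The abstract does not give the exact formula, so the column centering, the normalization by the number of residue pairs, and the names `acc_features`, `pssm`, and `max_lag` are assumptions made for illustration and are not taken from the paper.

```python
def acc_features(pssm, max_lag):
    """Auto- and cross-correlation features from a PSSM (illustrative sketch).

    pssm    -- list of rows, one per residue; each row holds one score per
               amino-acid column (typically 20 columns).
    max_lag -- largest sequence separation considered (hypothetical parameter).

    For each lag g and each ordered column pair (a, b), the feature is the
    mean over positions i of centered[i][a] * centered[i + g][b].
    Pairs with a == b are the auto terms; pairs with a != b are the cross terms.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    # Center every column on its mean over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in pssm]

    features = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of residue pairs separated by this lag
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(
                    centered[i][a] * centered[i + lag][b] for i in range(span)
                )
                features.append(total / span)
    return features
```

Under these assumptions, a PSSM with 20 columns yields 400 features per lag regardless of sequence length, so proteins of different lengths map to vectors of the same size, which a feature ranking step and a support vector machine can then consume.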