
Protein & Peptide Letters


ISSN (Print): 0929-8665
ISSN (Online): 1875-5305

Predicting Protein Solubility with a Hybrid Approach by Pseudo Amino Acid Composition

Author(s): Niu Xiaohui, Li Nana, Shi Feng, Hu Xuehai, Xia Jingbo and Xiong Huijuan

Volume 17, Issue 12, 2010

Page: [1466 - 1472] Pages: 7

DOI: 10.2174/0929866511009011466


Abstract

Protein solubility plays a major role in understanding the crystal growth and crystallization of proteins. Predicting whether a protein will be soluble or will form inclusion bodies is a long-standing problem that has not been fully resolved. We selected almost 10,000 protein sequences from the NCBI database and used CD-HIT to eliminate sequences at the 90% homologous similarity level, leaving 5692 sequences. Using Chou's pseudo amino acid composition features, we predict soluble proteins with three methods: a support vector machine (SVM), a back propagation neural network (BP Neural Network), and a hybrid method based on SVM and BP Neural Network. Each method is evaluated by the re-substitution test and the 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performs best, with an accuracy of 92.88% and a Matthews Correlation Coefficient (MCC) of 0.8513. In the 10-fold cross-validation test, however, the other two methods outperform the BP Neural Network, and the hybrid method is the best of the three, with an average accuracy of 86.78% and an average MCC of 0.7233. Although all three methods perform well, the performance comparison indicates that the hybrid method is the best.
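The two evaluation measures reported above, accuracy and MCC, can both be derived from the counts of a binary confusion matrix (soluble versus inclusion body). The following is a minimal sketch of the standard definitions, not the authors' code; the function name `accuracy_and_mcc` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives. The names are illustrative.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Accuracy: fraction of all predictions that are correct.
    accuracy = (tp + tn) / total

    # MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the paper).
    acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
    print(f"accuracy = {acc:.4f}, MCC = {mcc:.4f}")
```

MCC ranges from -1 to 1 and, unlike accuracy, remains informative when the two classes are unevenly represented, which is presumably why it is reported alongside accuracy here.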

Keywords: Amino acid composition, neural network, hybrid approach, prediction, protein solubility, support vector machine, NCBI database, Chou's pseudo amino acid, CD-HIT, back propagation neural network, hybrid method, Matthews Correlation Coefficient, Escherichia coli, Arg residues, cysteine fraction, proline fraction, GalNAc-transferase, serine hydrolases, human papillomaviruses, DNA-binding proteins, isoleucine, leucine, valine, methionine, arginine, lysine, aspartic acid, glutamic acid, asparagine, glutamine, histidine, serine, threonine, proline, alanine, glycine, cysteine, phenylalanine, artificial neural network, jackknife test, cross-validation test

