Abstract
The function of a protein is closely correlated with its subcellular localization. Probing the mechanism of protein sorting and predicting protein subcellular location can provide important clues for understanding protein function. In this paper, we introduce a new pseudo amino acid composition (PseAAC) approach that encodes a protein sequence based on the physicochemical properties of its amino acid residues. Each protein sample is represented as a 146-dimensional (146D) vector comprising the 20 amino acid composition components and 126 adjacent triune residue contents. To evaluate the effectiveness of this encoding scheme, we performed jackknife tests on three datasets using the support vector machine algorithm. The total prediction accuracies are 84.9%, 91.2%, and 92.6%, respectively. These satisfactory results indicate that our method could be a useful tool in bioinformatics and proteomics.
Keywords: Pseudo amino acid composition, tripeptide composition, support vector machine, protein function, PseAAC, jackknife tests, bioinformatics, proteomics, PSORT, ID_SVM method, Combined feature vector, AAC, hydrophobic, Cross-validation test
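
As an illustration of how a 146D vector of this kind might be assembled, the sketch below concatenates the 20 amino acid composition components with 126 triad frequencies. The abstract does not list the residue classes or the normalization, so everything specific here is an assumption: the six-class grouping in `GROUPS` is hypothetical, a triad and its reverse are merged (which happens to give (6^3 + 6^2)/2 = 126 distinct triads), and both parts are normalized to frequencies. The names `GROUPS`, `TRIADS`, and `encode` are illustrative only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical six-class grouping by physicochemical property.
# This is an assumption; the abstract does not state the classes.
GROUPS = {
    "AVLIMC": 0,  # aliphatic / hydrophobic
    "FWYH": 1,    # aromatic
    "STNQ": 2,    # polar, uncharged
    "KR": 3,      # positively charged
    "DE": 4,      # negatively charged
    "GP": 5,      # special backbone conformation
}
RESIDUE_CLASS = {aa: cls for letters, cls in GROUPS.items() for aa in letters}

# Assumption: a class triad and its reverse count as the same feature,
# leaving 126 distinct triads out of 6**3 = 216.
TRIADS = sorted({min(t, t[::-1]) for t in product(range(6), repeat=3)})
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def encode(sequence):
    """Return a 146-element list: 20 AAC values followed by 126 triad frequencies."""
    # Drop any character that is not one of the 20 standard residues.
    seq = [aa for aa in sequence.upper() if aa in RESIDUE_CLASS]
    if len(seq) < 3:
        raise ValueError("need at least three standard residues")

    # Part 1: amino acid composition (fraction of each residue type).
    aac = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Part 2: frequencies of adjacent class triads, reversal-merged.
    classes = [RESIDUE_CLASS[aa] for aa in seq]
    counts = [0] * len(TRIADS)
    for i in range(len(classes) - 2):
        t = tuple(classes[i:i + 3])
        counts[TRIAD_INDEX[min(t, t[::-1])]] += 1
    n_windows = len(classes) - 2
    triad_freq = [c / n_windows for c in counts]

    return aac + triad_freq
```

A vector produced by `encode` could then be passed to any support vector machine implementation and scored with leave-one-out (jackknife) testing; the kernel and parameters would need to be chosen separately, since the abstract does not specify them.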