Abstract
Protein secondary structure carries information about local structural arrangements. The great majority of successful methods for predicting secondary structure are based on multiple sequence alignment. However, multiple alignment fails to give accurate results when a protein sequence has low homology. To address this, we propose a novel method for predicting secondary structure content through a comprehensive sequence representation. The method employs a support vector machine (SVM) regression system and adopts a different form of pseudo amino acid composition (PseAAC), which can partially take sequence-order effects into account, to represent protein samples. Both the self-consistency test and the independent-dataset test show that the trained SVM captures the relationship between the PseAAC and the content of protein secondary structural elements well, including α-helix, 3₁₀-helix, π-helix, β-strand, β-bridge, turn, bend and the remaining random coil. The results are superior to or competitive with those of popular methods, which indicates that the present method may at least serve as an alternative to the existing predictors in this area.
Keywords: pseudo amino acid composition, support vector machine, protein secondary structure content, prediction.
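
To illustrate how a pseudo amino acid composition can retain some sequence-order information, the following is a minimal sketch of the classical single-property PseAAC: 20 amino acid frequencies plus a small number of sequence-order correlation factors, normalized together. It is not necessarily the variant adopted in this work. The function name `pseaac`, the parameters `lam` and `weight`, their default values, and the use of the Kyte-Doolittle hydropathy scale as the only property are all assumptions made for illustration.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; an illustrative choice of property
# scale, not necessarily the one used by the method described above.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


def pseaac(sequence, lam=3, weight=0.05, scale=KD_HYDROPATHY):
    """Return a (20 + lam)-dimensional PseAAC vector for a protein sequence.

    lam    -- number of sequence-order correlation tiers (hypothetical default)
    weight -- weight of the sequence-order terms (hypothetical default)
    """
    seq = sequence.upper()
    if any(aa not in scale for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = _standardize(scale)

    # Conventional amino acid composition: 20 occurrence frequencies.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(mean(diffs))

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Two sequences with identical composition but different residue order, such as `AAAALLLL` and `ALALALAL`, differ in the last `lam` components of this vector, which is the sense in which PseAAC partially reflects sequence order. Such vectors could then serve as inputs to a regression model, for example one SVM regressor per secondary structural element.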