Abstract
In this paper, we aim to predict protein structural classes for low-homology data sets from predicted secondary structures. We propose a new and simple kernel method, named SSEAKSVM, for this task. The secondary structures of all protein sequences are first predicted with the tool PSIPRED. A linear kernel based on secondary structure element alignment scores is then constructed and used to train a support vector machine classifier, with no parameter tuning required. SSEAKSVM was evaluated on two low-homology data sets, 25PDB and 1189, whose sequence homologies are 25% and 40%, respectively. The jackknife test was used to evaluate our method and to compare it with existing methods. The overall accuracies on the two data sets are 86.3% and 84.5%, respectively, which are higher than those obtained by other existing methods. In particular, our method achieves higher accuracies (88.1% and 88.5%) than other methods in differentiating the α + β class from the α/β class. These results suggest that our method is valuable for predicting protein structural classes, particularly for low-homology protein sequences. The source code of the method can be downloaded at http://math.xtu.edu.cn/myphp/math/research/source/SSEAK_source_code.rar.
Keywords: Kernel method, low-homology, protein structural class prediction, PSIPRED, secondary structure, support vector machine.
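As an informal illustration of the idea summarized above, the following Python sketch builds a linear kernel from secondary structure element alignment scores. It is not the released SSEAKSVM code. The scoring rules in `element_score`, the normalization in `alignment_score`, the choice to represent each protein by its vector of scores against a reference set, and the toy H/E/C strings are all assumptions made for this example; real input would be PSIPRED predictions.

```python
from itertools import groupby


def to_elements(ss):
    """Collapse a per-residue string over H/E/C into (state, length) runs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def element_score(a, b):
    """Assumed pair score: full overlap for equal states, half overlap for
    coil against helix or strand, zero for helix against strand."""
    (state_a, len_a), (state_b, len_b) = a, b
    overlap = min(len_a, len_b)
    if state_a == state_b:
        return float(overlap)
    if "C" in (state_a, state_b):
        return 0.5 * overlap
    return 0.0


def alignment_score(ss1, ss2):
    """Global alignment of element lists by dynamic programming (no gap
    penalty), normalized so that identical strings score 1.0 (assumption)."""
    if not ss1 or not ss2:
        return 0.0
    e1, e2 = to_elements(ss1), to_elements(ss2)
    dp = [[0.0] * (len(e2) + 1) for _ in range(len(e1) + 1)]
    for i in range(1, len(e1) + 1):
        for j in range(1, len(e2) + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # leave element i of ss1 unaligned
                dp[i][j - 1],  # leave element j of ss2 unaligned
                dp[i - 1][j - 1] + element_score(e1[i - 1], e2[j - 1]),
            )
    return 2.0 * dp[-1][-1] / (len(ss1) + len(ss2))


def feature_vectors(queries, references):
    """Represent each query by its alignment scores against the references."""
    return [[alignment_score(q, r) for r in references] for q in queries]


def linear_kernel(X, Y):
    """Plain dot-product kernel between two lists of feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in Y] for x in X]


# Toy secondary structure strings (hypothetical, not taken from 25PDB or 1189).
train = [
    "CCHHHHHHCCHHHHHHCC",   # mostly helical
    "CCEEEECCCEEEECC",      # mostly strand
    "CHHHHCCEEEECCHHHHC",   # mixed
]
X = feature_vectors(train, train)
K = linear_kernel(X, X)  # Gram matrix for a kernel classifier
```

A Gram matrix such as `K` could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, and the jackknife test corresponds to leave-one-out evaluation over the proteins.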