Abstract
Protein folding is one of the most important problems in molecular biology, and the kinetic order of folding is one of the main aspects of the folding process. Previous methods for predicting protein folding kinetic order require information on the tertiary structure or the predicted secondary structure of a protein. In this paper, based on physicochemical properties of amino acids, we propose an approach that predicts the protein folding kinetic order from the primary structure of a protein using a support vector machine combined with principal component analysis. The horizontal visibility network, Hilbert-Huang transform, global descriptor, and Lempel-Ziv complexity are used to extract features in our approach. To evaluate our approach, the leave-one-out cross-validation test is employed on two widely used data sets (the “IvankovData” and “ZhengData” data sets) consisting of two-state and multi-state proteins. The overall prediction accuracy reaches 83.87% on the “IvankovData” data set and 85% on the “ZhengData” data set. Comparisons with existing methods show that the present approach performs better on the “IvankovData” data set. These results indicate that the present approach is effective and valuable for predicting protein folding kinetic order. Based on factor analysis, we find that the length of the protein sequence and the hydrophobicity and hydrophilicity of amino acids are important features in our approach.
Graphical Abstract