Abstract
To predict enzyme subclasses, this paper constructs a new enzyme database, building on previous ideas and methods. A Random Forest algorithm for predicting enzyme subclasses from protein sequence is proposed, in which four characteristic parameters describe the sequence information: the increment of diversity value, the low-frequency power spectral density, matrix scoring values, and motif frequency. In the jackknife test, the overall success rates for identifying the 18 subclasses of oxidoreductases, the 8 subclasses of transferases, the 5 subclasses of hydrolases, the 6 subclasses of lyases, the 6 subclasses of isomerases, and the 6 subclasses of ligases are 90.86%, 95.24%, 96.42%, 98.60%, 97.53%, and 98.03%, respectively. Furthermore, when the same method is applied to the previous database, better results are obtained.
Keywords: enzyme subclasses, increment of diversity value, matrix scoring function value, motif, prediction, random forest algorithm.
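
As an illustration of the first characteristic parameter, the sketch below computes an increment of diversity between two count vectors. It assumes the definition commonly used in the literature, D(X) = N log N - Σ n_i log n_i with N = Σ n_i, and ID(X, Y) = D(X + Y) - D(X) - D(Y); the abstract itself does not state the formula. The choice of dipeptide counts as the underlying composition and the function names `diversity`, `increment_of_diversity`, and `dipeptide_counts` are assumptions made for this example, not the paper's implementation.

```python
import math
from collections import Counter


def diversity(counts):
    """Diversity D(X) = N log N - sum_i n_i log n_i, where N = sum_i n_i.

    Zero counts contribute nothing (0 log 0 is taken as 0).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    merged = Counter(x) + Counter(y)  # element-wise sum of the two count vectors
    return diversity(merged) - diversity(x) - diversity(y)


def dipeptide_counts(sequence):
    """Count overlapping dipeptides in a protein sequence (assumed feature choice)."""
    return Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))


# Hypothetical usage: compare a query sequence with a pooled subclass profile.
query = dipeptide_counts("MKTAYIAKQR")
profile = dipeptide_counts("MKTAYLAKQRMKSAYIAK")
score = increment_of_diversity(query, profile)
```

Under this definition, identical compositions give an increment of zero and compositions with no shared categories give the largest values, so a lower score against a subclass profile suggests a closer match. In a pipeline like the one described, such scores would be one group of input features to the Random Forest classifier.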