Abstract
Determining the family of a newly discovered enzyme early is important because the family is directly related to which specific target the enzyme acts on, as well as to its catalytic process and biological function. Unfortunately, distinguishing enzyme classes by experiment remains difficult. With the enormous number of protein sequences uncovered by genome research, it is both challenging and indispensable to develop an automatic method for classifying enzyme families quickly and reliably. Using the concept of Chou's pseudo amino acid composition, we developed a new method that couples the discrete wavelet transform with a support vector machine, based on amino acid hydrophobicity, to predict the enzyme family. The overall success rate obtained by 10-fold cross-validation for the identification of the six enzyme families was 91.9%, indicating that the current method could be an effective and promising high-throughput method for enzyme research.
Keywords: Discrete wavelet transform, Support vector machine, Enzyme family classes, Hydrophobicity, 10-fold cross-validation, Chou's pseudo amino acid composition
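
As an illustration of the feature-extraction idea summarized in the abstract, the following minimal sketch maps a protein sequence to a hydrophobicity signal, applies a multi-level discrete wavelet transform, and summarizes the coefficients into a fixed-length vector. Several choices here are assumptions and are not taken from the abstract: the Kyte-Doolittle hydropathy scale, the Haar wavelet, three decomposition levels, and the per-level summary statistics (maximum, minimum, mean, standard deviation). The function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq):
    """Map a protein sequence to a numeric signal, skipping unknown residues."""
    return [KD[aa] for aa in seq.upper() if aa in KD]


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def summarize(coeffs):
    """Summary statistics for one coefficient band (an assumed feature choice)."""
    return [max(coeffs), min(coeffs), mean(coeffs), pstdev(coeffs)]


def dwt_features(seq, levels=3):
    """Fixed-length feature vector: 4 statistics per detail band plus 4 for
    the final approximation band, i.e. 4 * (levels + 1) values."""
    signal = hydrophobicity_profile(seq)
    if len(signal) < 2 ** levels:
        raise ValueError("sequence too short for the requested number of levels")
    features = []
    approx = signal
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(summarize(detail))
    features.extend(summarize(approx))
    return features
```

Because every sequence yields a vector of the same length regardless of how long the protein is, such vectors could then be passed to a support vector machine classifier (for example scikit-learn's `SVC`) and evaluated with 10-fold cross-validation; that step is omitted here to keep the sketch dependency-free.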