Abstract
Proteases play vital regulatory roles in the life cycle and are major potential targets for medical intervention. Different types of proteases perform different functions in different biological processes. A fast and reliable computational approach for identifying protease types is therefore highly desirable. Starting from the protein sequence, the 20 amino acids are reclassified into a 9-gram encoding according to their biophysical and biochemical properties. The 9 encoding compositions, the increment of diversity, and the low-frequency power spectral density are used to extract information from the protease sequence, and a support vector machine is then applied to predict the protease type. In the jackknife test, the success rates of our algorithm are higher than those of other methods. Three recent patents are also used in this paper.
Keywords: 9-gram encoding, biochemical properties, increment of diversity, jackknife test, low-frequency power spectral density, protease dataset, support vector machine
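
As an illustration of two of the feature-extraction steps named in the abstract, the following is a minimal sketch of a reduced-alphabet composition and the increment of diversity. The 9-group partition `GROUPS` is a hypothetical grouping chosen only for this example and is not the grouping used in the paper; the function names are likewise assumptions. The increment of diversity follows the commonly used definition ID(X, Y) = D(X + Y) - D(X) - D(Y), with D(X) = N ln N - sum of n_i ln n_i.

```python
import math

# Hypothetical 9-group partition of the 20 amino acids (illustration only;
# NOT the grouping used in the paper).
GROUPS = ["AVLIM", "C", "FWY", "G", "P", "ST", "NQ", "DE", "HKR"]
AA_TO_GROUP = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def group_counts(seq):
    """Count how many residues of `seq` fall into each of the 9 groups.

    Characters that are not one of the 20 standard amino acids are skipped.
    """
    counts = [0] * len(GROUPS)
    for aa in seq.upper():
        idx = AA_TO_GROUP.get(aa)
        if idx is not None:
            counts[idx] += 1
    return counts


def diversity(counts):
    """Diversity measure D(X) = N ln N - sum_i n_i ln n_i, with N = sum_i n_i."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)
```

Under this definition, a small increment of diversity between a query sequence's count vector and a class's pooled count vector indicates similar compositions, so the values computed against each protease class could serve as input features for a classifier such as a support vector machine.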