Abstract
Transcription factors (TFs) are proteins that control the first step of gene expression, the transcription of DNA into RNA. Transcriptional regulation can be understood much better when the class of a transcription factor is known. We developed a new method for predicting the class of a TF by incorporating the properties of its transcription factor binding sites (TFBSs) into a novel mode of Chou's pseudo amino acid composition. The properties are the length of the TFBSs of a TF, a new_PWM value, the proportion of non-conserved TFBSs, the proportion of no-nucleosome TFBSs, the proportion of conserved-nucleosome TFBSs, and the GC content of the TFBSs. We construct a vector from these properties to represent each TF and then classify the vectors with support vector machines (SVMs). The high accuracy obtained indicates that these binding site properties are strongly informative of the class of a TF.
Keywords: Transcription factor, classification, nucleosome, TFBS, SVM
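
As an illustration of the representation described in the abstract, the following minimal sketch assembles the six properties into one vector per TF. It is not the authors' implementation: the names `TFRecord`, `gc_content`, and `feature_vector` are hypothetical, the TFBS length is assumed here to be the mean length over a TF's sites, and the new_PWM value and the three proportions are taken as precomputed inputs because the abstract does not state how they are derived.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TFRecord:
    """Hypothetical container for one TF; field names are illustrative only."""
    sites: List[str]                  # TFBS sequences for this TF
    new_pwm: float                    # precomputed new_PWM value (derivation not given here)
    frac_non_conserved: float         # proportion of non-conserved TFBSs
    frac_no_nucleosome: float         # proportion of no-nucleosome TFBSs
    frac_conserved_nucleosome: float  # proportion of conserved-nucleosome TFBSs


def gc_content(sites: Sequence[str]) -> float:
    """Fraction of G or C bases over all binding-site sequences."""
    total = sum(len(s) for s in sites)
    if total == 0:
        raise ValueError("at least one non-empty binding site is required")
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sites)
    return gc / total


def feature_vector(tf: TFRecord) -> List[float]:
    """Build the six-element vector, in the order the properties are listed."""
    if not tf.sites:
        raise ValueError("a TF needs at least one binding site")
    # Assumption: "length of TFBSs" is summarized as the mean site length.
    mean_length = sum(len(s) for s in tf.sites) / len(tf.sites)
    return [
        mean_length,
        tf.new_pwm,
        tf.frac_non_conserved,
        tf.frac_no_nucleosome,
        tf.frac_conserved_nucleosome,
        gc_content(tf.sites),
    ]


# Toy input, made up purely to show the shape of the vector.
example = TFRecord(["GGCC", "ATAT"], 0.8, 0.25, 0.5, 0.25)
vector = feature_vector(example)
```

Vectors built this way, one per TF, could then be passed together with their class labels to any SVM implementation for training and prediction.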