Abstract
To narrow the large and widening gap between the number of proteins with known sequences and the number with known structures, it is important to study protein structural class prediction (PSCP), which is both fundamental to and useful for protein structure analysis. In this paper, a d-interval conditional probability index was proposed to reflect the long-term correlation between amino acids. Based on this index, the impact of the long-term relationship between residues on PSCP was analyzed. Two new information-theory-based algorithms were proposed and combined with the long-term information between residues to predict protein structural class (PSC). The dataset 5714 was used for testing because of its low sequence similarity and high reliability. The results showed that, with the same algorithms, the new index gave prediction accuracy 3-6% higher than the traditional index, and that the new algorithms improved PSCP accuracy by 4-10%. The proposed index and algorithms, together with the findings on the long-term relationship between residues in PSCP, can be widely applied to other sequence-based protein structure analyses.
Keywords: d-interval conditional probability, protein structural class, long-term relationship, information content, information entropy
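
As an illustration of the kind of quantity a d-interval conditional probability index might capture, the sketch below estimates the probability of observing residue b exactly d positions after residue a, using simple pair counts over a set of sequences. The abstract does not give the formal definition, so this reading of the index, the function name `d_interval_conditional_probability`, and the toy sequences are all assumptions made for illustration, not the method of the paper.

```python
from collections import Counter


def d_interval_conditional_probability(sequences, d):
    """Estimate P(b at position i+d | a at position i) from pair counts.

    Hypothetical helper: assumes the index is the empirical conditional
    probability of residue b occurring d positions after residue a.
    Returns a dict mapping (a, b) -> estimated probability.
    """
    if d < 1:
        raise ValueError("d must be a positive integer")

    pair_counts = Counter()   # counts of (a, b) separated by d positions
    first_counts = Counter()  # counts of a having any partner d positions later

    for seq in sequences:
        # Sequences shorter than d + 1 contribute no pairs.
        for i in range(len(seq) - d):
            a, b = seq[i], seq[i + d]
            pair_counts[(a, b)] += 1
            first_counts[a] += 1

    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Toy input (not real protein data), used only to show the call shape.
toy_sequences = ["ACAC", "ACAD"]
probs_d1 = d_interval_conditional_probability(toy_sequences, d=1)
probs_d2 = d_interval_conditional_probability(toy_sequences, d=2)
```

With d = 1 this reduces to the usual adjacent-residue conditional probability; larger values of d are what would let such an index reflect longer-range dependence between residues.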