Abstract
A novel, non-degenerate graphical representation of protein sequences, based mainly on the pKa (NH3+) values of amino acids, is first proposed; it helps in viewing, aligning, and comparing multiple sequences visually. A new algorithm is then presented that extracts a 40-dimensional numerical vector from the graphical curves to characterize each protein sequence. The similarity between sequences is computed as the Euclidean distance between their corresponding numerical vectors. Finally, our method is applied to the similarity analysis of protein sequences on two data sets. The results agree with the accepted view, which is supported by extensive anatomical evidence, and hence demonstrate the validity of this approach.
Keywords: Graphical representation, numerical characterization, phylogenetic tree, protein sequence, similarity analysis.
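
The distance-based comparison described in the abstract can be illustrated with a minimal sketch. It assumes each sequence has already been reduced to a 40-dimensional numerical vector; the extraction step itself is not reproduced here. The function names, sequence labels, and vector values are hypothetical placeholders, not data or code from this work.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numerical vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(vectors):
    """Pairwise distances for a dict mapping sequence names to vectors."""
    names = list(vectors)
    return {p: {q: euclidean_distance(vectors[p], vectors[q]) for q in names}
            for p in names}

# Hypothetical 40-dimensional descriptors (placeholder values only).
example = {
    "seq_A": [0.0] * 40,
    "seq_B": [1.0] * 40,
    "seq_C": [0.0] * 39 + [3.0],
}
matrix = distance_matrix(example)
```

A smaller distance indicates more similar sequences under this measure; a matrix of this kind is the usual input to a tree-building procedure when a phylogenetic tree is wanted.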