Abstract
To facilitate similarity/dissimilarity analysis of protein sequences, we introduce a novel approach based on recurrence quantification analysis (RQA). Based on a selected pair of physicochemical properties of amino acids, the primary structure of a protein is treated as two time series, with the order of the amino acids playing the role of successive time points. We then apply RQA to these two time series and use six characteristic parameters calculated with RQA as the feature representation of each protein sequence, which we use to analyze the similarities/dissimilarities of nine ND5 protein sequences. The results show that our method is effective. In addition, unlike existing RQA-based methods, once the embedding dimension m and the time delay τ have been determined by given algorithms, the range of the threshold ε can be determined efficiently. More interestingly, varying ε within the determined range has almost no influence on the rationality of the similarity/dissimilarity results.
Keywords: Characteristic parameter, protein sequence, recurrence quantification analysis, similarity/dissimilarity analysis, time series.
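As a rough illustration of the pipeline summarized above, the following sketch maps a protein sequence to one numeric time series, builds a recurrence matrix from its time-delay embedding, and computes two standard RQA measures. Everything specific here is an assumption made for illustration: the abstract does not name the pair of physicochemical properties, so the Kyte-Doolittle hydropathy scale stands in for one of them; the Euclidean norm, the values of m, τ and ε, and the choice of recurrence rate and determinism (which may or may not be among the six parameters used in the paper) are likewise hypothetical. All function names are invented for this example.

```python
import numpy as np

# Assumption: Kyte-Doolittle hydropathy index as one physicochemical property.
# The paper uses a pair of properties; a second scale would be handled the same way.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def to_series(seq, scale):
    """Turn a protein sequence into a time series; residue order acts as time."""
    return np.array([scale[aa] for aa in seq], dtype=float)

def embed(x, m, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(m-1)*tau])."""
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this m and tau")
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

def recurrence_matrix(x, m, tau, eps):
    """R[i, j] = 1 when embedded states i and j lie within eps (Euclidean norm)."""
    v = embed(x, m, tau)
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    return (dist <= eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent points, excluding the main diagonal."""
    n = R.shape[0]
    return (R.sum() - n) / (n * (n - 1))

def determinism(R, lmin=2):
    """Share of off-diagonal recurrent points lying on diagonal lines >= lmin."""
    n = R.shape[0]
    in_lines, total = 0, 0
    for k in range(1, n):  # upper diagonals only; R is symmetric
        diag = np.diagonal(R, offset=k)
        total += int(diag.sum())
        run = 0
        for val in list(diag) + [0]:  # trailing 0 closes the last run
            if val:
                run += 1
            else:
                if run >= lmin:
                    in_lines += run
                run = 0
    return in_lines / total if total else 0.0

def rqa_features(seq, scale=KD_HYDROPATHY, m=2, tau=1, eps=1.0):
    """Hypothetical feature vector (RR, DET) for one sequence and one property."""
    R = recurrence_matrix(to_series(seq, scale), m, tau, eps)
    return np.array([recurrence_rate(R), determinism(R)])
```

Under these assumptions, two sequences could be compared by computing such a feature vector for each property series and taking a distance between the concatenated vectors, for example `np.linalg.norm(rqa_features(seq_a) - rqa_features(seq_b))`.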