Abstract
Aim and Objective: Comparing the similarity of biological sequences is an important task in bioinformatics. Methods for this comparison fall into two classes: sequence alignment methods and alignment-free methods. Graphical representation of biological sequences is an alignment-free method that provides a tool for analyzing and visualizing the sequences. In this article, a generalized iterative map of protein sequences is proposed for analyzing the similarities of biological sequences.
Materials and Methods: Based on the normalized physicochemical indexes of the 20 amino acids, each amino acid is mapped to a point in 5D space. A generalized iterative function system was introduced to define a generalized iterative map of protein sequences, which not only reflects various physicochemical properties of amino acids but also allows different compression ratios for the components of the map. Several properties were proved to illustrate the advantages of the generalized iterative map. A mathematical description of the generalized iterative map was proposed for comparing the similarities and dissimilarities of protein sequences. Based on this method, similarities/dissimilarities were compared among ND5 protein sequences, as well as among ND6 protein sequences of ten different species.
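The iteration described above can be illustrated with a minimal Python sketch. The abstract does not give the update rule or the index values, so everything here is an assumption: the rule x_k = x_{k-1} + r * (p(a_k) - x_{k-1}) with one compression ratio per component, the starting point, and the function names `placeholder_points` and `iterative_map` are all hypothetical. The 5D points are seeded random placeholders, not the normalized physicochemical indexes used in the article.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def placeholder_points(seed=0):
    """Return one 5D point in [0, 1) per amino acid.

    PLACEHOLDER DATA: seeded random values standing in for the
    normalized physicochemical indexes, which the abstract does not list.
    """
    rng = np.random.default_rng(seed)
    table = rng.random((len(AMINO_ACIDS), 5))
    return {aa: table[i] for i, aa in enumerate(AMINO_ACIDS)}

def iterative_map(sequence, points, ratios, start=None):
    """Trace a protein sequence as a trajectory in 5D space.

    Assumed update rule (not taken from the article):
        x_k = x_{k-1} + ratios * (points[a_k] - x_{k-1})
    where `ratios` holds one compression ratio per component.
    """
    ratios = np.asarray(ratios, dtype=float)
    x = np.full(5, 0.5) if start is None else np.asarray(start, dtype=float)
    trajectory = np.empty((len(sequence), 5))
    for k, aa in enumerate(sequence):
        x = x + ratios * (points[aa] - x)  # move part of the way toward the residue's point
        trajectory[k] = x
    return trajectory

points = placeholder_points()
# A different compression ratio for each of the five components.
trajectory = iterative_map("MKVLA", points, ratios=[0.5, 0.4, 0.6, 0.5, 0.3])
print(trajectory.shape)
```

With ratios strictly between 0 and 1, each step is a convex combination of the previous point and the residue's point, so the trajectory stays inside the unit hypercube in this sketch.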
Results: Using correlation analysis, the ClustalW results were compared with our similarity/dissimilarity results and with the results of other graphical representations to show the utility of our approach. The comparison shows that, for all species, our approach correlates better with ClustalW than the other approaches, which illustrates its effectiveness.
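The kind of correlation check described above can be sketched as follows. The abstract does not say which correlation coefficient was used, so the Pearson coefficient is an assumption here, and the two distance vectors are hypothetical illustrative values, not results from the article.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

# HYPOTHETICAL values: distances from one reference species to four others,
# once from an alignment tool and once from a graphical representation.
alignment_distances = [0.10, 0.25, 0.40, 0.55]
graphical_distances = [0.02, 0.05, 0.09, 0.11]

print(pearson(alignment_distances, graphical_distances))
```

A coefficient close to 1 would indicate that the two methods rank the species in a similar way.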
Conclusion: Two examples show that our method not only performs well in the similarity/dissimilarity analysis of protein sequences but also does not require complex computation.
Keywords: Graphical representation, protein sequence, generalized iterative function system, similarity, phylogenetic tree, ClustalW