Combinatorial Chemistry & High Throughput Screening

ISSN (Print): 1386-2073
ISSN (Online): 1875-5402

Research Article

A Generalized Iterative Map for Analysis of Protein Sequences

Author(s): Jiahe Huang, Qi Dai, Yuhua Yao* and Ping-An He*

Volume 25, Issue 3, 2022

Published on: 12 October, 2020

Page: [381 - 391] Pages: 11

DOI: 10.2174/1386207323666201012142318

Abstract

Aim and Objective: Comparing the similarity of biological sequences is an important task in bioinformatics. Methods for this comparison fall into two classes: sequence alignment methods and alignment-free methods. Graphical representation of biological sequences is an alignment-free approach that provides a tool for both analyzing and visualizing the sequences. In this article, a generalized iterative map of protein sequences was proposed for analyzing the similarities of biological sequences.

Materials and Methods: Based on normalized physicochemical indexes of the 20 amino acids, each amino acid was mapped to a point in 5D space. A generalized iterative function system was then introduced to define a generalized iterative map of protein sequences. This map not only reflects various physicochemical properties of the amino acids but also allows different compression ratios for its individual components. Several properties were proved to illustrate the advantages of the generalized iterative map. A mathematical description of the map was proposed for comparing the similarities and dissimilarities of protein sequences. Using this method, similarities/dissimilarities were compared among ND5 protein sequences and among ND6 protein sequences of ten different species.
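
The abstract does not give the exact form of the map, so the following is only a minimal sketch of the general idea, written as a chaos-game-style update with one compression ratio per coordinate. The function names, the update rule `x <- x + r * (p - x)`, the starting point, and especially the coordinates returned by `placeholder_points` are assumptions made for illustration; they are not the authors' definitions, and the placeholder coordinates are not real physicochemical indexes.

```python
from typing import Dict, List, Optional, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def placeholder_points(dim: int = 5) -> Dict[str, Tuple[float, ...]]:
    """Return a HYPOTHETICAL point in [0, 1]^dim for each amino acid.

    These values are arbitrary stand-ins. A real analysis would use
    normalized physicochemical indexes instead.
    """
    points = {}
    for k, aa in enumerate(AMINO_ACIDS):
        points[aa] = tuple(((k * (d + 3)) % 20) / 19.0 for d in range(dim))
    return points


def iterative_map(
    seq: str,
    points: Dict[str, Tuple[float, ...]],
    ratios: Sequence[float],
    start: Optional[Sequence[float]] = None,
) -> List[Tuple[float, ...]]:
    """Trace a sequence through an assumed per-coordinate contraction map.

    For each residue, every coordinate d moves toward the residue's point
    by its own ratio: x[d] <- x[d] + ratios[d] * (p[d] - x[d]).
    With all ratios equal to 0.5 this is the classic midpoint rule of the
    chaos game representation.
    """
    dim = len(ratios)
    if any(not (0.0 < r < 1.0) for r in ratios):
        raise ValueError("each compression ratio must lie strictly between 0 and 1")
    x = list(start) if start is not None else [0.5] * dim
    if len(x) != dim:
        raise ValueError("start must have one coordinate per ratio")
    trajectory = []
    for aa in seq:
        p = points[aa]  # raises KeyError for non-standard residue symbols
        x = [x[d] + ratios[d] * (p[d] - x[d]) for d in range(dim)]
        trajectory.append(tuple(x))
    return trajectory
```

Calling `iterative_map("MKV", placeholder_points(), [0.5, 0.4, 0.6, 0.5, 0.3])` returns one 5D point per residue. Because each step is a convex combination, the trajectory stays inside the unit hypercube whenever the start point and the amino acid points do.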

Results: Using correlation analysis, the ClustalW results were compared with our similarity/dissimilarity results and with those of other graphical representations to show the utility of our approach. For all species, our approach correlates better with ClustalW than the other approaches do, which illustrates its effectiveness.
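
The abstract does not state which correlation coefficient was used. As an illustration only, the sketch below assumes the Pearson coefficient, computed between one species' row of alignment-based distances and the matching row from an alignment-free method. The function name `pearson` is hypothetical, and no values from the paper are reproduced here.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

In such a comparison, `xs` would hold the distances from one species to each of the others according to ClustalW and `ys` the corresponding distances from the graphical method; a coefficient closer to 1 indicates closer agreement between the two.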

Conclusion: The two examples show that our method performs well in the similarity/dissimilarity analysis of protein sequences and does not require complex computation.

Keywords: Graphical representation, protein sequence, generalized iterative function system, similarity, phylogenetic tree, ClustalW


© 2024 Bentham Science Publishers