Combinatorial Chemistry & High Throughput Screening

ISSN (Print): 1386-2073
ISSN (Online): 1875-5402

Research Article

Applications of 2D and 3D-Dynamic Representations of DNA/RNA Sequences for a Description of Genome Sequences of Viruses

Author(s): Dorota Bielińska-Wąż, Piotr Wąż* and Damian Panas

Volume 25, Issue 3, 2022

Published on: 04 August, 2021

Page: [429 - 438] Pages: 10

DOI: 10.2174/1386207324666210804120454

Abstract

The aim of this study is to show that graphical bioinformatics methods are effective tools for describing the genome sequences of viruses. A new approach to the identification of unknown virus strains is proposed.

Methods: Biological sequences are represented graphically using the 2D and 3D-Dynamic Representations of DNA/RNA Sequences, theoretical methods for graphical sequence representation that we developed previously. These approaches bring ideas from classical dynamics into bioinformatics. A sequence is represented by a set of material points in 2D or 3D space, and the distribution of these points in space is characteristic of the sequence. The numerical parameters (descriptors) characterizing a sequence correspond to quantities typical of classical dynamics.
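The idea of representing a sequence by material points and reading off dynamics-like descriptors can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not state how bases map to directions, so the unit steps in `STEPS` are an assumed Nandy-style walk, and the function names `dynamic_graph_2d`, `center_of_mass`, and `inertia_tensor` are hypothetical. The descriptors shown (center of mass and moments of inertia) are standard quantities of classical dynamics chosen here as plausible examples.

```python
from collections import Counter

# ASSUMPTION: unit step per base (Nandy-style walk). The abstract does not
# give the assignment, so these vectors are illustrative only.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1), "U": (0, -1)}


def dynamic_graph_2d(sequence):
    """Walk over the sequence and return a Counter mapping (x, y) -> mass."""
    masses = Counter()
    x, y = 0, 0
    for base in sequence.upper():
        if base not in STEPS:
            continue  # skip ambiguous symbols such as N
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        masses[(x, y)] += 1  # unit mass at the end of each step; revisits add up
    return masses


def center_of_mass(masses):
    """Mass-weighted mean position of the points."""
    total = sum(masses.values())
    if total == 0:
        raise ValueError("sequence contains no recognised bases")
    cx = sum(m * x for (x, _), m in masses.items()) / total
    cy = sum(m * y for (_, y), m in masses.items()) / total
    return cx, cy


def inertia_tensor(masses):
    """Return (Ixx, Iyy, Ixy) computed about the center of mass."""
    cx, cy = center_of_mass(masses)
    ixx = sum(m * (y - cy) ** 2 for (_, y), m in masses.items())
    iyy = sum(m * (x - cx) ** 2 for (x, _), m in masses.items())
    ixy = -sum(m * (x - cx) * (y - cy) for (x, y), m in masses.items())
    return ixx, iyy, ixy


if __name__ == "__main__":
    graph = dynamic_graph_2d("ATGGTGCACCTGACT")  # arbitrary toy sequence
    print(center_of_mass(graph), inertia_tensor(graph))
```

Storing the graph as a mapping from coordinates to mass keeps repeated visits to the same point as a single heavier point, which is what makes the point distribution, rather than the path alone, the carrier of the descriptors.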

Results: Some applications of the theoretical methods have been briefly reviewed. 2D-dynamic graphs representing the complete genome sequences of SARS-CoV-2 are shown.

Conclusion: It is shown that the 3D-Dynamic Representation of DNA/RNA Sequences, coupled with the random forest algorithm, successfully classifies the subtypes of influenza A virus strains.

Keywords: Graphical bioinformatics, 2D and 3D-Dynamic Representations of DNA/RNA Sequences, supervised learning, machine learning, random forest, Boruta algorithm.


© 2024 Bentham Science Publishers