Using Chaos-Game-Representation for Analysing the SARS-CoV-2 Lineages, Newly Emerging Strains and Recombinants

Amarinder Singh Thind; Somdatta Sinha

doi:10.2174/0113892029264990231013112156

Abstract

Background: Viruses have high mutation rates, which facilitate rapid evolution and the emergence of new species, subspecies, strains and recombinant forms. Accurate classification of these forms is crucial for understanding viral evolution and for developing therapeutic applications. Phylogenetic classification is typically performed by analyzing molecular differences at the genomic and sub-genomic levels, which requires aligning homologous proteins or genes. However, there is growing interest in computationally efficient, alignment-free methods for whole-genome comparison.

Methods: Here we elaborate on the Chaos Game Representation (CGR) method, which is based on concepts from statistical physics and makes no sequence-alignment assumptions. We apply the CGR method to classify the closely related clades/lineages A and B of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), one of the fastest-evolving viruses.
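The chaos game behind CGR is a short iterated map. The sketch below is a minimal Python illustration, not the authors' implementation: it assumes the common corner layout A=(0,0), C=(0,1), G=(1,1), T=(1,0), a start at the centre of the unit square, and that ambiguous symbols such as N are simply skipped; the abstract does not state the paper's own conventions. Each nucleotide moves the current point halfway towards its corner.

```python
# Minimal Chaos Game Representation (CGR) sketch.
# Assumed corner layout (a common convention, not confirmed for this paper):
# A=(0,0), C=(0,1), G=(1,1), T=(1,0) on the unit square.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return one CGR point per recognised nucleotide in `sequence`."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base == "U":
            base = "T"  # treat RNA uracil as thymine
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: ambiguous symbols such as N are skipped
        # Move halfway from the current point towards the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

For "ACGT" the points are (0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125) and (0.78125, 0.40625). Because every step halves the distance to a corner, the last k bases read determine which of the 4^k sub-squares of side 2^-k the point occupies, regardless of earlier history; this is why a CGR image reflects the k-mer composition of the sequence.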

Results: Our study shows that the CGR approach readily recovers the SARS-CoV-2 phylogeny from the available whole-genome sequences of lineages A and B. It also accurately classifies eight different strains and distinguishes the newly evolved recombinant XBB variant from its parental strains. Compared with alignment-based methods (Neighbour-Joining and Maximum Likelihood), the CGR method requires fewer computational resources, is fast and accurate for long sequences and, being a k-mer-based approach, allows simultaneous comparison of large numbers of closely related sequences of different lengths. We also developed CGRphylo, an R pipeline available on GitHub, which integrates the CGR module with other R packages to build and visualize phylogenetic trees.
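Being "k-mer based" follows directly from the halving property of the chaos game: counting CGR points on a 2^k by 2^k grid gives the frequency of every k-mer (often called a frequency CGR). The sketch below is hypothetical and only illustrates the idea; the corner layout, normalisation by the number of counted k-mers (so genomes of different lengths can be compared) and the Euclidean distance are assumptions, and the abstract does not say which distance CGRphylo uses.

```python
import math

# Assumed corner layout (x bit, y bit), following the usual CGR convention.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2^k x 2^k grid of k-mer frequencies, indexed grid[row][col].

    Row 0 is the bottom edge of the CGR square. Counts are divided by the
    number of counted k-mers so genomes of different lengths are comparable.
    """
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    seq = sequence.upper().replace("U", "T")
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        # The last base of the k-mer sets the most significant bit of the cell
        # index, because it is the most recent halving step of the chaos game.
        col = sum(CORNERS[base][0] << j for j, base in enumerate(kmer))
        row = sum(CORNERS[base][1] << j for j, base in enumerate(kmer))
        grid[row][col] += 1
        total += 1
    if total:
        grid = [[count / total for count in r] for r in grid]
    return grid


def fcgr_distance(grid_a, grid_b):
    """Euclidean distance between two FCGR grids built with the same k."""
    if len(grid_a) != len(grid_b):
        raise ValueError("grids must be built with the same k")
    return math.sqrt(sum(
        (a - b) ** 2
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    ))


def distance_matrix(sequences, k):
    """Pairwise FCGR distances for a {name: sequence} mapping."""
    names = list(sequences)
    grids = [fcgr(sequences[name], k) for name in names]
    matrix = [[fcgr_distance(ga, gb) for gb in grids] for ga in grids]
    return names, matrix
```

The resulting matrix can be passed to a standard distance-based tree builder, for example `nj()` in the R package ape; whether CGRphylo uses that routine is not stated in the abstract. A larger k resolves finer differences between closely related genomes at the cost of a sparser grid of 4^k cells.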

Conclusion: Our findings demonstrate the efficacy of the CGR method for accurate classification and tracking of rapidly evolving viruses, offering valuable insights into the evolution and emergence of new SARS-CoV-2 strains and recombinants.

Current Genomics (Bentham Science Publisher). Print ISSN 1389-2029; Online ISSN 1875-5488.