Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the global coronavirus disease 2019 (COVID-19) pandemic. Limited information is available on the evolutionary aspects of the structural proteins of the virus: spike (S), envelope (E), membrane (M) and nucleocapsid (N). We therefore carried out a detailed molecular and genetic characterization of the SARS-CoV-2 structural protein genes using nucleotide composition, codon usage pattern, phylogenetic, entropy and selection pressure analyses. The relative synonymous codon usage (RSCU) patterns indicated codon bias arising from a preference for U/A-ended over C/G-ended codons. Both mutational pressure and natural selection influence the synonymous codon usage of the structural protein genes in SARS-CoV-2. Phylogenetic analyses of different coronaviruses for all four structural genes showed that all 2019-nCoV study sequences clustered within the SARS-CoV-2 clade, which was closest to bat coronaviruses. Additional phylogenetic analyses of the SARS-CoV-2 structural protein genes showed discordant topologies, suggesting different patterns of evolutionary relationships among these genes. The small number of non-synonymous mutations, low entropy values and purifying selection suggested limited variation in the studied genes. However, variation in the SARS-CoV-2 genome is likely to increase in the near future as the virus comes under pressure to evade the host immune response and enhance its survival in humans. In summary, we evaluated the genetic diversity of the structural protein genes along with the genomic composition and codon usage patterns of SARS-CoV-2. The present data on the molecular characterization of the structural protein genes are likely to augment the information available on the evolution, biology and adaptation of SARS-CoV-2 in the human host.
Keywords: Entropy, Gene ontology, Molecular characterization, Mutational pressure, Natural selection, Nucleotide composition, Phylogenetic analysis, SARS-CoV-2, Structural proteins, Synonymous codon usage.
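For readers unfamiliar with the RSCU measure mentioned in the abstract, the following is a minimal illustrative sketch, not the pipeline used in the study. It assumes the standard definition of RSCU (the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid) and the standard genetic code. The function name `rscu` and the toy sequence are hypothetical. Values above 1 indicate a codon used more often than expected under equal synonymous usage.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq):
    """Return {codon: RSCU} for a coding sequence read in frame from position 0.

    Stop codons and single-codon amino acids (Met, Trp) are excluded.
    Codons are reported in the DNA alphabet (U is converted to T).
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Keep only unambiguous codons (skips anything containing N or other symbols).
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence, RSCU undefined
        for c in family:
            # observed count / (family total / number of synonymous codons)
            result[c] = counts[c] * len(family) / total
    return result


if __name__ == "__main__":
    toy = "GCUGCUGCUGCA"  # four alanine codons: GCU x3, GCA x1
    for codon, value in sorted(rscu(toy).items()):
        print(codon, round(value, 2))
```

In this toy sequence the U-ended alanine codon receives an RSCU of 3.0 and the A-ended codon 1.0, while the C- and G-ended codons receive 0, which is the kind of U/A-ended preference the abstract describes.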