Current HIV Research

ISSN (Print): 1570-162X
ISSN (Online): 1873-4251

Research Article

Collaborative Mining of Whole Genome Sequences for Intelligent HIV-1 Sub-Strain(s) Discovery

Author(s): Moses E. Ekpenyong*, Anthony A. Adegoke, Mercy E. Edoho, Udoinyang G. Inyang, Ifiok J. Udo, Itemobong S. Ekaidem, Francis Osang, Nseobong P. Uto and Joseph I. Geoffery

Volume 20, Issue 2, 2022

Published on: 24 March, 2022

Pages: 163-183 (21 pages)

DOI: 10.2174/1570162X20666220210142209

Abstract

Background: Effective global antiretroviral vaccines and therapeutic strategies depend on the diversity, evolution, and epidemiology of the various HIV strains, as well as on their transmission and pathogenesis. Most disease-causing viruses are clustered into a taxonomy of subtypes that points toward nucleotide-specific vaccines or therapeutic applications of clinical significance, sufficient for sequence-specific diagnosis and homologous viral studies. Such taxonomies are very useful for formulating predictors of cross-resistance to some retroviral control drugs used across study areas.

Objective: This research proposes a collaborative framework of hybridized techniques (Machine Learning and Natural Language Processing) to discover hidden genome patterns and feature predictors for mining HIV-1 genome sequences.

Methods: A total of 630 human HIV-1 genome sequences longer than 8500 bp were retrieved from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov), covering 21 countries across every continent except Antarctica. These sequences were transformed and learned using a self-organizing map (SOM). To discriminate emerging/new sub-strain(s), the HIV-1 reference genome was included among the input isolates/samples during training. After training the SOM, component planes defining pattern clusters of the input datasets were generated for cognitive knowledge mining and subsequent labeling of the datasets. Additional genome features, including dinucleotide transmission recurrences, codon recurrences, and mutation recurrences, were finally extracted from the raw genomes to construct output classification targets for supervised learning.
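The two computational steps named in the Methods, feature extraction and SOM training, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how sequences were transformed before SOM training, so k-mer relative frequencies are used here as a stand-in encoding, and the function names (`kmer_frequencies`, `train_som`, `best_matching_unit`), the grid size, and the toy sequences are all hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k, step=1):
    """Relative frequency of each k-mer over ACGT, in lexicographic order.

    step=1 slides over overlapping dinucleotides; step=3 with k=3 reads
    codons in frame 0. Windows with ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        lr = lr0 * decay                  # learning rate shrinks each epoch
        sigma = max(sigma0 * decay, 0.5)  # so does the neighbourhood radius
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-dist2 / (2 * sigma ** 2))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
    return weights


# Toy sequences (hypothetical, not taken from the study's dataset).
toy = ["ATGAAAGGGTAG", "ATGCCCGGGTAA", "ATGAAAGGATAG"]
features = [kmer_frequencies(s, 2) for s in toy]
som = train_som(features, rows=2, cols=2)
clusters = [best_matching_unit(som, f) for f in features]
```

Each entry of `clusters` is the grid coordinate of the node a sequence maps to; sequences sharing a node would be treated as one pattern cluster. Calling `kmer_frequencies(seq, 3, step=3)` gives frame-0 codon frequencies in the same way.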

Results: SOM training explains the inherent pattern diversity of HIV-1 genomes as well as inter- and intra-country transmissions in which mobility might play an active role, as corroborated by the literature. Nine sub-strains were discovered after disassembling the SOM correlation hunting matrix space attributed to disparate clusters. Cognitive knowledge mining separated similar pattern clusters bounded by a certain degree of correlation range, as discovered by the SOM. The Kruskal-Wallis rank-sum test and the Wilcoxon rank-sum test showed statistically significant variations in dinucleotide, codon, and mutation patterns.
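For readers unfamiliar with the two rank-based tests named in the Results, the sketch below shows how their statistics are computed from pooled ranks. It is an illustration only: the function names and the sample values are hypothetical, no tie correction is applied, and the rank-sum p-value uses the large-sample normal approximation, which may differ from the exact procedure the authors ran.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Return (H, degrees of freedom) for two or more groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1


def rank_sum_test(x, y):
    """Wilcoxon rank-sum z statistic and two-sided p (normal approximation)."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2             # its expectation under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical per-cluster feature values (e.g. one dinucleotide frequency).
h_stat, dof = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
z_stat, p_value = rank_sum_test([1, 2, 3], [4, 5, 6])
```

The Kruskal-Wallis statistic `h_stat` is compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value; a statistics library would normally handle that step along with tie corrections.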

Conclusion: The discovered sub-strains and the response cluster visualizations corroborate the existing literature, with significant haplotype variations. The proposed framework would assist in the development of decision support systems for easy contact tracing, infectious disease surveillance, and studying the progressive evolution of the reference HIV-1 genome.

Keywords: Cognitive knowledge, genome mining, pattern recognition, self-organizing map, sub-strain, infectious disease.


© 2024 Bentham Science Publishers