Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Predicting COVID-19 Severity Integrating RNA-Seq Data Using Machine Learning Techniques

Author(s): Javier Bajo-Morales, Daniel Castillo-Secilla, Luis Javier Herrera, Octavio Caba, Jose Carlos Prados and Ignacio Rojas*

Volume 18, Issue 3, 2023

Published on: 22 February, 2023

Page: [221 - 231] Pages: 11

DOI: 10.2174/1574893617666220718110053

Abstract

Background: A fundamental challenge in the fight against COVID-19 is the development of reliable and accurate tools to predict how the disease will progress in a patient. Such predictions can help distinguish hospitalized patients at higher risk of needing intensive care unit (ICU) admission from patients with low severity. How a SARS-CoV-2 infection will evolve is still unclear.

Methods: A novel pipeline was developed that integrates RNA-Seq data from different databases and uses an artificial intelligence algorithm to obtain a COVID-19 severity index based on genetic biomarkers. The pipeline pursues robustness through multiple cross-validation processes at different steps.

Results: CD93, RPS24, PSCA, and CD300E were identified as a COVID-19 severity gene signature. Furthermore, using this gene signature, a multi-class classifier capable of discriminating between control, outpatient, inpatient, and ICU COVID-19 patients was optimized, achieving an accuracy of 97.5%.
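The abstract does not name the feature-selection method or the classifier, so the following is only a minimal sketch of the general idea: gene selection and multi-class classification evaluated under k-fold cross-validation, with the selection repeated inside each fold so that held-out samples never influence which genes are chosen. The F-score filter, the k-nearest-neighbours classifier, the synthetic data, and all function names are assumptions made for illustration and are not the authors' implementation.

```python
import random
from collections import Counter

# Severity classes named in the abstract; everything else here is illustrative.
CLASSES = ["control", "outpatient", "inpatient", "ICU"]


def make_synthetic(n_per_class=20, n_genes=30, n_informative=4, seed=0):
    """Toy expression matrix in which only the first n_informative genes track class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(len(CLASSES)):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 3.0 * label  # class-dependent shift
            X.append(row)
            y.append(label)
    return X, y


def f_score(X, y, gene):
    """Ratio of between-class to within-class sum of squares for one gene."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[gene])
    grand_mean = sum(row[gene] for row in X) / len(X)
    between = within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        between += len(values) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in values)
    return between / (within + 1e-12)


def select_genes(X, y, k):
    """Return the indices of the k genes with the highest F-score."""
    ranked = sorted(range(len(X[0])), key=lambda g: f_score(X, y, g), reverse=True)
    return ranked[:k]


def knn_predict(X_train, y_train, x, genes, k=5):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    dists = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, n_folds=5, n_genes=4, seed=0):
    """k-fold CV; genes are re-selected on each training split to avoid leakage."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        genes = select_genes(X_tr, y_tr, n_genes)
        correct += sum(knn_predict(X_tr, y_tr, X[i], genes) == y[i] for i in fold)
    return correct / len(X)


X, y = make_synthetic()
accuracy = cross_validate(X, y)
print(f"cross-validated accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 97.5% reported above. The point of the sketch is the structure: selecting genes on the full dataset before splitting would let test samples shape the signature and inflate the estimate.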

Conclusion: In this research, a new intelligent pipeline was implemented to develop a specific gene signature that can assess disease severity in patients with COVID-19. As a clinical decision support system, the approach achieved excellent results, even when processing unseen samples. The system can be of great clinical utility for planning, organizing, and managing human and material resources, as well as for automatically classifying the severity of patients affected by COVID-19.

Keywords: COVID-19, CDSS, Severity, Gene Expression, Machine Learning, Feature Selection.
