Microarray Analysis Workflow Based on a Genetic Algorithm to Discover
Potential Hub Genes

Jessica Andrea Carballido

doi:10.2174/1574893617666220804112743

Abstract

This paper presents a sequence of steps aimed at gaining biological knowledge from microarray gene expression data. The core of the pipeline is a canonical multi-objective Genetic Algorithm (GA), which takes a gene expression matrix and a factor as input. The factor groups the samples according to a given criterion, e.g., healthy versus diseased tissue. A single run of the GA yields a gene set with good properties both at the individual level, in terms of differential expression, and at the aggregate level, in terms of correlation between expression profiles. Microarray experiment data are obtained from the Gene Expression Omnibus (GEO) repository. As for the pipeline structure, independent runs of the GA are analyzed, the genes common to all runs are collected, and over-representation analysis is performed. At the end of the process, a small number of genes of interest emerges. The methodology is illustrated with a leukemia benchmark dataset, for which a group of genes of interest is obtained.
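The two levels at which a candidate gene set is assessed can be sketched as a pair of objective values. The abstract does not state the exact metrics, so everything in the sketch below is an assumption made for illustration: the Welch-style t statistic used as the differential expression score, the mean absolute Pearson correlation used as the aggregate score, the function names, and the toy data.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def differential_score(profile, factor):
    """Absolute Welch-style t statistic between the two factor levels.

    Assumed metric: the paper may score differential expression differently.
    """
    levels = sorted(set(factor))
    a = [v for v, f in zip(profile, factor) if f == levels[0]]
    b = [v for v, f in zip(profile, factor) if f == levels[1]]
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se if se else 0.0


def evaluate(gene_set, expression, factor):
    """Return (individual-level objective, aggregate-level objective)."""
    # Individual level: average differential expression score of the genes.
    diff = mean(differential_score(expression[g], factor) for g in gene_set)
    # Aggregate level: average absolute correlation over all gene pairs.
    pairs = list(combinations(gene_set, 2))
    corr = mean(abs(pearson(expression[a], expression[b])) for a, b in pairs) if pairs else 0.0
    return diff, corr


# Toy expression matrix (invented): 3 genes measured on 6 samples.
expression = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],
    "g3": [5.0, 4.0, 6.0, 5.0, 6.0, 4.0],
}
factor = ["healthy"] * 3 + ["diseased"] * 3

diff, corr = evaluate(["g1", "g2"], expression, factor)
```

In a multi-objective GA, a pair such as `(diff, corr)` would typically be compared across candidate gene sets by dominance rather than collapsed into a single number; how the paper's GA handles this is not stated in the abstract.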

Keywords: Functional genomics, genetic algorithm, microarray data analysis, Gene Ontology, over-representation analysis.
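The post-processing described in the abstract (collecting the genes common to all independent runs, then running over-representation analysis) might look roughly like the following sketch. It assumes a one-sided hypergeometric test per annotation term, applied to the common genes, which is a standard choice for over-representation analysis but is not confirmed by the abstract. The function names, gene labels, and annotation terms are hypothetical, and no multiple-testing correction is applied here.

```python
from math import comb


def common_genes(runs):
    """Genes present in the result of every independent GA run."""
    return set.intersection(*(set(r) for r in runs))


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def over_representation(genes, annotations, universe_size):
    """Map each annotation term to its hypergeometric p-value."""
    results = {}
    for term, members in annotations.items():
        k = len(genes & members)  # selected genes carrying this term
        results[term] = hypergeom_pvalue(k, len(genes), len(members), universe_size)
    return results


# Toy inputs (invented): three runs and two annotation terms
# over a universe of 20 genes.
runs = [["A", "B", "C", "D"], ["B", "C", "D", "E"], ["C", "D", "B", "F"]]
annotations = {
    "term_x": {"B", "C", "D", "G"},
    "term_y": {"A", "H", "I", "J", "K"},
}

common = common_genes(runs)
pvals = over_representation(common, annotations, universe_size=20)
```

With these toy inputs, `term_x` covers all three common genes and receives a small p-value, while `term_y` covers none of them and receives a p-value of 1.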

Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
Print ISSN: 1574-8936
Online ISSN: 2212-392X
DOI: https://dx.doi.org/10.2174/1574893617666220804112743
