Abstract
This paper presents a sequence of steps aimed at extracting biological knowledge from microarray gene expression data. The core of the pipeline is a canonical multi-objective Genetic Algorithm (GA) that takes a gene expression matrix and a factor as input. The factor groups samples according to a criterion of interest, e.g., healthy versus diseased tissue samples. Each run of the GA yields a gene set with good properties both at the individual level, in terms of differential expression, and at the aggregate level, in terms of correlation between expression profiles. Microarray experiment data are obtained from the Gene Expression Omnibus (GEO) database. In the pipeline, the results of several independent GA runs are analyzed, the genes common to all runs are collected, and an over-representation analysis is performed on them. At the end of the process, a small number of genes of interest emerges. The methodology is illustrated with a leukemia benchmark dataset, for which a group of genes of interest is obtained.
Keywords: Functional genomics, Genetic Algorithm, microarray data analysis, Gene Ontology, over-representation analysis.
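As a rough illustration of the evaluation and post-processing steps described in the abstract, the following Python sketch scores a candidate gene set on two objectives, intersects the gene sets returned by independent runs, and computes a one-sided hypergeometric over-representation p-value per annotation term. The abstract does not specify the exact objective functions, so Welch's t-statistic (differential expression) and the mean absolute Pearson correlation (co-expression) are assumptions, as are all function names and the toy data. The GA operators themselves (selection, crossover, mutation, Pareto ranking) are not shown.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def welch_t(x, y):
    """Welch's t-statistic between two groups of expression values."""
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    return (mean(x) - mean(y)) / math.sqrt(vx / len(x) + vy / len(y))


def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def objectives(expr, factor, genes):
    """Hypothetical two-objective score for a gene set (both maximized).

    expr: dict gene -> expression values, one per sample.
    factor: group label per sample; a two-level factor is assumed.
    """
    levels = sorted(set(factor))
    assert len(levels) == 2, "this sketch assumes a two-level factor"

    def split(values):
        return ([v for v, f in zip(values, factor) if f == levels[0]],
                [v for v, f in zip(values, factor) if f == levels[1]])

    # Individual level: mean absolute t-statistic of the selected genes.
    de = mean(abs(welch_t(*split(expr[g]))) for g in genes)
    # Aggregate level: mean absolute pairwise correlation of their profiles.
    pairs = list(combinations(genes, 2))
    co = mean(abs(pearson(expr[a], expr[b])) for a, b in pairs) if pairs else 0.0
    return de, co


def common_genes(runs):
    """Genes selected in every independent GA run."""
    return set.intersection(*map(set, runs))


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n selected)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def over_representation(selected, universe, annotation):
    """One-sided hypergeometric p-value for each annotation term."""
    universe = set(universe)
    selected = set(selected) & universe
    result = {}
    for term, members in annotation.items():
        members = set(members) & universe
        k = len(selected & members)
        result[term] = hypergeom_sf(k, len(universe), len(members), len(selected))
    return result


# Toy data (invented for illustration only): 3 genes, 6 samples.
toy_expr = {
    "g1": [1.0, 1.2, 0.9, 3.1, 3.3, 2.9],
    "g2": [2.0, 2.1, 1.9, 4.2, 4.0, 4.1],
    "g3": [5.0, 4.8, 5.1, 5.0, 4.9, 5.2],
}
toy_factor = ["healthy"] * 3 + ["diseased"] * 3

if __name__ == "__main__":
    print(objectives(toy_expr, toy_factor, ["g1", "g2"]))
    print(common_genes([["g1", "g2", "g3"], ["g1", "g2"], ["g2", "g1"]]))
```

In a real analysis the annotation mapping would come from Gene Ontology, and the per-term p-values would normally be corrected for multiple testing (e.g., Benjamini-Hochberg), which this sketch omits.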