Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge

Author(s): Bridget A. Tripp* and Hasan H. Otu*

Volume 17, Issue 1, 2022

Published on: 06 September, 2021

Page: [37 - 47] Pages: 11

DOI: 10.2174/1574893616666210906141545

Abstract

Background: High-throughput sequencing technologies have revolutionized the ability to perform systems-level biology and elucidate molecular mechanisms of disease through the comprehensive characterization of different layers of biological information. Integrating these heterogeneous layers can provide insight into the underlying biology, but modeling the complex interactions among them remains challenging.

Objective: We introduce OBaNK (omics integration using Bayesian networks and external knowledge), an algorithm that models interactions between heterogeneous, high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype.

Methods: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of a learned interaction between molecules was altered based on external knowledge.
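The idea of altering a learned interaction strength with external knowledge can be illustrated with a minimal sketch. The abstract does not state the adjustment rule, so the linear blend, the `weight` parameter, the function name, and the example molecule names below are all assumptions made for illustration and should not be read as OBaNK's actual procedure.

```python
# Hypothetical illustration: blend data-derived edge strengths with
# external-knowledge confidences. The blending rule is an assumption,
# not the rule used by OBaNK.

def adjust_edge_strengths(learned, prior, weight=0.5):
    """Return edge strengths shifted toward external-knowledge confidence.

    learned: dict mapping (source, target) -> strength in [0, 1] from the data
    prior:   dict mapping (source, target) -> confidence in [0, 1] from
             external knowledge (for example, a pathway database)
    weight:  assumed share of the final strength taken from the prior
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    adjusted = {}
    for edge, strength in learned.items():
        if edge in prior:
            # Edge is covered by external knowledge: blend the two sources.
            adjusted[edge] = (1.0 - weight) * strength + weight * prior[edge]
        else:
            # No external evidence: keep the data-derived strength as is.
            adjusted[edge] = strength
    return adjusted


# Made-up molecule names and values, for demonstration only.
learned = {
    ("lipid_A", "protein_B"): 0.40,
    ("protein_B", "metabolite_C"): 0.70,
}
prior = {("lipid_A", "protein_B"): 0.90}

adjusted = adjust_edge_strengths(learned, prior, weight=0.5)
```

In this toy case the edge supported by external knowledge is strengthened, while the edge with no external evidence keeps its data-derived value.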

Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve (AUC) score of ~0.85, an improvement of ~0.23 over baseline methods. When applied to real multi-omics data collected during pregnancy, OBaNK identified five distinct functional networks of heterogeneous biological data, and the results were compared to other multi-omics integration approaches.
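As a rough guide to how an area under the curve score can be obtained for a learned network, the sketch below scores candidate edges against a known reference network. It uses the pairwise-comparison definition of the ROC AUC; the function name, the toy edges, and their scores are assumptions for illustration, and the abstract does not specify how the reported values were computed.

```python
# Hypothetical illustration: ROC AUC for edge recovery, computed as the
# probability that a true edge receives a higher score than a false edge
# (ties count as one half).

def edge_auc(scores, true_edges):
    """Compute ROC AUC of edge scores against a reference edge set.

    scores:     dict mapping (source, target) -> predicted strength
    true_edges: set of (source, target) pairs present in the reference network
    """
    positives = [s for edge, s in scores.items() if edge in true_edges]
    negatives = [s for edge, s in scores.items() if edge not in true_edges]
    if not positives or not negatives:
        raise ValueError("need at least one true and one false candidate edge")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up candidate edges and scores, for demonstration only.
scores = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.6,
    ("A", "C"): 0.7,
    ("C", "D"): 0.2,
}
true_edges = {("A", "B"), ("B", "C")}

auc = edge_auc(scores, true_edges)
```

A value of 0.5 corresponds to random ranking of edges and 1.0 to a ranking that places every true edge above every false one.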

Conclusion: By integrating external knowledge, OBaNK improved the accuracy of interaction networks learned from data, identified heterogeneous functional networks in real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at: https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at: http://otulab.unl.edu/OBaNK.

Keywords: Multi-omics, Bayesian networks, data integration, external knowledge, interaction network, OBaNK.

