Integration of Multi-Omics Data Using Probabilistic Graph Models and
External Knowledge

Bridget A. Tripp; Hasan H. Otu

doi:10.2174/1574893616666210906141545

Abstract

Background: High-throughput sequencing technologies have revolutionized systems-level biology, making it possible to elucidate molecular mechanisms of disease through comprehensive characterization of different layers of biological information. Integrating these heterogeneous layers can provide insight into the underlying biology, but modeling the complex interactions among them remains challenging.

Objective: We introduce OBaNK (omics integration using Bayesian networks and external knowledge), an algorithm that models interactions between heterogeneous, high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype.

Methods: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of a learned interaction between molecules was altered based on external knowledge.
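As an illustration of the idea in the Methods statement, the following minimal sketch shows one way a learned edge strength could be adjusted using external knowledge. The abstract does not specify the adjustment rule, so the function name `adjust_edge_strengths`, the `boost` parameter, and the rule itself (a supported edge moves a fraction of the way toward 1.0) are assumptions made for illustration only, not the OBaNK implementation.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def adjust_edge_strengths(
    strengths: Dict[Edge, float],
    known_edges: Set[Edge],
    boost: float = 0.5,
) -> Dict[Edge, float]:
    """Raise the strength of edges supported by external knowledge.

    Hypothetical rule (an assumption, not taken from the paper): an edge
    found in `known_edges`, in either direction, moves a fraction `boost`
    of the way from its learned strength toward 1.0. Unsupported edges
    keep their learned strength.
    """
    if not 0.0 <= boost <= 1.0:
        raise ValueError("boost must lie in [0, 1]")
    adjusted: Dict[Edge, float] = {}
    for edge, strength in strengths.items():
        reverse = (edge[1], edge[0])
        supported = edge in known_edges or reverse in known_edges
        if supported:
            adjusted[edge] = strength + boost * (1.0 - strength)
        else:
            adjusted[edge] = strength
    return adjusted


# Toy example with made-up molecule names and strengths.
learned = {("lipid_A", "protein_B"): 0.4, ("protein_B", "metabolite_C"): 0.6}
external = {("protein_B", "lipid_A")}
result = adjust_edge_strengths(learned, external, boost=0.5)
```

In this toy example only the lipid_A/protein_B edge is supported by the external set, so its strength rises while the other edge is left unchanged.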

Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve score of ~0.85, an improvement of ~0.23 over baseline methods. When applied to real multi-omics data collected during pregnancy, OBaNK identified five distinct functional networks of heterogeneous biological data, and the results were compared to those of other multi-omics integration approaches.
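For readers unfamiliar with the metric, the sketch below shows how an area under the curve score can be computed for a learned network when the true edges are known, as with synthetic data. It assumes the curve is the ROC curve and uses the pairwise-ranking definition of AUC. The function name `edge_auc` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def edge_auc(scores: Dict[Edge, float], true_edges: Set[Edge]) -> float:
    """ROC AUC of edge scores against a known set of true edges.

    Computed as the fraction of (true edge, false edge) pairs in which
    the true edge receives the higher score; ties count as one half.
    """
    positives = [s for e, s in scores.items() if e in true_edges]
    negatives = [s for e, s in scores.items() if e not in true_edges]
    if not positives or not negatives:
        raise ValueError("need at least one true and one false edge")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four candidate edges, two of which are true.
toy_scores = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("A", "C"): 0.3,
    ("C", "D"): 0.7,
}
toy_truth = {("A", "B"), ("C", "D")}
auc = edge_auc(toy_scores, toy_truth)
```

A score of 0.5 corresponds to random edge ranking and 1.0 to a ranking that places every true edge above every false one.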

Conclusion: By integrating external knowledge, OBaNK improved the accuracy of interaction networks learned from data, identified heterogeneous functional networks in real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at http://otulab.unl.edu/OBaNK.

Keywords: Multi-omics, Bayesian networks, data integration, external knowledge, interaction network, OBaNK.

Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893616666210906141545
Print ISSN: 1574-8936
Online ISSN: 2212-392X
