Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data

Xinpeng Guo; Yafei Song; Dongyan Xu; Xueping Jin; Xuequn Shang

doi:10.2174/0115748936276861240109045208

Abstract

Background: Multi-omics analyses based on clinical data often suffer from too few omics data types and relatively small sample sizes, owing to patient privacy protection and the data management requirements of various institutions, while each omics data type carries a relatively large number of features. This paper describes how multi-omics pathway relationships can be analyzed using statistical data when clinical data are unavailable.

Methods: We propose a novel approach that exploits easily accessible statistics from public databases. The approach introduces phenotypic associations that are not included in clinical data and uses them to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into two two-layer networks and predicted the weights of the inter-layer associations in each. The weights of the two two-layer networks were then merged using a hyperparameter β, and k-fold cross-validation was used to evaluate the accuracy of the method. To calculate the weights of the two-layer networks, random walk with restart (RWR) with a fixed restart probability was combined with PBMDA and CIPHER to produce PCRWR, which uses biased weights and achieves improved accuracy.

Results: The area under the receiver operating characteristic curve was increased by approximately 7% in the case of RWR with initial weights.

Conclusion: Genotype and phenotype correlation networks were established from multi-omics statistical data, and the resulting analysis achieved an effect similar to that of clinical multi-omics analysis.
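
The two building blocks named in the Methods, random walk with restart and the β-weighted merging of two weight sets, can be illustrated with a minimal sketch. This is not the authors' PCRWR implementation: the function names, the toy network, the restart value of 0.7, and the reading of β as a convex combination of two equally shaped score arrays are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import numpy as np


def column_normalize(adj):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0, keepdims=True)
    return np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums > 0)


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.

    `restart` is a fixed restart probability; 0.7 is an illustrative value,
    not one taken from the paper.
    """
    w = column_normalize(adj)
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()  # the seed vector becomes a probability distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def merge_weights(scores_a, scores_b, beta):
    """Assumed merge rule: beta * A + (1 - beta) * B with beta in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return beta * a + (1.0 - beta) * b


# Hypothetical toy network: a 4-node path graph, walk seeded at node 0.
adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
scores_from_node0 = random_walk_with_restart(adjacency, seeds=[1, 0, 0, 0])
scores_from_node3 = random_walk_with_restart(adjacency, seeds=[0, 0, 0, 1])
merged = merge_weights(scores_from_node0, scores_from_node3, beta=0.5)
print(merged)
```

In practice β would be chosen by the k-fold cross-validation mentioned in the Methods, for example by scanning a grid of values and keeping the one with the highest held-out area under the ROC curve.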

Journal: Current Bioinformatics
DOI: https://dx.doi.org/10.2174/0115748936276861240109045208
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher