Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

ADSVAE: An Adaptive Density-aware Spectral Clustering Method for Multi-omics Data Based on Variational Autoencoder

Author(s): Jianping Zhao*, Qi Guan, Chunhou Zheng* and Qingqing Cao

Volume 18, Issue 6, 2023

Published on: 13 June, 2023

Pages: 527-536

DOI: 10.2174/1574893618666230406105659

Abstract

Introduction: The discovery of tumor subtypes helps researchers explore tumor pathogenesis, assess the feasibility of clinical treatment, and improve patient survival. Clustering analysis is increasingly applied to multi-omics data. However, multi-omics data are diverse and complex, so developing a comprehensive clustering algorithm for tumor molecular typing remains challenging.

Methods: In this study, we present ADSVAE, an adaptive density-aware spectral clustering method based on a variational autoencoder. First, ADSVAE learns a latent-space representation of each omics data type using a variational autoencoder (VAE) based on the Wasserstein distance metric. Second, it builds a similarity matrix for each omics data type using an adaptive density-aware kernel. Third, it uses tensor product graphs (TPGs) to merge the different data sources and reduce noise. Finally, ADSVAE applies spectral clustering and uses a Gaussian mixture model (GMM) to cluster the final eigenvector matrix and identify cancer subtypes.
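The clustering stage described above (steps two to four) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and is not the authors' implementation.

- The kernel follows the density-aware form used by the Spectrum method of John et al.: A_ij = exp(-d_ij² / (σ_i σ_j (CNN_ij + 1))). Here σ_i is the distance to the k-th nearest neighbour, and CNN_ij counts the nearest neighbours that samples i and j share.
- The TPG step is approximated by the iterative diffusion Q ← S Q Sᵀ + I on a damped, row-normalised kNN graph.
- The eigenvectors come from the normalised affinity D^(-1/2) W D^(-1/2).
- A small diagonal-covariance GMM fitted by EM stands in for a library GMM.

The following are all hypothetical: the function names, the parameters `k`, `nn_shared`, `knn`, `alpha` and `iters`, their defaults, and the toy data. The VAE step is omitted, so each view's rows are assumed to already be latent codes.

```python
import numpy as np


def density_aware_kernel(X, k=3, nn_shared=7):
    """Self-tuning, density-aware affinity for one view (samples x features). Assumed form."""
    n = X.shape[0]
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # pairwise distances
    order = np.argsort(d, axis=1)                                  # column 0 is the point itself
    sigma = np.maximum(d[np.arange(n), order[:, k]], 1e-12)        # local scale: k-th neighbour
    nbrs = [set(order[i, 1:nn_shared + 1]) for i in range(n)]
    # CNN_ij: number of nearest neighbours shared by samples i and j
    cnn = np.array([[len(nbrs[i] & nbrs[j]) for j in range(n)] for i in range(n)])
    return np.exp(-d ** 2 / (np.outer(sigma, sigma) * (cnn + 1)))


def tpg_fuse(kernels, knn=8, alpha=0.9, iters=20):
    """Average view kernels, sparsify to a kNN graph S, then diffuse Q <- S Q S^T + I."""
    A = np.mean(kernels, axis=0)
    n = A.shape[0]
    rows = np.arange(n)[:, None]
    idx = np.argsort(-A, axis=1)[:, :knn]          # keep each row's knn strongest edges
    S = np.zeros_like(A)
    S[rows, idx] = A[rows, idx]
    S = alpha * S / S.sum(axis=1, keepdims=True)   # damping (alpha < 1) keeps the series convergent
    Q = np.eye(n)
    for _ in range(iters):
        Q = S @ Q @ S.T + np.eye(n)
    return (Q + Q.T) / 2                           # symmetrise against round-off


def spectral_embedding(W, n_clusters):
    """Top eigenvectors of D^-1/2 W D^-1/2 with unit-length rows (Ng-Jordan-Weiss style)."""
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    M = dinv[:, None] * W * dinv[None, :]
    _, vecs = np.linalg.eigh(M)                    # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def gmm_diag(U, K, iters=100):
    """Diagonal-covariance Gaussian mixture fitted by EM; returns hard labels."""
    n, p = U.shape
    mu = [U[0]]                                    # deterministic farthest-point initialisation
    for _ in range(1, K):
        dist = np.min([((U - m) ** 2).sum(axis=1) for m in mu], axis=0)
        mu.append(U[np.argmax(dist)])
    mu = np.array(mu)
    var = np.full((K, p), U.var(axis=0).mean() + 1e-6)
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities from log w_k + log N(u_i | mu_k, diag(var_k))
        diff2 = (U[:, None, :] - mu[None, :, :]) ** 2 / var[None]
        logp = np.log(w) - 0.5 * (diff2 + np.log(2 * np.pi * var)[None]).sum(axis=-1)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: mixture weights, means and floored diagonal variances
        nk = r.sum(axis=0) + 1e-10
        w = nk / n
        mu = (r.T @ U) / nk[:, None]
        var = np.maximum((r.T @ U ** 2) / nk[:, None] - mu ** 2, 1e-6)
    return r.argmax(axis=1)


def cluster_views(views, n_clusters):
    """Hypothetical end-to-end helper: kernels -> TPG fusion -> eigenvectors -> GMM."""
    W = tpg_fuse([density_aware_kernel(V) for V in views])
    return gmm_diag(spectral_embedding(W, n_clusters), n_clusters), W


# Toy usage: two groups of 10 samples; two "omics" views stand in for VAE latent codes
rng = np.random.default_rng(0)
centers = np.repeat([0.0, 5.0], 10)[:, None]
view1 = centers + rng.normal(scale=0.3, size=(20, 4))
view2 = centers + rng.normal(scale=0.3, size=(20, 6))
labels, W = cluster_views([view1, view2], n_clusters=2)
print(labels)
```

On this toy data, the two simulated groups should receive different labels. The GMM uses deterministic farthest-point initialisation. This design choice avoids starting both mixture components inside the same cluster, which would leave them stuck together under EM.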

Results: We tested ADSVAE on five TCGA datasets, and it performed well on all of them compared with several advanced multi-omics clustering algorithms. Unlike existing multi-omics clustering algorithms, ADSVAE uses a VAE based on the Wasserstein distance measure to learn the latent-space information of each omics data type, which better captures complex data distributions. The self-tuning density-aware kernel used by ADSVAE strengthens the similarity between points that share near neighbors. TPG-based data integration and diffusion further reduce noise and reveal the underlying structure, improving clustering performance.
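The abstract does not specify how the Wasserstein distance enters the VAE objective. One common choice, shown here as an assumption rather than as the ADSVAE loss, replaces the usual KL term with the squared 2-Wasserstein distance between the diagonal Gaussian posterior and the standard normal prior. That distance has the closed form ||μ||² + ||σ − 1||².

The NumPy sketch below evaluates this regulariser and a batch loss. The names `w2_diag_gaussians`, `wasserstein_vae_loss` and `beta` are hypothetical. A real model would compute this inside an autodiff framework so the encoder and decoder can be trained.

```python
import numpy as np


def w2_diag_gaussians(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, diag(sigma1**2)) and N(mu2, diag(sigma2**2)).

    For commuting (here diagonal) covariances, the general Gaussian formula
    ||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S2^1/2 S1 S2^1/2)^1/2) reduces to
    ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    """
    return np.sum((mu1 - mu2) ** 2, axis=-1) + np.sum((sigma1 - sigma2) ** 2, axis=-1)


def wasserstein_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Batch mean of squared reconstruction error + beta * W2^2(q(z|x), N(0, I)). Assumed loss."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    sigma = np.exp(0.5 * log_var)                  # encoder is assumed to output log-variances
    reg = w2_diag_gaussians(mu, sigma, np.zeros_like(mu), np.ones_like(sigma))
    return np.mean(recon + beta * reg)


# Example: a posterior N([3, 4], I) lies at squared W2 distance 3^2 + 4^2 = 25 from N(0, I)
print(w2_diag_gaussians(np.array([3.0, 4.0]), np.ones(2), np.zeros(2), np.ones(2)))
```

Unlike the KL divergence, this term stays finite and well behaved even when the posterior variance collapses toward zero. This is one reason Wasserstein-style regularisers are used for complex data distributions.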

Conclusion: Computational biology has inherent pitfalls in the study of cancer subtype identification. Although this paper draws some conclusions on these issues, clustering-based identification of cancer subtypes from genomic data will need further improvement and refinement as research in related fields deepens.


© 2024 Bentham Science Publishers