ADSVAE: An Adaptive Density-aware Spectral Clustering Method for
Multi-omics Data Based on Variational Autoencoder

Jianping Zhao; Qi Guan; Chunhou Zheng; Qingqing Cao

doi:10.2174/1574893618666230406105659

Abstract

Introduction: The discovery of tumor subtypes helps to explore tumor pathogenesis, determine the operability of clinical treatment, and improve patient survival. Clustering analysis is increasingly applied to multi-omics data. However, because multi-omics data are diverse and complex, developing a complete clustering algorithm for tumor molecular typing remains challenging.

Methods: In this study, we present an adaptive density-aware spectral clustering method based on a variational autoencoder (ADSVAE). First, ADSVAE learns the underlying spatial information of each omics data type using a variational autoencoder (VAE) based on the Wasserstein distance metric. Second, a similarity matrix is built for each gene set using an adaptive density-aware kernel. Third, tensor product graphs (TPGs) are used to merge the different data sources and reduce noise. Finally, ADSVAE applies a spectral clustering algorithm and uses a Gaussian mixture model (GMM) to cluster the final eigenvector matrix and identify cancer subtypes.
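As an illustration of the second and fourth steps, the following is a minimal sketch of a self-tuning, density-aware kernel followed by a spectral embedding. It is not the authors' implementation. The abstract does not give the kernel formula, so the sketch assumes a local-scaling kernel whose bandwidth is widened for pairs that share nearest neighbors. The function names and the parameters n_scale and n_shared are hypothetical, and the VAE encoding and the final GMM step are omitted.

```python
import numpy as np

def density_aware_kernel(X, n_scale=3, n_shared=7):
    """Affinity matrix from a self-tuning kernel with a shared-neighbor bonus.

    Assumed form (not taken from the paper):
        A_ij = exp(-d_ij^2 / (sigma_i * sigma_j * (shared_ij + 1)))
    where sigma_i is the distance from sample i to its n_scale-th neighbor
    and shared_ij counts common members of the two n_shared-neighbor lists.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    d = np.sqrt(d2)

    # Sort neighbors by distance; column 0 is the sample itself
    # (assumes no duplicated samples).
    order = np.argsort(d, axis=1, kind="stable")
    sigma = d[np.arange(n), order[:, n_scale]]

    # Binary membership matrix of each sample's neighbor list.
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], order[:, 1:n_shared + 1]] = 1.0
    shared = member @ member.T

    scale = sigma[:, None] * sigma[None, :] * (shared + 1.0) + 1e-12
    A = np.exp(-d2 / scale)
    np.fill_diagonal(A, 0.0)
    return A

def spectral_embedding(A, k):
    """Top-k eigenvectors of D^{-1/2} A D^{-1/2}, with rows scaled to unit length."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, -k:]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)

# Toy usage: two well-separated groups of samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (15, 2)), rng.normal(50.0, 1.0, (15, 2))])
A = density_aware_kernel(X)
U = spectral_embedding(A, k=2)
```

The rows of U would then be passed to a mixture model or another clustering routine. In this toy case, samples from the same group map to nearly identical rows, which is what makes the final clustering step straightforward.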

Results: We tested ADSVAE on five TCGA datasets, and it performed well on all of them in comparison with several advanced multi-omics clustering algorithms. Compared with existing multi-omics clustering algorithms, the variational autoencoder based on the Wasserstein distance measure in ADSVAE can learn the underlying spatial information of each omics data type, which helps it learn complex data distributions. The self-tuning density-aware kernel used by ADSVAE enhances the similarity between points that share near neighbors, and the tensor product graph integration and diffusion process reduces noise and reveals the underlying structure, improving performance.
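The integration and diffusion step can be illustrated with a short sketch. The abstract does not state the update rule, so this example assumes one formulation of tensor product graph diffusion found in the literature: the per-omics affinity matrices are summed, the result is row-normalized and truncated to each sample's k strongest neighbors, and the iteration Q <- S Q S^T + I is run for a fixed number of steps. The function names and the parameters k and n_iter are hypothetical.

```python
import numpy as np

def knn_transition(A, k):
    """Row-normalize A, then keep only each row's k largest entries."""
    P = A / (A.sum(axis=1, keepdims=True) + 1e-12)
    keep = np.argsort(-P, axis=1)[:, :k]
    rows = np.arange(P.shape[0])[:, None]
    S = np.zeros_like(P)
    S[rows, keep] = P[rows, keep]
    return S

def tpg_diffuse(S, Q0, n_iter):
    """Run Q <- S Q S^T + I for n_iter steps, starting from Q0."""
    Q = Q0.astype(float)
    eye = np.eye(S.shape[0])
    for _ in range(n_iter):
        Q = S @ Q @ S.T + eye
    return Q

def fuse_and_diffuse(affinities, k=10, n_iter=5):
    """Sum per-omics affinity matrices, then diffuse the combined graph."""
    A = np.sum(affinities, axis=0)
    S = knn_transition(A, k)
    return tpg_diffuse(S, A, n_iter)

# Toy usage: two random symmetric affinity matrices over six samples.
rng = np.random.default_rng(0)
mats = []
for _ in range(2):
    B = rng.random((6, 6))
    M = B + B.T
    np.fill_diagonal(M, 0.0)
    mats.append(M)
Q = fuse_and_diffuse(mats, k=3, n_iter=4)
```

Truncating to the k strongest neighbors keeps every row sum of S at or below 1, which is what keeps the iteration from blowing up when some weight is discarded. The diffused matrix Q stays symmetric when the starting matrix is symmetric, so it can be used directly as the input to a spectral clustering step.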

Conclusion: Cancer subtype identification poses inherent difficulties for computational biology. Although this paper draws some conclusions on the related issues, clustering methods for cancer subtype identification based on genomic data still need further improvement and refinement as research in related fields continues to deepen.




DOI: https://dx.doi.org/10.2174/1574893618666230406105659
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
