Abstract
Introduction: Discovering tumor subtypes helps to explore tumor pathogenesis, assess the feasibility of clinical treatment, and improve patient survival. Clustering analysis is increasingly applied to multi-omics data. However, because multi-omics data are diverse and complex, developing a comprehensive clustering algorithm for tumor molecular subtyping remains challenging.
Methods: In this study, we present ADSVAE, an adaptive density-aware spectral clustering method based on a variational autoencoder. First, ADSVAE uses a variational autoencoder (VAE) based on the Wasserstein distance metric to learn a latent representation of each omics data type. Second, it builds a similarity matrix for each omics dataset using an adaptive density-aware kernel. Third, it uses tensor product graphs (TPGs) to merge the different data sources and reduce noise. Finally, it applies spectral clustering and uses a Gaussian mixture model (GMM) to cluster the final eigenvector matrix and identify cancer subtypes.
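The final step described above, spectral embedding of a fused similarity matrix followed by GMM clustering of the eigenvector matrix, can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: it assumes the Ng-Jordan-Weiss normalization D^-1/2 W D^-1/2 with row-normalized eigenvectors, a GMM with diagonal covariances and farthest-point initialization, and a toy block-structured matrix standing in for the fused TPG output. The helpers `spectral_embedding` and `gmm_labels` are hypothetical names.

```python
import numpy as np


def spectral_embedding(W, n_clusters):
    """Rows of the top eigenvectors of D^-1/2 W D^-1/2, each scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)               # eigenvalues come back in ascending order
    U = vecs[:, -n_clusters:]                 # keep the n_clusters largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def gmm_labels(X, n_components, n_iter=100, seed=0):
    """EM for a Gaussian mixture with diagonal covariances; returns hard labels."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Farthest-point initialisation of the component means (an assumption of this sketch)
    idx = [int(rng.integers(n))]
    for _ in range(1, n_components):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    means = X[idx].copy()
    var = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities from log-densities, stabilised by subtracting the row max
        diff = X[:, None, :] - means[None, :, :]
        log_p = -0.5 * ((diff ** 2 / var[None]).sum(axis=2)
                        + np.log(2 * np.pi * var).sum(axis=1)[None, :])
        log_p += np.log(weights)[None, :]
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return resp.argmax(axis=1)


# Toy fused similarity matrix: three blocks of 10 samples each (hypothetical data)
rng = np.random.default_rng(1)
true_labels = np.repeat([0, 1, 2], 10)
W = np.where(true_labels[:, None] == true_labels[None, :], 0.9, 0.05)
W = W + 0.02 * rng.random(W.shape)
W = (W + W.T) / 2                             # keep W symmetric
pred = gmm_labels(spectral_embedding(W, 3), 3)
print(pred)
```

Fitting a GMM rather than k-means to the eigenvector rows gives soft memberships. The sketch keeps only the hard argmax labels.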
Results: We tested ADSVAE on five TCGA datasets, and on all of them it performed well compared with several state-of-the-art multi-omics clustering algorithms. Compared with existing multi-omics clustering algorithms, the Wasserstein-distance-based VAE in ADSVAE learns the latent representation of each omics data type and better captures complex data distributions. The self-tuning density-aware kernel used by ADSVAE increases the similarity between points that share nearest neighbors, and data integration and diffusion on the tensor product graph reduce noise more effectively and reveal the underlying structure, improving clustering performance.
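The density-aware kernel and tensor product graph diffusion mentioned above can be illustrated with the NumPy sketch below. It rests on assumptions the abstract does not confirm. The kernel uses a common shared-nearest-neighbour form, W_ij = exp(-d_ij^2 / (sigma_i * sigma_j * (CNN(i, j) + 1))), where sigma_i is the distance to the k-th nearest neighbour and CNN(i, j) counts shared neighbours. The diffusion uses the iterative tensor product graph update Q <- P Q P^T + I from Yang et al.'s TPG affinity learning. Averaging the per-omics kernels before diffusion is our own simplification of how the data sources are merged. The helper names and the defaults k=5, alpha=0.9, n_iter=20 are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = (X ** 2).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))


def density_aware_kernel(X, k=5):
    """Self-tuning Gaussian kernel whose bandwidth grows with shared-neighbour counts."""
    n = X.shape[0]
    D = pairwise_distances(X)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest neighbours, self excluded
    sigma = D[np.arange(n), knn[:, -1]]       # local scale: distance to k-th neighbour
    M = np.zeros((n, n))
    M[np.arange(n)[:, None], knn] = 1.0
    cnn = M @ M.T                             # CNN(i, j): number of shared neighbours
    return np.exp(-D ** 2 / (np.outer(sigma, sigma) * (cnn + 1.0) + 1e-12))


def tpg_diffusion(W, k=5, alpha=0.9, n_iter=20):
    """Iterative diffusion on the tensor product graph: Q <- P Q P^T + I."""
    n = W.shape[0]
    rows = np.arange(n)[:, None]
    keep = np.argsort(-W, axis=1)[:, :k]      # sparsify: keep the k strongest edges per row
    P = np.zeros_like(W)
    P[rows, keep] = W[rows, keep]
    P = alpha * P / P.sum(axis=1, keepdims=True)  # row sums = alpha < 1, so the series converges
    Q = P.copy()
    for _ in range(n_iter):
        Q = P @ Q @ P.T + np.eye(n)
    return (Q + Q.T) / 2                      # symmetrise the diffused affinity


# Two toy "omics" views of the same 30 samples drawn from 3 groups (hypothetical data)
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
views = [5 * rng.normal(size=(3, d))[labels] + rng.normal(scale=0.5, size=(30, d))
         for d in (5, 8)]
fused = np.mean([density_aware_kernel(X) for X in views], axis=0)
Q = tpg_diffusion(fused)
same = labels[:, None] == labels[None, :]
print(Q[same].mean(), Q[~same].mean())
```

Scaling the row sums of P to alpha < 1 keeps its spectral radius below 1, so the update converges. The kNN sparsification is what suppresses weak cross-group edges before they are propagated.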
Conclusion: Cancer subtype identification is subject to the inherent pitfalls of computational biology. Although this paper draws some conclusions on the related issues, clustering-based identification of cancer subtypes from genomic data will need further improvement and refinement as research in related fields deepens.