Abstract
Two of the most widely used and easily implemented algorithms for clustering and classification-based analysis of data in the unsupervised learning domain are Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and K-means cluster analysis. Both techniques handle most cases effectively when the data are highly random and offer no clear set to use as a parameter, as linear or logistic regression algorithms require. However, few papers compare the two in a controlled environment to determine which performs better and under what conditions. In this paper, a renal adenocarcinoma dataset is analyzed, both DBSCAN and K-means are applied to it, and the results are examined. The efficacy of the two techniques in this study is compared, and the merits and demerits observed for each are enumerated. Further, the interaction of t-SNE with the generated clusters is explored.
Keywords: DBSCAN, K-means, renal cancer, oversampling, t-SNE, clustering, scatter plot.