Abstract
Multimedia data on the Spark cloud computing platform consist of many different types of entities, and the attributes of these entities are not all the same. Traditional multimedia data clustering based on the spectral clustering algorithm assumes that the data consist of independent entities of a single type with no relations among them, so the resulting clusters contain significant errors. This paper proposes a multimedia data clustering method based on semi-supervised K-means. Building on the K-means clustering algorithm, the optimal initial cluster centers are determined by an iterative graph-theoretic method. Following the idea of tree clustering, the cluster centers are then expanded using the maximum-distance θ tree clustering algorithm, the similarity between two data objects and between an object and a cluster is computed, and clustering of the multimedia data is carried out on the Spark cloud platform. Simulation results show that the proposed method achieves a better speedup ratio, expansion rate, and clustering accuracy, and that it can be applied to multimedia data clustering on the Spark cloud computing platform.
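The abstract only outlines the method. As an illustration, the following is a minimal single-machine sketch of one common form of semi-supervised K-means, assuming Euclidean distance, a few labelled seed points per known class, and a farthest-point rule with threshold θ (`theta`) for adding extra initial centers. The function names are hypothetical, and this is not the paper's graph-theoretic initialization or its Spark implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def init_centers(points: List[Point], seeds: Dict[str, List[Point]],
                 theta: float) -> List[Point]:
    """Hypothetical initializer: one center per labelled class (mean of its
    seeds), then repeatedly add the point farthest from its nearest center
    while that distance exceeds theta."""
    if not seeds:
        raise ValueError("at least one labelled seed class is required")
    centers = [mean(seeds[label]) for label in sorted(seeds)]
    while True:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        if min(dist(far, c) for c in centers) <= theta:
            return centers
        centers.append(far)


def semi_supervised_kmeans(points: List[Point], seeds: Dict[str, List[Point]],
                           theta: float, max_iter: int = 100):
    """Lloyd-style K-means in which seed points stay fixed to their class."""
    centers = init_centers(points, seeds, theta)
    order = sorted(seeds)
    fixed = {p: order.index(lbl) for lbl in seeds for p in seeds[lbl]}
    assign: List[int] = []
    for _ in range(max_iter):
        # Assignment step: seeds keep their label, others go to nearest center.
        new_assign = [
            fixed.get(p, min(range(len(centers)),
                             key=lambda j: dist(p, centers[j])))
            for p in points
        ]
        if new_assign == assign:  # converged: no assignment changed
            break
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = mean(members)
    return assign, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
seeds = {"a": [(0, 0)], "b": [(10, 10)]}
assign, centers = semi_supervised_kmeans(points, seeds, theta=5.0)
print(assign)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

Here the two labelled classes seed two centers, and the unlabelled group near (20, 0) lies more than θ from both, so a third center is added. On Spark, the assignment step would typically be distributed across partitions and the center update performed as a per-cluster aggregation; that distribution is not shown in this sketch.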
Keywords: Clustering, data, multimedia, Spark cloud computing platform.