The Role Of Hubness For Clustering High Dimensional Data
Description
Clustering algorithms and cluster validity are two closely related parts of cluster analysis. This paper introduces a novel idea for cluster validity and a clustering algorithm based on the resulting validity index. A centroid ratio is first introduced to compare two clustering results. The centroid ratio is then used in prototype-based clustering through a Pairwise Random Swap clustering algorithm that avoids the local optimum problem of k-means. Before clustering, the number of clusters is an essential parameter of the clustering algorithm; after clustering, the validity of the result is assessed. The similarity value that the centroid ratio yields when comparing two clusterings can be used as a stopping criterion in the algorithm. We propose a cluster-level validity criterion called the centroid ratio. It has low time complexity and is applicable for detecting unstable or incorrectly located centroids. Employing the centroid ratio in swap-based clustering, we further suggest a pairwise random swap clustering algorithm, for which no stopping criterion is required. The centroid ratio is shown to be highly correlated with the mean square error (MSE) and with other external indices. Moreover, it is fast and simple to calculate. An empirical study on several different datasets indicates that the proposed algorithm works more efficiently than Random Swap, Deterministic Random Swap, Repeated k-means, or k-means++. The algorithm is also successfully applied to document clustering and color image quantization.
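For orientation, the sketch below shows a plain random swap loop of the kind the description uses as a baseline: replace one randomly chosen centroid with a randomly chosen data point, run a few k-means iterations, and keep the trial only if the MSE drops. It is not the Pairwise Random Swap algorithm and does not compute the centroid ratio, since the description does not give their definitions. The function names, the fixed swap budget `swaps`, and the two k-means iterations per trial are assumptions made for illustration.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def mse(points, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        c = centroids[nearest(p, centroids)]
        total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total / len(points)


def kmeans_steps(points, centroids, steps=2):
    """Run a fixed number of k-means iterations (assumed default: 2)."""
    centroids = [list(c) for c in centroids]
    for _ in range(steps):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def random_swap(points, k, swaps=200, seed=0):
    """Baseline random swap: accept a trial swap only if it lowers the MSE.

    The loop stops after a fixed, assumed budget of `swaps` trials.
    """
    rng = random.Random(seed)
    best = kmeans_steps(points, rng.sample(points, k))
    best_err = mse(points, best)
    for _ in range(swaps):
        trial = [list(c) for c in best]
        # Swap: move one random centroid onto one random data point.
        trial[rng.randrange(k)] = list(rng.choice(points))
        trial = kmeans_steps(points, trial)
        err = mse(points, trial)
        if err < best_err:
            best, best_err = trial, err
    return best, best_err


if __name__ == "__main__":
    # Three small, well-separated groups of 2-D points (toy data).
    offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
    data = [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (10, 10), (20, 0)]
            for dx, dy in offsets]
    centroids, err = random_swap(data, k=3)
    print(centroids, err)
```

The fixed swap budget is the part a validity-based criterion such as the centroid ratio would replace: instead of counting trials, the loop could stop once two successive clusterings are judged similar enough.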