Efficient kNN Classification with Different Numbers of Nearest Neighbors
Our Price
₹3,500.00
10000 in stock
Support
Ready to Ship
Description
The k-nearest-neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and strong classification performance. However, it is impractical for traditional kNN methods to assign a single fixed k value (even one set by experts) to all test samples. Previous solutions assign different k values to different test samples via cross validation, but this is usually time-consuming. This paper proposes a kTree method that learns a different optimal k value for each test/new sample by introducing a training stage into kNN classification. In the test stage, the kTree quickly outputs the optimal k value for each test sample, and kNN classification is then conducted using the learned optimal k value and all training samples.

We first give a brief survey of recent progress in kNN classification approaches. A hybrid kNN (HBKNN) classification approach, which takes into account both the local and the global information of the query sample, is then designed to address the problems raised by special datasets. On top of it, a random subspace ensemble framework based on the HBKNN classifier (RS-HBKNN) is proposed to classify datasets with noisy attributes in high-dimensional space.

Compared with traditional kNN methods, which assign a fixed k value to all test samples, the proposed kTree method has a similar running cost but higher classification accuracy. Compared with recently proposed kNN methods, which assign different k values to different test samples, the kTree method has a lower running cost but achieves similar classification accuracy.

This paper further proposes an improved version of the kTree method (namely, the k*Tree method) that speeds up the test stage by additionally storing information about the training samples in the leaf nodes of the kTree, such as the training samples located in each leaf node, their kNNs, and the nearest neighbors of these kNNs. The resulting decision tree, called the k*Tree, allows kNN classification to be conducted on a subset of the training samples stored in the leaf nodes rather than on all training samples, which further reduces the running cost of the test stage.

Finally, nonparametric tests are adopted to compare the proposed methods with other classification approaches over multiple datasets. Experiments on real-world datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository demonstrate that RS-HBKNN works well on real data and outperforms most state-of-the-art classification approaches, and experimental results on 20 real datasets show that the proposed methods are much more efficient than the compared methods on classification tasks.
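To make the kTree idea above concrete, here is a minimal Python sketch using NumPy and scikit-learn. It illustrates only the general two-stage recipe, not the paper's actual kTree training procedure: a good k is estimated for each training sample by scoring a few candidate k values against that sample's own nearest neighbors, a decision tree is fit to predict that k from the features, and each test sample is then classified with the k the tree assigns to it. The function names (train_ktree, ktree_predict) and the candidate k grid are illustrative choices, not from the paper.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def train_ktree(X, y, k_candidates=(1, 3, 5, 7, 9, 11)):
    """Learn a tree that maps a sample to a good k (illustrative only).

    Assumes more than max(k_candidates) + 1 training samples.
    """
    k_max = max(k_candidates)
    # Nearest neighbors of each training point among the training set;
    # the query point itself comes back first, so fetch k_max + 1 and drop it.
    _, idx = NearestNeighbors(n_neighbors=k_max + 1).fit(X).kneighbors(X)
    idx = idx[:, 1:]  # leave-one-out: exclude the point itself
    best_k = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        # Score each candidate k by how many of the k neighbors share y[i];
        # argmax keeps the smallest k on ties (k_candidates is ascending).
        scores = [np.mean(y[idx[i, :k]] == y[i]) for k in k_candidates]
        best_k[i] = k_candidates[int(np.argmax(scores))]
    # Decision tree that outputs a k value for any new sample.
    return DecisionTreeClassifier(max_depth=8).fit(X, best_k)

def ktree_predict(ktree, X_train, y_train, X_test, k_max=11):
    """Route each test sample through the tree to get its k, then vote."""
    _, idx = NearestNeighbors(n_neighbors=k_max).fit(X_train).kneighbors(X_test)
    ks = ktree.predict(X_test).astype(int)
    preds = []
    for row, k in zip(idx, ks):
        labels, counts = np.unique(y_train[row[:k]], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # plurality vote among k neighbors
    return np.array(preds)

A typical call would be preds = ktree_predict(train_ktree(X_tr, y_tr), X_tr, y_tr, X_te).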
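The k*Tree speed-up described above can be sketched on top of the previous snippet: each leaf of the fitted tree stores the training samples routed to it together with those samples' nearest neighbors, and the test-time neighbor search is restricted to that stored subset. Again this is only an illustration under the same assumptions (it reuses the ktree object returned by train_ktree above), and the helper names are hypothetical.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_kstar_index(ktree, X_train, k_max=11):
    """For each kTree leaf, store its samples plus their nearest neighbors."""
    _, idx = NearestNeighbors(n_neighbors=k_max + 1).fit(X_train).kneighbors(X_train)
    leaves = ktree.apply(X_train)  # leaf id each training sample lands in
    index = {}
    for leaf in np.unique(leaves):
        members = np.where(leaves == leaf)[0]
        # Candidate set: the leaf's own samples plus their nearest neighbors.
        index[leaf] = np.unique(np.concatenate([members, idx[members].ravel()]))
    return index

def kstar_predict(ktree, index, X_train, y_train, X_test):
    """kNN vote restricted to the candidate subset stored in each leaf."""
    ks = ktree.predict(X_test).astype(int)  # per-sample k from the tree
    leaves = ktree.apply(X_test)            # leaf each test sample lands in
    preds = []
    for x, k, leaf in zip(X_test, ks, leaves):
        cand = index[leaf]
        # For brevity this refits a neighbor index per query; a real
        # implementation would pre-fit one index per leaf.
        nn = NearestNeighbors(n_neighbors=min(k, len(cand))).fit(X_train[cand])
        _, j = nn.kneighbors(x.reshape(1, -1))
        labels, counts = np.unique(y_train[cand[j[0]]], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)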
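The random subspace side of RS-HBKNN can likewise be sketched with plain kNN standing in for the HBKNN base classifier (the local/global scoring of HBKNN is omitted here): each ensemble member is trained on a random subset of the attributes and the members vote, which dampens the influence of noisy features in high-dimensional data. The class name and hyperparameters below are illustrative, not from the paper.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class RandomSubspaceKNN:
    """Random subspace ensemble of kNN voters (illustrative sketch)."""

    def __init__(self, n_members=15, subspace_frac=0.5, k=5, seed=0):
        self.n_members = n_members
        self.subspace_frac = subspace_frac
        self.k = k
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        d = X.shape[1]
        m = max(1, int(self.subspace_frac * d))
        self.members_ = []
        for _ in range(self.n_members):
            # Each member sees its own random subset of the attributes.
            feats = self.rng.choice(d, size=m, replace=False)
            clf = KNeighborsClassifier(n_neighbors=self.k).fit(X[:, feats], y)
            self.members_.append((feats, clf))
        return self

    def predict(self, X):
        votes = np.stack([clf.predict(X[:, feats])
                          for feats, clf in self.members_])
        # Plurality vote across ensemble members, one column per test sample.
        out = []
        for col in votes.T:
            labels, counts = np.unique(col, return_counts=True)
            out.append(labels[np.argmax(counts)])
        return np.array(out)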