Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Our Price
₹4,500.00
Description
Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemset mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, the reducers perform combination operations by constructing small ultrametric trees, and these trees are then mined separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We also develop FiDoop-HD, an extension of FiDoop, to speed up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
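The description names three MapReduce jobs but not what each one consumes and emits. The following is a minimal in-memory sketch, assuming a simple split: job 1 counts single items, job 2 (map-only) prunes infrequent items from each transaction, and job 3 has mappers decompose each pruned transaction into subsets while reducers combine the counts. It deliberately replaces FiDoop's ultrametric trees with plain counting of enumerated subsets, so it illustrates only the decompose/combine data flow, not FiDoop's compressed storage or its actual implementation. The helper `map_reduce`, the function `mine`, the `max_len` cap, and the toy data are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy transaction database (each transaction is a set of items).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def map_reduce(records, mapper, reducer):
    """Hypothetical in-process stand-in for a MapReduce job: map, group by key, reduce."""
    groups = {}
    for record in records:
        for key, value in mapper(record):
            groups.setdefault(key, []).append(value)
    results = {}
    for key, values in groups.items():
        out = reducer(key, values)
        if out is not None:  # reducers drop keys by returning None
            results[key] = out
    return results


def count_if_frequent(min_support):
    """Build a reducer that sums counts and keeps only keys meeting min_support."""
    def reducer(key, ones):
        total = sum(ones)
        return total if total >= min_support else None
    return reducer


def mine(transactions, min_support, max_len=4):
    # Job 1: count single items and keep the frequent ones.
    freq_items = map_reduce(
        transactions,
        lambda t: ((item, 1) for item in t),
        count_if_frequent(min_support),
    )
    # Job 2 (map-only): drop infrequent items and sort the rest. This is safe
    # because any itemset containing an infrequent item is itself infrequent.
    pruned = [tuple(sorted(i for i in t if i in freq_items)) for t in transactions]

    # Job 3: mappers decompose each pruned transaction into its k-item subsets
    # (2 <= k <= max_len); reducers combine the counts per itemset.
    def decompose(t):
        for k in range(2, min(len(t), max_len) + 1):
            for subset in combinations(t, k):
                yield subset, 1

    freq_sets = map_reduce(pruned, decompose, count_if_frequent(min_support))
    result = {(item,): count for item, count in freq_items.items()}
    result.update(freq_sets)
    return result


print(mine(TRANSACTIONS, min_support=3))
```

With `min_support=3`, item `d` is pruned after job 1, and the three pairs `ab`, `ac`, `bc` survive job 3 while `abc` (support 2) does not. Enumerating every subset grows exponentially with transaction length, which is the kind of cost a compact tree structure is meant to avoid.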
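The description attributes FiDoop's sensitivity to data distribution to length-dependent decomposition costs, but it does not give the formula of its workload balance metric. As a hypothetical illustration only, the sketch below uses the number of subsets with at least 2 items, 2^k - k - 1, as a cost proxy for a k-itemset, and measures balance as maximum node load divided by mean node load (1.0 is perfectly balanced). The cost model, `imbalance`, and the longest-processing-time greedy assignment are assumptions, not FiDoop's metric or scheduler.

```python
import heapq


def decomposition_cost(k):
    """Hypothetical cost proxy: number of subsets with at least 2 items in a k-itemset (k >= 0)."""
    return 2 ** k - k - 1


def round_robin_loads(lengths, n_nodes):
    """Assign itemsets to nodes in arrival order, cycling through the nodes."""
    loads = [0] * n_nodes
    for i, k in enumerate(lengths):
        loads[i % n_nodes] += decomposition_cost(k)
    return loads


def greedy_loads(lengths, n_nodes):
    """Longest-processing-time greedy: give the costliest remaining itemset to the least-loaded node."""
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    for cost in sorted((decomposition_cost(k) for k in lengths), reverse=True):
        load, node = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, node))
    loads = [0] * n_nodes
    for load, node in heap:
        loads[node] = load
    return loads


def imbalance(loads):
    """Max load over mean load: 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workload: itemset lengths arriving at a 2-node cluster.
lengths = [8, 2, 2, 2, 8, 2, 2, 2]
rr = round_robin_loads(lengths, 2)
lpt = greedy_loads(lengths, 2)
print(rr, imbalance(rr))
print(lpt, imbalance(lpt))
```

Because the proxy cost grows exponentially with itemset length, the two 8-item itemsets dominate; round-robin happens to send both to the same node, while sorting by cost before assignment evens out the load in this toy case.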