Accelerating MapReduce on Commodity Clusters: An SSD-Empowered Approach
Our Price
₹4,500.00
Description
Large companies like Facebook, Google, and Microsoft, as well as many small and medium enterprises, process massive amounts of data every day in batch jobs and real-time applications. This generates heavy network traffic that is hard to support with traditional, oversubscribed network infrastructures. To address this issue, several novel network topologies have been proposed to increase the bandwidth available in enterprise clusters. We observe that in many commonly used workloads, data is aggregated during processing, so the output is only a fraction of the input size. Even so, we found that MapReduce on commodity clusters, which usually have limited memory, hard disk drives (HDDs), and multi-core or many-core processors, does not scale as expected as the number of processor cores increases. The key reason is that the underlying low-speed HDD storage cannot keep up with frequent I/O operations. To address this problem and make MapReduce more scalable on commodity clusters, we present a solution that uses solid-state drives (SSDs) to cache the input data and localized data of MapReduce tasks.
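The description does not say how the SSD cache is managed. As a minimal sketch, assuming a simple least-recently-used (LRU) read cache over fixed-size blocks, the idea of placing a fast SSD tier in front of slow HDD storage could look like the following. The class name `SSDBlockCache`, the `read_from_hdd` callback, and the LRU eviction policy are illustrative assumptions, not the book's design; in-memory dictionaries stand in for the two storage tiers.

```python
from collections import OrderedDict
from typing import Callable


class SSDBlockCache:
    """Hypothetical LRU block cache: a fast tier (standing in for the SSD)
    in front of a slow backing store (standing in for the HDD)."""

    def __init__(self, capacity_blocks: int, read_from_hdd: Callable[[str], bytes]):
        if capacity_blocks <= 0:
            raise ValueError("capacity_blocks must be positive")
        self.capacity = capacity_blocks
        self.read_from_hdd = read_from_hdd  # assumed callback that reads one block from the slow tier
        self._ssd = OrderedDict()  # block_id -> data, ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self._ssd:
            # Cache hit: serve from the fast tier and mark the block as most recently used.
            self._ssd.move_to_end(block_id)
            self.hits += 1
            return self._ssd[block_id]
        # Cache miss: fetch from the slow tier, then admit the block into the fast tier.
        self.misses += 1
        data = self.read_from_hdd(block_id)
        self._ssd[block_id] = data
        if len(self._ssd) > self.capacity:
            # Evict the least recently used block to respect the SSD capacity.
            self._ssd.popitem(last=False)
        return data


# Usage: a two-block "SSD" in front of a four-block "HDD".
hdd = {f"blk{i}": bytes([i]) * 4 for i in range(4)}
cache = SSDBlockCache(capacity_blocks=2, read_from_hdd=hdd.__getitem__)
for block in ["blk0", "blk1", "blk0", "blk2", "blk1"]:
    cache.read(block)
print(f"hits={cache.hits} misses={cache.misses}")
```

Only the repeated read of `blk0` is served from the fast tier here; every other access goes to the backing store. LRU is chosen only because it is the simplest policy to show; a real system would also have to decide how to admit and evict intermediate (localized) task data, which this sketch does not model.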