Conference paper, 2014

mpCache: Accelerating MapReduce with Hybrid Storage System on Many-Core Clusters

Abstract

As a widely used programming model and implementation for processing large data sets, MapReduce does not scale well on many-core clusters, which, unfortunately, are common in current data centers. To address this problem, this paper: 1) analyzes the causes of MapReduce's poor scalability on many-core clusters and identifies the key one, namely that the underlying low-speed storage (hard disk) cannot keep up with frequent I/O operations, and 2) proposes mpCache, an SSD-based hybrid storage system that caches both Input Data and Localized Data and dynamically tunes the cache space allocation between them to make full use of the space. mpCache has been incorporated into Hadoop and evaluated on a 7-node cluster with 13 benchmarks. The experimental results show that mpCache achieves an average speedup of 2.09 over the original Hadoop and an average speedup of 1.79 over PACMan, the latest in-memory optimization of MapReduce.
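The abstract does not say how mpCache decides the split between the two kinds of cached data. Purely as an illustration of the general idea of a shared cache budget that is re-tuned at run time, the Python sketch below keeps two LRU partitions and periodically moves a fixed amount of capacity toward the partition that missed more often. The class names, the miss-count heuristic, and the `step` parameter are assumptions made for this example; none of them is taken from the paper.

```python
from collections import OrderedDict


class LRUPartition:
    """One cache partition with a byte budget and LRU eviction (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.blocks = OrderedDict()  # block id -> size in bytes
        self.hits = 0
        self.misses = 0

    def access(self, block_id, size):
        """Return True on a hit; on a miss, admit the block and evict if needed."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size <= self.capacity:  # blocks larger than the budget are not cached
            self.blocks[block_id] = size
            self.used += size
            self.shrink_to(self.capacity)
        return False

    def shrink_to(self, capacity):
        """Set a new budget and evict least recently used blocks until it fits."""
        self.capacity = capacity
        while self.used > self.capacity:
            _, size = self.blocks.popitem(last=False)
            self.used -= size


class HybridCache:
    """Shared budget split between input data and localized data (assumed policy)."""

    def __init__(self, total, step):
        self.step = step
        self.input = LRUPartition(total // 2)
        self.localized = LRUPartition(total - total // 2)

    def rebalance(self):
        """Move `step` bytes of budget toward the partition that missed more."""
        a, b = self.input, self.localized
        if a.misses != b.misses:
            winner, loser = (a, b) if a.misses > b.misses else (b, a)
            moved = min(self.step, loser.capacity)
            loser.shrink_to(loser.capacity - moved)
            winner.capacity += moved
        for part in (a, b):  # start a fresh observation window
            part.hits = part.misses = 0
```

A caller would route reads of job input to `cache.input.access(...)`, reads of intermediate files to `cache.localized.access(...)`, and call `cache.rebalance()` at some interval. A real system would likely weigh more than raw miss counts, for example block sizes or the cost of a miss on each data type.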
Origin : Files produced by the author(s)

Dates and versions

hal-01403087, version 1 (25-11-2016)

Licence

Attribution

Cite

Bo Wang, Jinlei Jiang, Guangwen Yang. mpCache: Accelerating MapReduce with Hybrid Storage System on Many-Core Clusters. 11th IFIP International Conference on Network and Parallel Computing (NPC), Sep 2014, Ilan, Taiwan. pp.220-233, ⟨10.1007/978-3-662-44917-2_19⟩. ⟨hal-01403087⟩