SDAC: Porting Scientific Data to Spark RDDs

Tian Yang; Kenjiro Taura; Liu Chao

doi:10.1007/978-3-319-68210-5_13

Conference paper, Year: 2017

Tian Yang (1, 2), Kenjiro Taura (2), Liu Chao (1)

1 Beihang University
2 The University of Tokyo

Abstract

Scientific data processing has exposed a range of technical problems in industrial exploration and domain-specific applications because of its huge input volume and the diversity of its data formats. Big Data analytic frameworks such as Hadoop and Spark, however, lack native support for efficiently processing increasingly heterogeneous scientific data. In this paper, we introduce SDAC (Scientific Data Auto Chunk), which ports various scientific data formats to RDDs to support parallel processing and analytics in the Apache Spark framework. By integrating an auto-chunk method for specifying task granularity, a better-planned theoretical pipeline can be derived to guide data partitioning and parallel I/O. We compare performance with H5Spark on 6 benchmarks in both standalone and distributed mode. Experimental results show that SDAC achieves an overall improvement of 2.1 times over H5Spark in standalone mode and 1.34 times in distributed mode.

Keywords

Scientific data, Spark, RDDs, HDF5

Domains

Computer Science [cs]

Main file

457609_1_En_13_Chapter.pdf (436.43 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01705439

Submitted on: Friday, February 9, 2018, 2:26:03 PM

Last modification on: Wednesday, November 3, 2021, 4:18:55 AM

Long-term archiving on: Thursday, May 10, 2018, 12:41:00 PM

Dates and versions

hal-01705439 , version 1 (09-02-2018)

Licence

Attribution

Identifiers

HAL Id : hal-01705439 , version 1
DOI : 10.1007/978-3-319-68210-5_13

Cite

Tian Yang, Kenjiro Taura, Liu Chao. SDAC: Porting Scientific Data to Spark RDDs. 14th IFIP International Conference on Network and Parallel Computing (NPC), Oct 2017, Hefei, China. pp.127-130, ⟨10.1007/978-3-319-68210-5_13⟩. ⟨hal-01705439⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC10 IFIP-NPC IFIP-WG10-3 IFIP-LNCS-10578
