Massive Data Load on Distributed Database Systems over HBase
Abstract
Big Data has become a pervasive technology for managing ever-increasing volumes of data. Among Big Data solutions, scalable data stores play an important role. Key-value data stores are especially important because they scale to thousands of nodes. The typical workflow of a Big Data application has two phases. The first loads the data into the data store, typically as part of an ETL (Extract-Transform-Load) process. The second processes the data itself. Bigtable and HBase are the preferred key-value solutions based on range-partitioned data stores. However, their loading phase is inefficient and creates a single-node bottleneck. In this paper, we identify and quantify this bottleneck. We also propose a tool for parallel massive data loading that satisfactorily resolves it, so the loading phase can also use the full parallelism and throughput of the underlying key-value data store. The proposed solution has been implemented as a tool for parallel massive data loading over HBase, the key-value data store of the Hadoop ecosystem.
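A simplified model shows why loading into a range-partitioned store can end up on one node. It is an illustration under stated assumptions, not the authors' tool. In the model, each region owns a contiguous key range delimited by sorted split points, and a row goes to the region whose range contains its key. An HBase table that was created without pre-splitting starts as a single region, so at first every row lands on one server. The sketch counts rows per region with no split points and then with split points taken from key quantiles. The helpers `region_for`, `rows_per_region`, and `quantile_split_points` and the `row%08d` key format are hypothetical.

```python
import bisect
from collections import Counter


def region_for(key: str, split_points: list[str]) -> int:
    # Simplified model (assumption): region i owns keys in
    # [split_points[i-1], split_points[i]). bisect_right sends a key equal
    # to a split point to the region that starts at that split point.
    return bisect.bisect_right(split_points, key)


def rows_per_region(keys, split_points: list[str]) -> list[int]:
    # Count how many rows each region would receive during the load.
    counts = Counter(region_for(k, split_points) for k in keys)
    return [counts.get(i, 0) for i in range(len(split_points) + 1)]


def quantile_split_points(sample_keys, num_regions: int) -> list[str]:
    # Hypothetical helper: choose num_regions - 1 split points at evenly
    # spaced positions (quantiles) of a sorted key sample.
    s = sorted(sample_keys)
    return [s[(i * len(s)) // num_regions] for i in range(1, num_regions)]


if __name__ == "__main__":
    # Zero-padded keys, so lexicographic order matches numeric order.
    keys = [f"row{i:08d}" for i in range(10_000)]

    # No split points: the table is one region, and one server takes every row.
    print(rows_per_region(keys, []))  # [10000]

    splits = quantile_split_points(keys, 4)
    print(splits)  # ['row00002500', 'row00005000', 'row00007500']
    print(rows_per_region(keys, splits))  # [2500, 2500, 2500, 2500]
```

Split points only spread the data across regions. A single writer that feeds keys in sorted order still touches one region at a time. The load becomes parallel only when several writers work on different key ranges concurrently.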
Keywords: HBase, MapReduce, HDFS