Download: Big Data Normalization for Massively Parallel Processing Databases

Translated article: Big Data Normalization for Massively Parallel Processing Databases
Price: 1,150,000 rials
Product ID: 2008212
Author/Publisher/Journal: Computer Standards & Interfaces
Year of publication: 2017
Number of English pages: 13
Attached file types: PDF + Word
File size: 727 KB
Password for all files: www.daneshgahi.com
English title: Big Data Normalization for Massively Parallel Processing Databases

Abstract

High-performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing (MPP) databases. There is also a tension between ease of extending the data model and ease of analysis. A modern approach, the Data Lake, promises extreme ease of adding new data to a data model, but it is prone to eventually turning into a Data Swamp: an unstructured, ungoverned, out-of-control Data Lake in which, owing to a lack of process, standards, and governance, data is hard to find, hard to use, and consumed out of context. This paper introduces a novel technique, highly normalized Big Data using Anchor modeling, that provides a very efficient way to store information and utilize resources, thereby providing ad-hoc querying with high performance for the first time in massively parallel processing databases. This technique is almost as convenient for expanding the data model as a Data Lake, while being internally protected from turning into a Data Swamp. A case study of how this approach has been used for a Data Warehouse at Avito over three years is presented, together with estimates for, and results of, real data experiments carried out in HP Vertica, an MPP RDBMS. This paper extends the theses presented at the 34th International Conference on Conceptual Modeling (ER 2015) [1]; it is complemented with numerical results on key operating areas of a highly normalized big data warehouse, collected over several (1-3) years of commercial operation. The limitations imposed by using a single MPP database cluster are also described, and a cluster fragmentation approach is proposed.

Keywords: Big Data, MPP, database, normalization, analytics, ad-hoc querying
