Big Data Processing: Meeting Data Challenges with Hadoop, 1st Edition

Updated 2024-07-17 · PDF, 7.29 MB

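Big Data Processing with Hadoop (1st edition) is a monograph by Revathi T., Muneeswaran K., and Blessa Binolin Pepsi that addresses the challenges and demands of the big data era. Mobile devices, social media, geographic information systems (GIS), medical diagnostic imaging, and similar sources generate massive volumes of data, and storing, managing, and processing that data in real time has become a key problem. Data sources are projected to grow 50-fold over the next decade. IDC forecasts that the market for big data technology and services will grow at a compound annual growth rate (CAGR) of 23.1% from 2014 to 2019, with annual spending possibly reaching $48.6 billion in 2019. The digital universe is expected to double in size every two years, reaching 44 zettabytes (44 × 10^21 bytes, or 44 trillion GB) by 2020.

The book focuses on YARN (Yet Another Resource Negotiator), the component of the Hadoop framework responsible for resource management and job scheduling and a core part of Hadoop's distributed computing. YARN lets users build and run large distributed applications; breaking a computation into small, manageable pieces is what makes processing large-scale data feasible. Within the Hadoop ecosystem, HDFS (Hadoop Distributed File System) stores the data, the MapReduce model executes parallel processing tasks, and YARN provides a flexible platform that supports both.

The book discusses in detail how to design new architectures suited to big data analysis, introduces dedicated analytic sandboxes (environments in which data scientists can experiment and explore), and brings together skills such as data cleaning, preprocessing, machine learning, and data mining. The authors may also cover data security, performance tuning, and fault recovery on Hadoop.

Overall, the book offers practical tools and techniques for handling modern big data challenges and helps readers understand how to apply Hadoop to extract value from massive data sets. It is a valuable reference for IT professionals, particularly data scientists, data engineers, and business decision-makers.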
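The summary above describes MapReduce as breaking a computation into small pieces that run in parallel. The following is a minimal single-process sketch of that idea in plain Python, counting words across input chunks. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input, already split into chunks (loosely analogous to
# the way a large file is divided into blocks before processing).
chunks = ["big data needs big storage", "hadoop stores big data"]
counts = word_count(chunks)
print(counts["big"], counts["data"], counts["hadoop"])  # prints: 3 2 1
```

In a real cluster each call to the map step would run on a different node against a different block, which is why the map function here only ever sees one chunk at a time.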

Big Data Overview

the locations of friends and to receive offers from nearby stores and restaurants.

• Image and audio data can be analyzed for applications such as facial recognition in security systems.

• Microsoft Azure Marketplace, the World Bank, Wikipedia, etc. provide data that is publicly available on the web. This data can be used for any analysis.

BIG DATA ANALYTICS

Stored data does not by itself generate any business value. This is true of traditional databases, data warehouses, and the new technologies for storing big data. So, once the data is available, it must be processed further using data analytics technologies.

Data analysis is the process of extracting useful information from available data and drawing conclusions from it. It uses statistical methods, questioning, selecting or discarding subsets, examining, comparing, confirming, and so on.
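As a rough illustration of the steps named above (statistical methods, discarding a subset, comparing), here is a small Python sketch using only the standard `statistics` module. The sales figures, the store names, and the outlier rule (drop anything above twice the median) are assumptions invented for this example, not taken from the book.

```python
import statistics

# Hypothetical daily sales for two stores (made-up numbers).
sales = {
    "store_a": [120, 135, 128, 140, 132],
    "store_b": [90, 300, 95, 88, 92],
}


def summarize(values):
    """Statistical step: mean, median, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def discard_outliers(values, factor=2.0):
    """Selecting/discarding step: drop values above factor * median."""
    limit = factor * statistics.median(values)
    return [v for v in values if v <= limit]


# Examine each store, then compare the cleaned means.
cleaned = {name: discard_outliers(vals) for name, vals in sales.items()}
summary = {name: summarize(vals) for name, vals in cleaned.items()}
best_store = max(summary, key=lambda name: summary[name]["mean"])
print(best_store, summary[best_store]["mean"])
```

Without the discarding step, the single 300 reading would pull the mean of `store_b` above that of `store_a`, which shows why subset selection is listed as part of analysis.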

Data analytics goes one step beyond analysis. Data analytics is the process of building predictive models and discovering patterns in data.
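To make "building a predictive model" concrete, the sketch below fits a straight line to a handful of points by ordinary least squares and uses it to predict an unseen value. This is only one very simple kind of predictive model, picked for brevity; the data (advertising spend versus sales, constructed to lie exactly on y = 2x + 1) and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: advertising spend (x) and sales (y).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)

# Use the fitted model to predict sales for an unseen spend of 6.
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

The difference from plain analysis is the last step: the fitted parameters are used to estimate a value that is not in the data.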

Data analytics evolved from decision support systems (DSS) to Business Intelligence (BI) and then to data analytics. The term DSS described both a class of applications and an academic discipline. Over time, decision support applications came to include online analytical processing (OLAP) and dashboards, which became popular. Then came Business Intelligence, a broad category for analyzing and processing gathered data to help business users make better decisions. Data analytics combines BI and DSS along

Figure 2. Sources of Data Deluge

