Intel分布式Hadoop软件参考架构

需积分: 1 195 浏览量更新于2024-09-09 收藏 967KB PDF 举报

"Intel分布式的Apache Hadoop参考架构文档提供了在使用基于Intel Xeon CPU的商用集群硬件上实施Intel Hadoop软件的详细架构。该文档面向寻求在IT环境中构建Hadoop集群以进行大数据分析的客户和系统架构师。文档涵盖了Intel Hadoop软件栈的高层组件，常见的Hadoop用例，Intel的Hadoop分类，以及Intel发行版中独有的Hadoop扩展和增强功能。此外，还详细描述了针对特定Hadoop解决方案的理想部署配置，包括基准测试和性能调优的建议。" Intel分布式的Apache Hadoop软件是大数据处理领域的一个关键组件，它建立在Intel的高性能Xeon处理器之上，为大规模数据处理和分析提供强大支持。这个参考架构文档首先概述了Intel Hadoop软件栈的主要组成部分，这些组件可能包括Hadoop Distributed File System (HDFS)，MapReduce计算框架，YARN资源管理器，以及其他的Hadoop生态系统工具。典型的Apache Hadoop用例广泛，包括数据存储、数据挖掘、机器学习、实时分析等。Intel的Hadoop分类帮助用户更好地理解不同类型的Hadoop应用，以便选择最适合的解决方案。文档中详细介绍了Intel对标准Apache Hadoop发行版的扩展和增强，这可能涉及到硬件优化、性能提升、安全性改进，以及与Intel其他技术（如固态硬盘、网络技术）的集成。在参考架构部分，文档会讨论如何根据预期的工作负载和性能需求来优化配置Intel Xeon处理器驱动的Hadoop集群。这通常涉及节点数量、硬件规格的选择，以及内存和磁盘配置。同时，为了确保最佳性能，文档还会提供基准测试的方法，指导用户如何衡量集群的性能，并提供性能调优策略，包括数据局部性优化、作业调度策略调整等。 Intel Distribution for Hadoop的参考架构为在Intel硬件上构建高效、可扩展的Hadoop环境提供了宝贵的指南，帮助企业和组织充分利用其大数据资产，实现更高效的数据分析和业务洞察。对于想要在现有IT环境中实施Hadoop解决方案的系统架构师来说，这是一个不可或缺的资源。

3.0 Intel Hadoop Use Case Summary

The table below outlines important use cases that a typical Intel Hadoop cluster can be

used for.

• The Data Storage Framework (HDFS)

is the file system that Apache Hadoop

uses to store data on the cluster

nodes. HDFS is a distributed, scalable,

and portable file system. Intel Hadoop

includes compression and encryption for

enhanced security and performance.

• The Data Processing Framework

(MapReduce) is a massively-parallel

compute framework inspired by Google’s

MapReduce papers. Intel Hadoop includes

dynamic replication capabilities that

wax and wane the number of replicas

depending on workload characteristics.

• The Real Time Query Processing

Framework, which includes HBase,

a scalable distributed columnar data

storage system for large tables, and

Hive data warehouse infrastructure for

ad-hoc query processing. Intel Hadoop

includes extensions to support big tables

across geographically distributed data

centers, as well as feature additions to

improve Hbase and Hive performance.

The components that constitute the

Intel Hadoop solution taxonomy are

described below:

• HBase is a columnar database

management framework that uses the

underlying HDFS framework to provide

random and real time update to data.

It has been designed and developed to

provide the capability to host very large

tables that can support billions of rows

with millions of columns.

• Hive is the query engine framework

for Hadoop that facilitates easy data

summarization, ad-hoc queries, and the

analysis of large datasets stored in HDFS

and HBase.

• Apache ZooKeeper* is a high-

performance coordination service for

distributed applications. It is used as

a centralized service for maintaining

configuration information, naming,

providing distributed synchronization,

and providing group services.

• Apache Sqoop* is a tool designed to

efficiently transfer bulk data between

Apache Hadoop and structured data

stores such as relational databases. It

can be used to import data from external

data stores into Hadoop distributed files

system or related systems like Hive and

HBase. Conversely, Sqoop can be used

to run map/reduce jobs that extract

data from Apache Hadoop and export

to external structured data stores and

enterprise data warehouses.

Table 1. Intel Hadoop Solution Use Cases

Use case Description

Big data analytics

Ability to query in real time at the speed of thought

on petabyte scale unstructured and semi structured

data using HBase and Hive.

Data storage

Collect and store unstructured and semi-structured

data in a secure, fault-resilient scalable data store that

can be organized and sorted for indexing and analysis.

Batch processing of unstructured data

Ability to batch-process (index, analyze, etc.) tens to

hundreds of petabytes of unstructured and semi-

structured data.

Data archive

Medium-term (12–36 months) archival of data

from EDW/DBMS to increase the length that data

is retained or to meet data retention policies/

compliance.

Integration with data warehouse

Extract, transfer and load data in and out of Hadoop

into separate DBMS for advanced analytics.

Big data visualization

Capture, index and visualize unstructured and semi

structured big data in real time

Search and predictive analytics

Crawl, extract, index and transform semi

structured and unstructured data for search

and predictive analytics

4.0 Intel Hadoop Solution Taxonomy

In Figure 1, the dark blue layer in the Intel

Hadoop taxonomy is comprised of:

• The Intel® Manager for Apache

Hadoop* software, which is a web-

based management console designed

to install, configure, manage, monitor

and administer the Intel Hadoop

cluster. It uses Nagios and Ganglia to

monitor resources and configure alerts

in the cluster.

Intel® Manager for Apache Hadoop* Software

Deployment, Conﬁguration, Monitoring, Altering, and Security

Sqoop*

Data Exchange

Flume*

Log Collector

ZooKeeper*

Coordination

HBase*

Columnar Storage

Pig*

Scripting

Hive*

SQL-Like Query

Oozie*

Workﬂow

MapReduce*

Distributed Processing Framework

HDFS*

Hadoop Distributed File System

Figure 1. Intel Hadoop solution taxonomy

剩余13页未读，继续阅读

ldqhyfzpz

粉丝: 0
资源: 2

Intel分布式Hadoop软件参考架构

macOS_Sequoia_15.1.password(imacos.top).rdr.split.016

【java毕业设计】小区物业管理系统（springboot+vue+mysql+说明文档）.zip

里面全部都是浪漫的爱心特效，有html和python编写的，大概几十种，欢迎下载，收藏

Delphi 12 控件之FUPX-32bit-PORTABLE.zip

HandyControl

macOS_Sequoia_15.1.password(imacos.top).rdr.split.049

“雅乐”私人牙科诊所管理系统的设计与实现ssm.zip

ISO 8690 2024.pdf

城市公交&java&基于SpringBoot的城市公交管理系统设计与实现

个性化推荐影院&java&基于springboot个性化推荐影院设计与实现

最新资源