3.0 Intel Hadoop Use Case Summary
Table 1 outlines important use cases that a typical Intel Hadoop cluster
supports.
• The Data Storage Framework (HDFS)
is the file system that Apache Hadoop
uses to store data on the cluster
nodes. HDFS is a distributed, scalable,
and portable file system. Intel Hadoop
includes compression and encryption for
enhanced security and performance.
• The Data Processing Framework
(MapReduce) is a massively parallel
compute framework inspired by Google’s
MapReduce papers. Intel Hadoop includes
dynamic replication capabilities that
increase or decrease the number of replicas
depending on workload characteristics.
• The Real Time Query Processing
Framework includes HBase,
a scalable, distributed columnar data
storage system for large tables, and the
Hive data warehouse infrastructure for
ad-hoc query processing. Intel Hadoop
includes extensions to support big tables
across geographically distributed data
centers, as well as feature additions to
improve HBase and Hive performance.
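To make the MapReduce model described above more concrete, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is an illustration only and does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into one result."""
    return key, sum(values)


def run_job(records):
    # Assumption: on a real cluster each record (input split) would be
    # mapped on a different node in parallel; here it runs sequentially.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["big data on Hadoop", "Hadoop stores big tables"]
    print(run_job(lines))
```

Because each call to `map_phase` depends only on its own record and each call to `reduce_phase` depends only on its own key, both phases can be distributed across nodes, which is the source of the parallelism the framework provides.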
The components that constitute the
Intel Hadoop solution taxonomy are
described below:
• HBase is a columnar database
management framework that uses the
underlying HDFS framework to provide
random, real-time updates to data.
It has been designed and developed to
provide the capability to host very large
tables that can support billions of rows
with millions of columns.
• Hive is the query engine framework
for Hadoop that facilitates easy data
summarization, ad-hoc queries, and the
analysis of large datasets stored in HDFS
and HBase.
• Apache ZooKeeper* is a high-
performance coordination service for
distributed applications. It is used as
a centralized service for maintaining
configuration information, naming,
providing distributed synchronization,
and providing group services.
• Apache Sqoop* is a tool designed to
efficiently transfer bulk data between
Apache Hadoop and structured data
stores such as relational databases. It
can be used to import data from external
data stores into the Hadoop Distributed File
System (HDFS) or related systems like Hive and
HBase. Conversely, Sqoop can be used
to run MapReduce jobs that extract
data from Apache Hadoop and export
it to external structured data stores and
enterprise data warehouses.
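The HBase description above (very large, sparse tables with random, real-time updates) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the HBase client API: the class `SparseTable` and its methods are hypothetical, and the model assumes that each cell is addressed by a row key and a `family:qualifier` column name and keeps timestamped versions.

```python
import time
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> "family:qualifier" -> list of (timestamp, value)."""

    def __init__(self):
        # Only cells that were actually written consume space, so a row
        # may use a handful of columns out of millions of possible ones.
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp=None):
        """Random update: write one cell without touching the rest of the row."""
        ts = time.time_ns() if timestamp is None else timestamp
        self._rows[row_key][column].append((ts, value))

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]

    def scan(self, start_row, stop_row):
        """Yield rows in sorted key order for start_row <= key < stop_row."""
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key, {c: self.get(key, c) for c in self._rows[key]}


if __name__ == "__main__":
    table = SparseTable()
    table.put("user#001", "info:name", "Ada", timestamp=1)
    table.put("user#001", "info:name", "Ada L.", timestamp=2)
    table.put("user#002", "stats:logins", 7, timestamp=1)
    print(table.get("user#001", "info:name"))
    print(list(table.scan("user#001", "user#002")))
```

Keeping rows sorted by key is what makes range scans cheap in this model, while the nested mapping lets individual cells be updated independently.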
Table 1. Intel Hadoop Solution Use Cases

| Use case | Description |
|---|---|
| Big data analytics | Ability to query petabyte-scale unstructured and semi-structured data in real time using HBase and Hive. |
| Data storage | Collect and store unstructured and semi-structured data in a secure, fault-resilient, scalable data store that can be organized and sorted for indexing and analysis. |
| Batch processing of unstructured data | Ability to batch-process (index, analyze, etc.) tens to hundreds of petabytes of unstructured and semi-structured data. |
| Data archive | Medium-term (12–36 months) archival of data from EDW/DBMS to extend how long data is retained or to meet data retention policies and compliance requirements. |
| Integration with data warehouse | Extract, transfer, and load data between Hadoop and a separate DBMS for advanced analytics. |
| Big data visualization | Capture, index, and visualize unstructured and semi-structured big data in real time. |
| Search and predictive analytics | Crawl, extract, index, and transform semi-structured and unstructured data for search and predictive analytics. |
4.0 Intel Hadoop Solution Taxonomy
In Figure 1, the dark blue layer in the Intel
Hadoop taxonomy comprises:
• The Intel® Manager for Apache
Hadoop* software, which is a web-
based management console designed
to install, configure, manage, monitor
and administer the Intel Hadoop
cluster. It uses Nagios and Ganglia to
monitor resources and configure alerts
in the cluster.
Figure 1. Intel Hadoop solution taxonomy. Components shown: Intel® Manager for Apache Hadoop* software (deployment, configuration, monitoring, alerting, and security); Sqoop* (data exchange); Flume* (log collector); ZooKeeper* (coordination); HBase* (columnar storage); Pig* (scripting); Hive* (SQL-like query); Oozie* (workflow); MapReduce* (distributed processing framework); HDFS* (Hadoop Distributed File System).