The Netezza Data Appliance Architecture: A Platform for High Performance Data Warehousing and Analytics
organizations to innovate rapidly and bring high performance analytics to the widest range of
users and processes.
This IBM Redguide™ publication introduces the Netezza Asymmetric Massively Parallel
Processing (AMPP) architecture and describes how the system orchestrates queries and
analytics to achieve its speed. You will see how Netezza software and
hardware work together to get maximum utilization from every critical component,
and how a system optimized for tens of thousands of users querying huge data volumes
really works. The result is a data warehouse and analytics platform with strong
price-performance, ready for today's needs and tomorrow's challenges.
Architectural principles
The Netezza appliances integrate database, processing, and storage in a compact system
optimized for analytical processing and designed for flexible growth. The system architecture
is based on the following core tenets that have been a hallmark of Netezza leadership in the
industry:
Processing close to the data source
Balanced, massively parallel architecture
Platform for advanced analytics
Appliance simplicity
Accelerated innovation and performance improvements
Flexible configurations and extreme scalability
Processing close to the data source
The Netezza architecture is based on a fundamental computer science principle: when
operating on large data sets, do not move data unless absolutely necessary. The Netezza
architecture fully exploits this principle by using commodity components called Field
Programmable Gate Arrays (FPGAs) to filter out extraneous data as early in the data stream
as possible and as fast as data streams off the disk. Eliminating data close to its source
removes I/O bottlenecks and frees downstream components such as the CPU, memory,
and network from processing superfluous data, which has a significant multiplier effect on
system performance.
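The effect of eliminating data close to its source can be illustrated with a minimal
Python sketch. It is not Netezza code and does not model an FPGA. The function
filter_at_source, the SALES rows, and the column names are hypothetical, chosen only to
show restriction (dropping rows) and projection (dropping columns) applied while rows
stream by, so that the downstream step receives less data.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Dict[str, Any]


def filter_at_source(
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    columns: Sequence[str],
) -> Iterator[Row]:
    """Yield only the requested columns of rows that satisfy the predicate.

    Hypothetical stand-in for filtering done next to the disk: rows are
    examined one at a time as they stream past, and anything the query
    does not need is dropped before it is handed downstream.
    """
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


# Illustrative data only (an assumption, not taken from the source document).
SALES: List[Row] = [
    {"region": "east", "year": 2009, "amount": 120, "comment": "x"},
    {"region": "west", "year": 2010, "amount": 75, "comment": "y"},
    {"region": "east", "year": 2010, "amount": 200, "comment": "z"},
    {"region": "east", "year": 2010, "amount": 50, "comment": "w"},
]


def total_east_2010(rows: Iterable[Row]) -> Tuple[int, int]:
    """Return (sum of amounts, number of rows forwarded downstream)."""
    forwarded = list(
        filter_at_source(
            rows,
            lambda r: r["region"] == "east" and r["year"] == 2010,
            ["amount"],
        )
    )
    return sum(r["amount"] for r in forwarded), len(forwarded)


if __name__ == "__main__":
    print(total_east_2010(SALES))
```

In this sketch only two of the four rows, each reduced to a single column, reach the
aggregation step. The rest of the data never leaves the scan.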
Balanced, massively parallel architecture
The Netezza architecture combines the best elements of Symmetric Multiprocessing (SMP)
and Massively Parallel Processing (MPP) to create an appliance purpose-built for analyzing
petabytes of data quickly. Every component of the architecture, including the processor,
FPGA, memory, and network, is carefully selected and optimized to service data as fast as
the physics of the disk allows, while minimizing cost and power consumption. The Netezza
software orchestrates these components to operate concurrently on the data stream in
pipeline fashion, maximizing utilization and extracting the utmost throughput from each
MPP node. In addition to raw performance, this balanced architecture delivers linear
scalability to more than a thousand processing streams executing in parallel, while offering
an economical total cost of ownership.
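The idea of many processing streams, each running the same pipeline over its own slice
of the data, can be sketched in a few lines of Python. This is an illustration under
stated assumptions, not the Netezza implementation: the names partition, process_slice,
and parallel_total are hypothetical, and a thread pool stands in for independent MPP
nodes (in CPython, threads do not give true CPU parallelism, so the sketch shows the
structure of the computation rather than its speed).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(rows: Sequence[int], n_streams: int) -> List[Sequence[int]]:
    """Distribute rows round-robin so each stream gets a similar share."""
    return [rows[i::n_streams] for i in range(n_streams)]


def process_slice(slice_rows: Sequence[int], threshold: int) -> int:
    """Run a scan -> filter -> aggregate pipeline over one slice.

    The generator stages are chained, so each value flows through every
    stage before the next value is read, much like a hardware pipeline.
    """
    scanned = (value for value in slice_rows)          # stage 1: scan
    kept = (v for v in scanned if v >= threshold)      # stage 2: filter
    return sum(kept)                                   # stage 3: partial sum


def parallel_total(rows: Sequence[int], n_streams: int, threshold: int) -> int:
    """Process every slice concurrently, then merge the partial results."""
    slices = partition(rows, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        partials = list(pool.map(lambda s: process_slice(s, threshold), slices))
    # Hypothetical "host" step: combine the per-stream partial aggregates.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_total(list(range(100)), n_streams=4, threshold=50))
```

Because each stream works only on its own slice and returns a small partial result,
adding streams divides the scan work without increasing the amount of data that has
to be merged centrally, which is the property that linear scalability depends on.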