Benefits of I/O Acceleration Technology (I/OAT) in Clusters∗
Karthikeyan Vaidyanathan Dhabaleswar K. Panda
Computer Science and Engineering,
The Ohio State University
{vaidyana, panda}@cse.ohio-state.edu

∗This research is supported in part by NSF grants #CNS-0403342 and #CNS-0509452; DOE grants #DE-FC02-06ER25749 and #DE-FC02-06ER25755; grants from Intel, Mellanox, Cisco Systems, Linux Networx and Sun Microsystems; and equipment donations from Intel, Mellanox, AMD, Apple, Appro, Dell, Microway, PathScale, IBM, SilverStorm and Sun Microsystems.
Abstract
Packet processing in the TCP/IP stack accounts for a significant portion of the system overhead at multi-Gigabit data rates. Although several techniques reduce the packet-processing overhead on the sender side, the receiver side remains a bottleneck. I/O Acceleration Technology (I/OAT), developed by Intel, is a set of features designed specifically to reduce the receiver-side packet-processing overhead. This paper studies the benefits of I/OAT through an extensive suite of micro-benchmarks as well as evaluations in two application domains: (1) a multi-tier data-center environment and (2) the Parallel Virtual File System (PVFS). Our micro-benchmark evaluations show that I/OAT results
in 38% lower overall CPU utilization in comparison with traditional
communication. Due to this reduced CPU utilization, I/OAT delivers
better performance and increased network bandwidth. Our experi-
mental results with data-centers and file systems reveal that I/OAT
can improve the total number of transactions processed by 14% and
throughput by 12%, respectively. In addition, I/OAT can sustain a
large number of concurrent threads (up to a factor of four as com-
pared to non-I/OAT) in data-center environments, thus increasing the
scalability of the servers.
1 Introduction
Over the past few years, there has been an incredible growth
of highly data-intensive applications in various fields such as
medical informatics, genomics, e-commerce, data mining and
satellite weather image analysis. As technology advances, the ability to store and share the datasets generated by these applications is also increasing, allowing scientists and institutions to create large dataset repositories and make them available for use by others. At the same time, clusters consisting of commodity off-the-shelf hardware components have become increasingly attractive as platforms for high-performance computation and scalable servers. Based on these two trends, researchers have proposed cluster-based servers and demonstrated their feasibility and potential [14, 10, 18, 19].
Several clients simultaneously request either raw or processed data from these servers. However, existing servers are becoming increasingly incapable of meeting such sky-rocketing processing demands with high performance and scalability. These servers rely on TCP/IP
for data communication and typically use Gigabit Ethernet
networks for cost-effective designs. Host-based TCP/IP protocols on such networks incur high CPU utilization and deliver low bandwidth, thereby limiting the maximum server capacity (in terms of requests handled per unit time). Alternatively,
many servers use multiple Gigabit Ethernet networks to cope
with the network traffic. However, at multi-Gigabit data rates,
packet processing in the TCP/IP stack occupies a significant
portion of the system overhead.
Packet processing [12, 13] usually involves manipulating
the headers and moving the data through the TCP/IP stack.
Though this processing does not require significant computation, processor time is wasted on stalls caused by memory-access latency and data-movement operations. To over-
come these overheads, researchers have proposed several tech-
niques [9] such as transport segmentation offload (TSO),
jumbo frames, zero-copy data transfer (sendfile()), interrupt
coalescing, etc. Unfortunately, many of these techniques are
applicable only on the sender side, while the receiver side remains a bottleneck in several cases, resulting in a large performance gap between the CPU overheads of sending and receiving packets.
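As a concrete illustration of one of these sender-side techniques, the sketch below is a hypothetical example (not code from this paper) of zero-copy transmission using Linux's sendfile(): the file's contents are handed to an already-connected TCP socket inside the kernel, avoiding the extra copy and system calls that a read()/write() loop would incur. The function name and error handling are our own.

/*
 * Hypothetical sketch: zero-copy send of a whole file over a connected
 * TCP socket using sendfile(). With read()/write(), each block would be
 * copied into a user buffer and back into the kernel; sendfile() moves
 * the data from the page cache to the socket without entering user space.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sendfile.h>

static int send_file_zero_copy(int sock_fd, const char *path)
{
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(file_fd, &st) < 0) {
        perror("fstat");
        close(file_fd);
        return -1;
    }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* The kernel advances 'offset' by the number of bytes sent. */
        ssize_t sent = sendfile(sock_fd, file_fd, &offset,
                                st.st_size - offset);
        if (sent <= 0) {
            perror("sendfile");
            close(file_fd);
            return -1;
        }
    }

    close(file_fd);
    return 0;
}

A server would call send_file_zero_copy() once per accepted connection; the benefit is entirely on the sender side, which is precisely why the receiver remains the bottleneck.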
Intel’s I/O Acceleration Technology (I/OAT) [1, 3, 2, 15] is a set of features that attempts to alleviate receiver-side packet-processing overheads. It provides three features: (i) split headers, (ii) a DMA copy-offload engine and (iii) multiple receive queues.
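To make the cost targeted by the DMA copy-offload engine concrete, the following minimal sketch (our own illustration under simplifying assumptions, not part of I/OAT or of this paper's benchmark suite) mimics the receiver-side copy of incoming data from kernel socket buffers into an application buffer with memcpy() and uses getrusage() to report the CPU time the copying alone consumes. With I/OAT, such copies can be delegated to the on-chip DMA engine, freeing the host CPU for application work.

/*
 * Hypothetical sketch: estimate the CPU time spent purely on the
 * receive-side data copy. Real receive buffers are usually cold in the
 * cache, so this measurement is optimistic; the true per-byte cost on
 * the receiver is higher.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

#define CHUNK (64 * 1024)       /* a socket-buffer-sized chunk */
#define TOTAL (1UL << 30)       /* copy 1 GiB in total */

static double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_stime.tv_sec +
           (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

int main(void)
{
    char *src = malloc(CHUNK), *dst = malloc(CHUNK);
    if (!src || !dst)
        return 1;
    memset(src, 0xab, CHUNK);

    double start = cpu_seconds();
    for (unsigned long done = 0; done < TOTAL; done += CHUNK)
        memcpy(dst, src, CHUNK);   /* stand-in for the kernel-to-user copy */
    double elapsed = cpu_seconds() - start;
    if (elapsed <= 0.0)
        elapsed = 1e-9;

    printf("copied %.0f MB using %.3f s of CPU time (~%.0f MB/s per core)\n",
           TOTAL / 1e6, elapsed, TOTAL / 1e6 / elapsed);
    free(src);
    free(dst);
    return 0;
}

At multi-Gigabit rates this copy alone can occupy a large fraction of a core, which is consistent with the receiver-side bottleneck described above.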
At this point, the following open questions arise:
• What kind of benefits can be expected from the current
I/OAT architecture?
• How does this benefit translate to applications?
In this paper, we focus on the above questions. We first
analyze the performance of I/OAT based on a detailed suite
of micro-benchmarks. Next, we evaluate it on two different
application domains:
• A multi-tier Data-Center environment
• A Parallel Virtual File System (PVFS)
Our micro-benchmark evaluations show that I/OAT reduces the
overall CPU utilization significantly, up to 38%, as compared