Figure 1: KNN’s data processing speed.
2.2.2 Key Observations
We evaluate 11 computing resource configurations, as
shown by the horizontal axis in Figure 1. Each configu-
ration is denoted as gG-cC-tT, where g represents the
number of simultaneously running GPU map tasks, c the
number of simultaneously running CPU map tasks, and t
the number of threads inside each CPU map task. Figure 1
shows the performance, with the vertical axis representing
the data processing speed. We make two significant obser-
vations from the results:
• Using two GPUs brings only a slight performance gain
over one GPU. When only one GPU is used, the data
processing speed is 60MB/s (the first bar in Figure 1).
However, when a second GPU is exploited simultane-
ously, the data processing speed increases only slightly,
to 65MB/s (the second bar in Figure 1).
• Coordinating CPUs together with one GPU leads to
worse performance than using one GPU alone. When
only one GPU is used, the data processing speed is
60MB/s (the first bar). However, when one or more
CPU tasks run simultaneously with one GPU task
(bars 3-7), the overall performance unexpectedly de-
creases, varying from 58MB/s down to 51MB/s. A
similar observation holds for the configurations con-
taining two GPU tasks.
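The arithmetic behind these two observations can be made concrete with a short sketch. The speeds are taken from Figure 1; the relative-gain computation itself is our illustration, not code from Hadoop+:

```java
// Quantifies the two observations using the speeds reported in
// Figure 1 (MB/s). The configuration labels follow the gG-cC-tT
// notation from Section 2.2.2.
public class ObservationCheck {
    public static void main(String[] args) {
        double oneGpu = 60.0;          // 1G-0C-0T, first bar
        double twoGpu = 65.0;          // 2G-0C-0T, second bar
        double gpuPlusCpuWorst = 51.0; // worst of bars 3-7

        // Marginal gain of the second GPU: (65-60)/60, roughly 8%.
        double secondGpuGain = (twoGpu - oneGpu) / oneGpu;
        // Slowdown from coordinating CPU tasks with one GPU: (60-51)/60 = 15%.
        double cpuCoordLoss = (oneGpu - gpuPlusCpuWorst) / oneGpu;

        System.out.printf("second GPU gain: %.1f%%%n", secondGpuGain * 100);
        System.out.printf("worst CPU-coordination loss: %.1f%%%n", cpuCoordLoss * 100);
    }
}
```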
2.2.3 Analysis
First we demonstrate the different behaviors of a CPU
task and a GPU task in Hadoop+, as shown in Figure 2. As
the red line shows, the I/O traffic of a CPU task remains
almost unchanged during task execution. The reason is that
Hadoop+ leverages the execution mechanism in Hadoop for
CPU tasks, which iteratively reads a small piece of data
and processes it quickly, so the I/O traffic stays low.
However, the behavior of the GPU task is different, as
shown by the blue line. To obtain high GPU occupancy, the
GPU task reads a chunk of data, transfers it to the GPU,
and launches the GPU kernel to process it; it thus exhibits
distinct phase behavior. In particular, the I/O traffic is high
when the GPU task is reading data from HDFS (via its host
thread), and low when the GPU task is executing the kernel.
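The contrast between the two I/O patterns can be sketched as a toy timeline. The shapes follow Figure 2, but all rates and phase lengths below are illustrative assumptions, not measurements:

```java
// Toy model of the I/O patterns in Figure 2 (numbers are assumed):
// a GPU task alternates a read burst (filling a chunk for the GPU)
// with a near-zero-I/O kernel phase; a CPU task streams small
// records at a steady low rate.
public class PhaseSketch {
    // GPU task: high traffic during the first 2s of each 5s cycle
    // (host thread reading from HDFS), ~0 while the kernel runs.
    static double gpuTraffic(int t) {
        return (t % 5) < 2 ? 70.0 : 0.0;
    }

    // CPU task: iterative small reads, roughly constant traffic.
    static double cpuTraffic(int t) {
        return 8.0;
    }

    public static void main(String[] args) {
        for (int t = 0; t < 10; t++)
            System.out.printf("t=%ds  gpu=%.0f MB/s  cpu=%.0f MB/s%n",
                              t, gpuTraffic(t), cpuTraffic(t));
    }
}
```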
To analyze the reason for the observations in Section 2.2.2,
we take one GPU task, denoted x, and examine its perfor-
mance under the 11 configurations. We find that the key
reason is contention for the shared I/O resources among
CPU and GPU tasks. To demonstrate this, we comment out
the computation in x and run it under the 11 configurations.
Figure 2: Behaviors of CPU/GPU tasks.
Figure 3: KNN's data reading speed.
In this way, each GPU task reads a split from HDFS without
any following computations. Figure 3 shows the data read-
ing speed of x. When only x is running, the data reading
speed can reach 72MB/s (the first bar 1G-0C-0T), while it
drops to 36MB/s when another GPU task is running simul-
taneously (the second bar 2G-0C-0T). Furthermore, when 4
single-threaded CPU tasks and another GPU task run to-
gether with x, its data reading speed decreases to only
14MB/s (the last bar 2G-4C-1T).
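A simple equal-share model illustrates this contention: if the aggregate HDFS read bandwidth available to one node is B, then n concurrent readers each see roughly B/n. Taking B = 72MB/s from the 1G-0C-0T bar, this model (ours, not the paper's) reproduces the first two bars of Figure 3 exactly:

```java
// Equal-share contention model for the data-reading experiment:
// n concurrent readers each get roughly B/n of the aggregate
// HDFS read bandwidth B.
public class ContentionModel {
    static double perTaskSpeed(double aggregateMBps, int readers) {
        return aggregateMBps / readers;
    }

    public static void main(String[] args) {
        double b = 72.0; // from the 1G-0C-0T bar in Figure 3
        System.out.println(perTaskSpeed(b, 1)); // 72.0, matches 1G-0C-0T
        System.out.println(perTaskSpeed(b, 2)); // 36.0, matches 2G-0C-0T
        // 2 GPU tasks + 4 CPU tasks = 6 readers -> 12.0; the measured
        // 14MB/s is a bit higher because CPU tasks read less aggressively
        // than GPU tasks, so the split is not perfectly equal.
        System.out.println(perTaskSpeed(b, 6));
    }
}
```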
2.2.4 Summary - The Challenge
The observations and our analyses demonstrate that it is a
challenge to model the heterogeneity for MapReduce appli-
cations running in heterogeneous clusters. In particular, the
challenge can be summarized into the following questions:
• What factors would affect the performance gain when
allocating a computing resource to an application?
• How will the performance contribution of one comput-
ing resource, e.g., GPU, vary with applications?
• How to select a resource configuration for an applica-
tion for different goals, e.g., to obtain the best perfor-
mance, or to be the most cost-effective?
3. HADOOP+ FRAMEWORK
Figure 4 gives an overview of our Hadoop+ framework.
Besides the Map and Reduce primitives in Hadoop, Hadoop+
provides two additional primitives, PMap and PReduce, to
programmers. The difference is that the PMap and PReduce
in Hadoop+ enable programmers to write explicit parallel
CUDA/OpenCL functions running on GPUs as plug-ins, as
shown by the box of “User-Provided PMap/PReduce Func-
tion” in Figure 4. Meanwhile, users can also use the Map
and Reduce functions in Hadoop. In Hadoop+, users can
provide Map, PMap or both, and Reduce, PReduce or both.
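To make the plug-in idea concrete, the following is a hypothetical sketch of what a PMap plug-in could look like. The interface name, method signature, and split type are our assumptions for illustration; they are not taken from the Hadoop+ API:

```java
// Hypothetical PMap plug-in interface (names and types assumed,
// not the actual Hadoop+ API). Unlike Map's per-record (key, value)
// input, PMap receives a whole split so the host thread can ship
// one large chunk to the GPU at once.
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

interface PMapper {
    List<Entry<String, Integer>> pmap(byte[] split);
}

// A CPU stand-in for a CUDA/OpenCL kernel: it emits the split size
// as a single (key, value) pair, standing in for the bulk work a
// real plug-in would offload to the GPU.
class CountingPMapper implements PMapper {
    public List<Entry<String, Integer>> pmap(byte[] split) {
        List<Entry<String, Integer>> out = new ArrayList<>();
        out.add(new SimpleEntry<>("bytes", split.length));
        return out;
    }
}
```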
To support explicit parallel Map functions, Hadoop+ pro-
vides different input parameters for Map and PMap. In
particular, the input of Map is (key,value), while the input