While in the map phase every map task processes various ⟨key; value⟩ pairs, in the reduce phase
all the values for a given key are processed by a single reduce task. This is achieved by logically
partitioning the secondary memory of each worker processing a map task into r partitions, and then
determining in which particular partition an output pair should be stored, a process accomplished
by the shuffle step. This step can be viewed as a data routing step, determining which reduce
task will process a data pair based on the pair's key. It is performed by the workers while
processing the map tasks. Typically, a function such as (hash(key) mod r) is used, where the hash
function is a simple function, computable in a small constant time, that maps the keys to a more
manageable domain. Other partitioning functions can be defined by the user, especially if the keys
are numeric, such as partitioning the keys into r logical partitions representing various ranges
of values.
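As an illustration, a minimal partitioning sketch in Python is given below; the function names
hash_partition and range_partition, and the use of Python's built-in hash, are assumptions made
for the example rather than part of the framework.

    def hash_partition(key, r):
        # Map an arbitrary key into one of the r partitions, one per reduce task.
        # Python's built-in hash stands in for the framework's hash function.
        return hash(key) % r

    def range_partition(key, r, max_key):
        # A user-defined alternative for numeric keys: split the key space
        # [0, max_key) into r equal-width ranges, one per reduce task.
        return min(int(key * r / max_key), r - 1)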
When all the map tasks have finished, the r reduce tasks are assigned to the available workers
using the same process as for the map tasks. Each reduce task accesses the data assigned to it,
which is stored across the workers responsible for computing the q map tasks. All the pairs with
the same key are stored in the same partition, and each partition can hold pairs with different
keys. All this data is sorted by key and combined such that all values associated with a key are
grouped together in a single ⟨key; value⟩ pair. This is sometimes considered to be a second part
of the shuffle step.
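The grouping just described could be sketched as follows; group_partition is a hypothetical helper
that sorts one reduce task's partition by key and collapses equal keys into a single pair, shown
only to make the step concrete.

    from itertools import groupby
    from operator import itemgetter

    def group_partition(pairs):
        # pairs: list of (key, value) tuples belonging to one reduce task's partition.
        pairs.sort(key=itemgetter(0))                  # sort by key
        return [(k, [v for _, v in grp])               # group all values for each key
                for k, grp in groupby(pairs, key=itemgetter(0))]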
Each reduce task then reads its assigned data and processes it one ⟨key; value⟩ pair at a time
using the reduce function. The task's output is then written to global memory, where it can either
be the final output of the algorithm or serve as input to a new round of MapReduce. The input for
a new round of MapReduce is partitioned into q parts by the master processor and the process just
described is repeated.
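A possible sketch of the master's between-round partitioning is shown below; split_into_parts is a
hypothetical helper, and the round-robin distribution is an arbitrary choice made for illustration.

    def split_into_parts(pairs, q):
        # Master step between rounds: partition the round's input pairs into
        # q parts, one per map task, distributing them round-robin.
        parts = [[] for _ in range(q)]
        for i, pair in enumerate(pairs):
            parts[i % q].append(pair)
        return parts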
The relationship between the system processors and the map and reduce tasks leads to some
interesting aspects of the MapReduce framework. A number of map and reduce tasks can be
performed, in sequence, by a single processor in each round. In each of the map and reduce
phases, tasks are assigned to workers as they finish their previously assigned task. Therefore, if
the computation is divided equally between tasks, then every processor will perform about q/p map
tasks and r/p reduce tasks. If, on the other hand, the computation time differs between tasks,
then load balancing is achieved automatically, since a worker that finishes early simply picks up
the next available task. This assignment strategy also allows fault tolerance to be handled
efficiently.
However, the task assignment strategy also places some limitations on the framework. Between
rounds the data is split up into q parts and each part is assigned to a map task. After any task
finishes, the worker's primary memory is cleared, so data cannot be associated with a single
processor and accessed at will in different rounds. Therefore, any data that is required in
multiple rounds must be explicitly stored in global memory.
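The greedy assignment strategy described above can be made concrete with a small simulation;
assign_tasks and the priority queue of worker finish times are illustrative assumptions, not a
description of an actual scheduler. With similar task costs each of the p workers ends up with
about the same number of tasks; with uneven costs the faster-finishing workers absorb more tasks.

    import heapq

    def assign_tasks(task_costs, p):
        # Each of the p workers pulls the next unassigned task as soon as it
        # finishes its current one. The heap holds (time worker becomes free, id).
        workers = [(0, w) for w in range(p)]
        heapq.heapify(workers)
        counts = [0] * p
        for cost in task_costs:
            free_at, w = heapq.heappop(workers)   # earliest-free worker takes the task
            counts[w] += 1
            heapq.heappush(workers, (free_at + cost, w))
        return counts                             # roughly q/p tasks each when costs are similar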
Given a multiset of n ⟨key; value⟩ pairs as input, the above process describing a single
MapReduce round is defined by two functions, map and reduce, together with the shuffle step.
These are defined as follows:
– Given a single input pair from the multiset of the round's input pairs
  {⟨k₁; v₁⟩, ⟨k₂; v₂⟩, . . . , ⟨kₙ; vₙ⟩}, the map function performs some computation to produce a
  new intermediate multiset of ⟨key; value⟩ pairs {⟨l₁; w₁⟩, ⟨l₂; w₂⟩, . . . , ⟨lₘ; wₘ⟩}.
– The union of all the intermediate multisets produced by the map functions is acted upon in
  the shuffle step. All the pairs with the same key lᵢ are combined to produce a new set of lists
  of the form ⟨lᵢ; w₁, w₂, . . .⟩.
– Each of the lists produced by the shuffle step is passed to a separate reduce function that
  performs some computation to produce a new list ⟨jᵢ; x₁, x₂, . . .⟩.
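To tie the three definitions together, the sketch below simulates one MapReduce round in Python on
a word-counting example; word_map, count_reduce and map_reduce_round are hypothetical names, and
the in-memory grouping stands in for the distributed shuffle step.

    from collections import defaultdict

    def word_map(key, value):
        # Map function: emit one (word, 1) pair per word in the input line.
        return [(word, 1) for word in value.split()]

    def count_reduce(key, values):
        # Reduce function: combine all counts for a word into a single pair.
        return (key, sum(values))

    def map_reduce_round(input_pairs, map_fn, reduce_fn):
        # Map phase: every input pair produces an intermediate multiset of pairs.
        intermediate = []
        for k, v in input_pairs:
            intermediate.extend(map_fn(k, v))
        # Shuffle step: group all values sharing a key l_i into one list.
        groups = defaultdict(list)
        for l, w in intermediate:
            groups[l].append(w)
        # Reduce phase: each grouped list is processed by a separate reduce call.
        return [reduce_fn(l, ws) for l, ws in groups.items()]

    # Example usage:
    # map_reduce_round([(1, "to be or not to be")], word_map, count_reduce)
    # -> [('to', 2), ('be', 2), ('or', 1), ('not', 1)]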