
MP3DECODER DES DCT
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Dynamic Programming Partitioner Greedy Partitioner
Speedup
(a) 4-core platform
MP3DECODER DES DCT
0.0
0.5
1.0
1.5
2.0
Speedup
Dynamic Programming Partitioner Greedy Partitioner
(b) 8-core platform
Figure 2: The relative performance of 2 partitioning
schemes with respe ct to a na
¨
ıve partitioning s cheme.
The results are shown for 2 platforms and 3 distinct
StreamIt programs. Greedy partitioning performs
well on MP3DECODER on the 4-core platform but
not as well as dynamic programming partitioning
on the 8-core. This relative ordering is reversed for
DES and DCT. Determining the best partitioning
depends on program and platform.
form is known to be NP-complete and it is difficult to devise
a general heuristics [5].
To illustrate this point, consider figure 2, which shows the
performance of each partitioning approach on two different
multi-core platforms: a 2x dual-core (4-core) machine and
a 2x quad-core (8-core) machine. On the 4-core the greedy
scheme performs well for MP3DECODER; it has a lower
communication cost, exploiting data parallelism rather than
the pipeline parallelism favoured by the dynamic p rogram-
ming partitioner. On the 8-core p latform, however, the dy-
namic programming-based heuristic delivers better perfor-
mance as load balancing becomes critical.
When examining two further programs, DES and DCT,
we see the best partitioning algorithm for a particular ma-
chine is reversed. The figure shows that there is no current
”one-fits-all” heuristic and the best heuristic varies across
programs and architectures. Rather than relying on heuris-
tics, we would like a scheme that automatically predicts the
right sequence of fuse and fission operations for each pro-
gram and architecture. In the case of MP3DECODER, th is
means we want to select the operations that give the parti-
tioned code in figure 1 (b) for 4-cores and the partitioned
code in figure 1 (c) for 8-cores. However, predicting the cor-
rect sequences of fuse and fission is highly non-triv ial given
the unbounded stru cture of the input program graphs.
In the next section we describe our novel approach. Rather
than predicting the sequence of fuse and fission operations
directly, it tries to predict the right structure of the final
partitioned program. Given this target structure, it then
searches for a sequence of fuse and fission operations that
generates a partitioned program that fit s the predicted struc-
ture as closely as possible.
3. PREDICTING AND GENERATING A
GOOD PARTITION
One of the hurdles in predicting the best sequence of fu-
sion and fission op erations is that the graph keeps changing
structure after each operation. In figure 3, the second oper-
ation fiss(2.3) would have to be renamed fiss(1.6) if the first
operation (fuse(1.2,1.3,1.4,1.5)) had not taken place. A ny
scheme that tries to predict a sequence of fuse and fission
operations h as therefore to take into consideration the struc-
ture of the graph at each intermediate stage. The supervised
predictive modelling schemes explored to date are incapable
of managing this [9]. We take a different approach. Instead
of trying to predict the sequence of fuse and fission opera-
tions, we divide the problem into two stages as illustrated
in figure 4:
1. Predict the ideal structure of the final partitioned pro-
gram.
2. Search a space of operation sequences th at delivers a
program as close as possible to the ideal structure.
The first stage focuses on determining th e goal of parti-
tioning, i.e., the structure of the partitioned p rogram with-
out regard to how it may be actually realised. The second
stage explores different legal operation sequences u ntil the
generated partitioned p rogram matches the goal. This frees
us from the concern of correctly predicting the syntactically
correct sequence of fuse and fission operations. Instead we
can try arbitrary sequences u ntil we reach a partition that
closely matches our goal. The next section describes how
we can predict a good partitioning goal and is followed by
a section describing how we can generate a sequence of fuse
and fission operations to reach that goal.
3.1 Predicting the Ideal Partitioning Structure
- Setting the Goal
We wish to predict the ideal partitioned stru cture of any
input graph program. In order to cast this as a machine
learning problem, we wish to build a function f which, given
the essential characteristics or features X
orig
of the origi-
nal program predicts the features of the ideal partitioned
program X
ideal
. Building and using such a model follows
the well-known 3 step process for supervised machine learn-
ing [4]: (i) generate training data (ii) train a predictive
model (iii) use th e predictor. We generate training data
by evaluating (executing) randomly generated partitions for
each training program and recording their execution time.
The features of the original and best p artitioned program
are then used to train a mod el which is then used to pre-
dict the best ideal partitioning structure for any new unseen
program. One of the key aspects in building a successful pre-
dictor is developing the right program features in order to
characterise the original and goal program. This is described
in th e next section. This is followed by sections describing
training data generation, building the predictor using near-
est neighbors and then using t he predictor.
3.2 Extracting Features
Rather than trying to deal with unbounded program graphs
as input and outputs to our predictor, we describe the essen-
tial characteristics of such graphs by a fixed feature vector
of numerical values. The intention is that programs with
similarly feature vectors have similar behaviour. We empir-
ically evaluate this assumption in section 6.3. In this work,
we use program features, to characterise a streaming applica-
tion. The set of program features are summarized in table 1.
We extract two sets of those features from the overall stream
graph and the critical path of the program. Thereby, one set