A TASK-LEVEL OOO FRAMEWORK FOR
HETEROGENEOUS SYSTEMS
Junneng Zhang #1, Chao Wang #2, Xi Li ∗3, Peng Chen #4, Xiaojing Feng #5, Xuehai Zhou ∗6
# Suzhou Institute for Advanced Study, University of Science and Technology of China
Suzhou, Jiangsu, China
1 zjneng@mail.ustc.edu.cn
2 saintwc@mail.ustc.edu.cn
4 qwe123@mail.ustc.edu.cn
5 bangyan@mail.ustc.edu.cn
∗ School of Computer Science, University of Science and Technology of China
Hefei, Anhui, China
3 llxx@ustc.edu.cn
6 xhzhou@ustc.edu.cn
Abstract—This paper proposes a framework targeting the
problem of task-level out-of-order (OoO) execution for heterogeneous
systems. The framework consists of three layers: 1) programming
model; 2) OoO task scheduler; 3) processing elements.
To uncover task-level parallelism automatically, a renaming scheme
is carried over from instruction-level parallelism (ILP) to
task-level parallelism (TLP). With the renaming scheme,
inter-task data dependencies are detected automatically during
execution, and task-level write-after-write (WAW) and
write-after-read (WAR) dependencies are eliminated dynamically.
We adapt the Tomasulo algorithm from ILP to perform task-level
OoO execution, and implement a prototype on a state-of-the-art
reconfigurable FPGA platform. Experimental results show that the
framework is efficient for heterogeneous systems.
I. INTRODUCTION
Task-level parallelism (TLP) has been widely studied at
different levels over the past decades, e.g. programming models
[1] [2] [3] [4] [5] [6] [7] [8] [9], compilers and runtime libraries
[10] [11] [12] [13], and architectures [14] [15] [16] [17].
Traditional programming models for TLP, such as OpenMP and
MPI, perform well for regular operations (e.g. loops), but for
irregular operations the results may be unsatisfactory. At the compiler
level, detecting dependencies statically is still challenging.
Alternatively, using a special architecture to support task-level
out-of-order (OoO) execution appears efficient. However, how
to make such an architecture flexible enough to suit various systems
remains unresolved, especially for heterogeneous systems with
different types of Processing Elements (PEs).
In this paper, we consider the programming model, compiler and
architecture together, and aim to find an efficient
way to uncover TLP for heterogeneous systems. The underlying
system is an FPGA-based platform, which contains
different types of PEs: one or more general-purpose
processors (GPPs) and a variety of Intellectual Property (IP) cores.
So far the following features have been completed:
1) On the basis of state-of-the-art programming paradigms, we
propose a programming model that supports TLP without
explicit task scheduling by programmers. The program is
divided into a series of tasks, which stand for functions to
be executed on PEs (e.g. a GPP or an IP core).
2) We have implemented a hardware MP-Tomasulo module
for heterogeneous platforms to support OoO task execution.
The MP-Tomasulo module detects task-level data dependencies
and eliminates WAW and WAR dependencies automatically
at runtime using a renaming scheme.
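The effect of renaming on task-level dependencies can be illustrated with a small software sketch. This is not the hardware MP-Tomasulo module; it is a minimal model that assumes each task declares the variables it reads and writes. The names `Task`, `dependencies` and `rename`, the `#n` version suffix, and the four example tasks are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Task:
    name: str
    reads: tuple   # variable names the task reads
    writes: tuple  # variable names the task writes


def dependencies(tasks):
    """Return (kind, earlier, later, variable) hazards between tasks in program order.

    The check is conservative: it compares every pair of tasks by name only.
    """
    found = set()
    for j, later in enumerate(tasks):
        for earlier in tasks[:j]:
            for v in later.reads:
                if v in earlier.writes:
                    found.add(("RAW", earlier.name, later.name, v))
            for v in later.writes:
                if v in earlier.writes:
                    found.add(("WAW", earlier.name, later.name, v))
                if v in earlier.reads:
                    found.add(("WAR", earlier.name, later.name, v))
    return found


def rename(tasks):
    """Give every write a fresh versioned name so that only RAW dependencies remain."""
    current = {}       # variable -> most recent versioned name
    fresh = count(1)   # source of unique version numbers
    renamed = []
    for t in tasks:
        # Reads are resolved first, so a task that reads and writes the
        # same variable still consumes the previous version.
        reads = tuple(current.get(v, v) for v in t.reads)
        writes = []
        for v in t.writes:
            current[v] = f"{v}#{next(fresh)}"
            writes.append(current[v])
        renamed.append(Task(t.name, reads, tuple(writes)))
    return renamed


# Hypothetical task stream: T3 overwrites b, which T1 wrote and T2 read.
tasks = [
    Task("T1", ("a",), ("b",)),
    Task("T2", ("b",), ("c",)),
    Task("T3", ("d",), ("b",)),
    Task("T4", ("b",), ("e",)),
]
before = dependencies(tasks)
after = dependencies(rename(tasks))
```

In this example `before` contains RAW, WAW and WAR hazards, while `after` contains only the two true dependencies (T1 to T2 and T3 to T4), so T3 no longer has to wait for T1 or T2.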
The rest of the paper is organized as follows: Section II
illustrates the programming model, Section III details the OoO
task execution of the MP-Tomasulo module, Section IV presents
the experimental method and results, Section V describes
related work, and Section VI summarizes the paper.
II. PROGRAMMING MODEL LAYER
To make TLP efficient for heterogeneous systems,
we propose a framework composed
of three layers. The top layer is the programming model
layer, which provides programmers with interfaces for parallel
programming. The middle layer is the scheduler layer, which
is in charge of task-level OoO scheduling using a renaming
scheme. The bottom layer consists of the PEs, which are responsible for
task execution. Throughout this paper, tasks refer to dynamic
instances created when application programming interfaces
are invoked by user applications [18]. Furthermore, tasks are
regarded as abstract, function-level instructions, and each IP core is
treated as a dedicated functional unit that runs a specific hardware
task.
The programming model is derived from CellSs, which uses
annotations to define tasks. In our framework, PEs are divided
into two categories: hardware PEs and software PEs. Each
type of hardware PE can execute only one specific kind of task, while
software PEs can execute all kinds of tasks. In our
framework, the interfaces for calling hardware PEs are defined
in runtime libraries. To use software PEs, programmers need
to define the tasks as in CellSs.
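The distinction between the two PE categories can be sketched as a simple selection rule. The sketch below is an assumption-laden illustration, not the framework's scheduler: the `PE` class, the `pick_pe` helper, the task kinds `idct` and `aes`, and the policy of preferring an idle hardware PE over a software PE are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PE:
    name: str
    kind: Optional[str] = None  # task kind of a hardware PE; None marks a software PE
    busy: bool = False

    def can_run(self, task_kind):
        # A software PE accepts any task; a hardware PE accepts only its own kind.
        return self.kind is None or self.kind == task_kind


def pick_pe(pes, task_kind):
    """Return an idle PE able to run task_kind, or None if no such PE is free."""
    candidates = [p for p in pes if not p.busy and p.can_run(task_kind)]
    # Assumed policy: hardware PEs (kind is not None) are tried before software PEs.
    candidates.sort(key=lambda p: p.kind is None)
    return candidates[0] if candidates else None


# Hypothetical platform: one GPP (software PE) and two IP cores (hardware PEs).
pes = [PE("gpp0"), PE("idct0", "idct"), PE("aes0", "aes")]

first = pick_pe(pes, "idct")    # the dedicated IP core is idle
first.busy = True
second = pick_pe(pes, "idct")   # the IP core is busy, so the GPP is chosen
second.busy = True
third = pick_pe(pes, "idct")    # nothing left that can run the task
```

Here `first` is the `idct0` core, `second` falls back to `gpp0`, and `third` is `None`, which in a real scheduler would mean the task waits until a suitable PE becomes free.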
978-1-4673-2845-6/12/$31.00 © 2012 IEEE