A Scheduling Strategy Supporting OpenMP Task on Heterogeneous Multicore
Qian Cao, Min Zuo
School of Computer and Information Engineering
Beijing Technology and Business University
Beijing, China
Abstract- One of the most important topics in software
industry is how to utilize the OpenMP 3.0 programming model
to improve the execution of irregular and unstructured
applications. In this paper, we present an original task
scheduling strategy- Hybrid strategy, which is suited to the
execution of OpenMP programs on Cell heterogeneous
multicore. Hybrid scheduling strategy creates tasks in
breadth-first order while executes tasks in work-first fashion
during application execution. The former is capable of
creating enough tasks, which prevents the worker threads
from idling. While the latter guarantees task dependence is
freed quickly and consequently, the overhead with respect to
task searching is significant decreased. The evaluation, with a
variety of Barcelona OpenMP Task Suite, is conducted on a
PS3 heterogeneous multicore. And the experimental results
indicate that Hybrid policy outperforms the existing work-first
and breadth-first scheduling strategies for most irregular and
unstructured benchmarks, with speedups from 1.5 to 4.6 when
6 SPEs are used.
Keywords- OpenMP 3.0; task; Hybrid scheduling strategy;
heterogeneous multicore
I. INTRODUCTION
With the ever increasing of semconductor industry,
multicore system has become one of the research hotspots in
integrated circuit design. Homogeneous multicore could
greatly improve the application parallelism, but it fails in
exploiting the concurrency available in some complex
applications. Heterogeneous multicore, which integrates a
number of diverse processors in a single chip, is inevitable
in computer architecture
[1]
.
The Cell Broadband Engine (Cell BE)
[2]
is a
representative heterogeneous multicore processor, which has
drawn substantial attention from both industry and academia.
It comprises a conventional Power Processor Element (PPE)
that controls eight synergistic processing elements (SPEs).
The PPE has two levels of hardware cache while SPEs don’t
have caches but each has a 256KB of local store (LS). The
PPE can access the main memory while SPE can only
operate directly on its LS. Such architectural characteristics
make Cell architecture compelling for some specific
scientific computing.
As the hardware complexity increases, modern
applications are getting more sophisticated. Irregular and
dynamic structures, such as recursive kernels, unbounded
loops are prevalent in high performance computing.
Therefore, many mainstream programming environments
[3-9]
adopt tasks as high level abstraction to solve such problems,
with OpenMP 3.0
[10]
as a prominent example. OpenMP 3.0
specification, which adds a new directive task, has shifted
from a thread-centric to a task-centric execution model. Task
mechanism allows the programmers to explicitly identify
units of independent work, leaving the scheduling strategy
of how and when to execute the created tasks to the runtime
system. As a consequence, an appropriate and flexible
scheduling decision of OpenMP 3.0 task mechanism on Cell
architecture has become a rapidly growing area of research.
Broadly speaking, these policies are approximately
categorized into work-first scheduling strategy (WF) and
breadth-first scheduling strategy (BF).
However, there are still some deficiencies existing in the
WF and BF strategies, which will be specified elaborately in
the following related works. Considering this, we design and
implement an original task scheduling strategy - Hybrid
scheduling strategy which supports OpenMP 3.0 task model
targeting Cell heterogeneous multicore. Hybrid scheduling
strategy combines the advantages of existing BF and WF
scheduling strategies. It creates tasks in BF order while
executes tasks in WF fashion. The former could spawn
enough tasks within a short time, which prevents the
computation threads from idling and thus decreases the
number of suspended tasks. Additionally, it avoids a
situation that all tied tasks in OpenMP 3.0 standard tie to a
certain thread in WF task creation fashion. While the latter
guarantees that the tasks at the bottom in the task-tree are
executed primarily. Therefore, task dependence is freed
earlier and the overhead with respect to task lookup is
drastically decreased. Moreover, the overload balance is
achieved.
To evaluate the performance of Hybrid strategy, we
conducted the experiments on a set of Barcelona OpenMP
Task Suite (BOTS) benchmarks. The experimental results
suggest that Hybrid scheduling strategy achieves better
performance improvements than WF and BF policies for
most benchmarks, with the speedups up to 4.6 when 6 SPEs
are used.
The rest of the paper is organized as follows. The related
works are presented in section II. The design of Hybrid
scheduling strategy on Cell BE is presented in section III.
Section IV describes the execution process of Hybrid policy.
Section V shows the evaluation results. And the last section
concludes the paper.
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops
978-0-7695-4676-6/12 $26.00 © 2012 IEEE
DOI 10.1109/IPDPSW.2012.244
1974
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
978-0-7695-4676-6/12 $26.00 © 2012 IEEE
DOI 10.1109/IPDPSW.2012.244
2071
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
978-0-7695-4676-6/12 $26.00 © 2012 IEEE
DOI 10.1109/IPDPSW.2012.244
2077