
account and proposes a CPN-based scheduling scheme. The details
of our proposed scheme will be presented later in this paper.
3. Task-Level Scoreboarding scheduling scheme
This section presents the Task-Level Scoreboarding scheme. The
description focuses on the architecture and processing flow of the
scheme. At the end of this section, the advantages of our Task-Level
Scoreboarding scheme are presented.
3.1. Architecture overview of Task-Level Scoreboarding scheduler
The block diagram of the Task-Level Scoreboarding scheduler is
outlined by dashed lines in Fig. 1. The scheduler takes preprocessed
tasks as input and buffers them in the task queue. The scoreboard
controller then fetches tasks from the queue in order, schedules
them, and dispatches them to different processors for out-of-order
execution. The Processor Status Table records the status of every
processor, while the Variable Result Table indicates which
processor will produce each pending result. These two tables help
the scoreboard controller identify inter-task dependences. The
monitor module collects the running information of all processors
and updates the contents of both tables as soon as it perceives
a state change. Under the control of the scheduler, all processors
exchange data with shared memory through the on-chip
interconnection. Note that the scheduler can be implemented either
as a software module running on a GPP or as a standalone hardware
module attached to the on-chip interconnection.
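As a rough illustration of the components described above, the following Python sketch models the task queue and the two tables as plain data structures. All class, field, and method names (Task, ProcessorStatus, Scheduler, on_state_change, kinds) are hypothetical and chosen only for this sketch; the paper does not prescribe a concrete layout, and updating the Variable Result Table from the monitor callback is a simplification assumed here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple


@dataclass
class Task:
    task_id: int
    kind: str                 # assumed: type of work, used to match processors
    dest: str                 # destination variable
    sources: Tuple[str, ...]  # source variables


@dataclass
class ProcessorStatus:
    kinds: Tuple[str, ...]    # assumed: task kinds this processor can execute
    busy: bool = False
    task_id: Optional[int] = None


@dataclass
class Scheduler:
    # Preprocessed tasks wait here and are fetched in order.
    task_queue: Deque[Task] = field(default_factory=deque)
    # Processor Status Table: processor id -> status.
    processor_status: Dict[str, ProcessorStatus] = field(default_factory=dict)
    # Variable Result Table: variable -> processor that will produce it.
    variable_result: Dict[str, str] = field(default_factory=dict)

    def on_state_change(self, proc_id: str, busy: bool,
                        task: Optional[Task] = None) -> None:
        """Monitor callback: update both tables when a processor changes state."""
        status = self.processor_status[proc_id]
        status.busy = busy
        status.task_id = task.task_id if (busy and task) else None
        if busy and task:
            self.variable_result[task.dest] = proc_id
        else:
            # The processor is free again: drop the results it was producing.
            done = [v for v, p in self.variable_result.items() if p == proc_id]
            for var in done:
                del self.variable_result[var]
```

In this sketch the controller would consult processor_status to find free processors and variable_result to detect dependences between tasks.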
3.2. Processing flow of scoreboard controller
Under the control of the scoreboard controller, every task will
undergo the stages shown in Table 1.
3.2.1. Check destination
A task enters this stage as soon as it enters the scoreboard
controller. In this stage, the readiness of the destination variable
is checked. The destination is ready if no other active task has
the same destination variable; otherwise, a WAW hazard is
detected. The task moves on to the next stage once no WAW
hazard exists or all such hazards have been resolved.
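The destination check can be sketched as a lookup in the Variable Result Table. The snippet below is a minimal illustration that assumes the table is a dictionary mapping each pending variable to the processor that will produce it; the function name has_waw_hazard is hypothetical.

```python
from typing import Dict


def has_waw_hazard(dest: str, variable_result: Dict[str, str]) -> bool:
    """Return True if an active task is still due to write `dest`.

    `variable_result` stands in for the Variable Result Table
    (assumed layout: variable -> producing processor).
    """
    return dest in variable_result


# Hypothetical table state: processor P1 will produce variable x.
pending = {"x": "P1"}
waw_on_x = has_waw_hazard("x", pending)  # a second writer of x must stall
waw_on_y = has_waw_hazard("y", pending)  # y has no pending writer
```

A task whose destination is still pending would remain in this stage until the entry is cleared.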
3.2.2. Partition
As mentioned before, more than one processor may be suitable
for a given task, so a partitioning strategy must be employed to
choose one as the target processor. First, the controller looks
through the statuses of all processors to check whether any free
processor is capable of executing the task. If so, the controller
sets the variable PAR to the identifier of the target processor.
Otherwise, the task stalls until a suitable processor becomes
free.
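The partition stage can be sketched as a scan over the Processor Status Table. The text does not say which processor is picked when several are free and capable, so the sketch below simply takes the first match; that first-fit rule, the Proc record, and the function name partition are assumptions made for illustration. The return value plays the role of PAR.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Proc(NamedTuple):
    kinds: Tuple[str, ...]  # assumed: task kinds the processor can execute
    busy: bool


def partition(task_kind: str, processors: Dict[str, Proc]) -> Optional[str]:
    """Return the id of a free, capable processor, or None to signal a stall."""
    for proc_id, proc in processors.items():
        if not proc.busy and task_kind in proc.kinds:
            return proc_id  # first-fit choice (an assumption, see lead-in)
    return None


# Hypothetical platform with one busy and one free processor.
table = {
    "P0": Proc(kinds=("fft", "fir"), busy=True),
    "P1": Proc(kinds=("fft",), busy=False),
}
par = partition("fft", table)      # P1 is free and capable
stalled = partition("fir", table)  # only P0 can run fir, and it is busy
```

When partition returns None, the task would stay in this stage and the scan would be repeated after the next processor state change.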
3.2.3. Issue
The absence of WAW and structural hazards is guaranteed
when a task enters the issue stage. The controller updates the
related table entries when a task issues.
3.2.4. Read operands
A source variable is ready when no earlier issued task intends to
write it. A RAW hazard exists if any source variable is not ready.
The scoreboard controller keeps monitoring the source variables
of issued tasks. Once all source variables are ready, the processor
accesses memory to fetch the source operands.
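The readiness rule for source variables can be sketched as follows. The Issued record, its written flag, and the function name sources_ready are hypothetical; the list is assumed to hold issued tasks in issue order.

```python
from typing import List, NamedTuple, Tuple


class Issued(NamedTuple):
    task_id: int
    dest: str
    sources: Tuple[str, ...]
    written: bool  # assumed flag: True once the result has been written back


def sources_ready(task: Issued, issued: List[Issued]) -> bool:
    """Return True when no earlier issued task still intends to write a source.

    `issued` is assumed to be ordered by issue time.
    """
    for earlier in issued:
        if earlier.task_id == task.task_id:
            break  # only tasks issued before `task` matter
        if not earlier.written and earlier.dest in task.sources:
            return False  # RAW hazard: a source operand is still pending
    return True


# Hypothetical example: T2 reads variable a, which T1 has not written yet.
t1 = Issued(task_id=1, dest="a", sources=("b",), written=False)
t2 = Issued(task_id=2, dest="c", sources=("a", "d"), written=False)
blocked = sources_ready(t2, [t1, t2])
t1_done = t1._replace(written=True)
ready = sources_ready(t2, [t1_done, t2])
```

Here blocked is False while T1 is still pending, and ready becomes True once T1 has written a.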
3.2.5. Execution complete
The processor begins execution as soon as the source
operands have been read.
3.2.6. Write results
A WAR hazard arises when a source variable of any earlier issued
task is the same as the destination variable of the completing task.
A processor writes its result back only when no WAR hazard
exists. After that, the task is complete.
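The write-back condition can be sketched in the same style. The sketch assumes, as in classic instruction-level scoreboarding, that an earlier task blocks the write only while it has not yet read its operands; the text does not spell this out, so the operands_read flag is an assumption, and the Issued record and function name can_write_back are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Issued(NamedTuple):
    task_id: int
    dest: str
    sources: Tuple[str, ...]
    operands_read: bool  # assumed flag: True once source operands were fetched


def can_write_back(task: Issued, issued: List[Issued]) -> bool:
    """Return True when writing `task.dest` cannot clobber an unread source.

    `issued` is assumed to be ordered by issue time.
    """
    for earlier in issued:
        if earlier.task_id == task.task_id:
            break  # only tasks issued before `task` matter
        if not earlier.operands_read and task.dest in earlier.sources:
            return False  # WAR hazard: an earlier task still needs the old value
    return True


# Hypothetical example: T2 wants to overwrite a, which T1 has not read yet.
t1 = Issued(task_id=1, dest="x", sources=("a",), operands_read=False)
t2 = Issued(task_id=2, dest="a", sources=("b",), operands_read=True)
must_wait = can_write_back(t2, [t1, t2])
t1_read = t1._replace(operands_read=True)
may_write = can_write_back(t2, [t1_read, t2])
```

In this example must_wait is False until T1 has fetched a, after which may_write is True.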
3.3. Advantages over instruction level scoreboarding
In this section, we present the advantages of our Task-Level
Scoreboarding over instruction-level scoreboarding, which fall into
two aspects:
(1) Task partitioning: When extending traditional scoreboarding
to operate on MPSoCs, we need to consider how to partition
tasks among multiple processors. To the best of our knowledge,
few studies consider task partitioning together with
out-of-order execution. In our proposed scheme, it is
convenient to evaluate different partitioning strategies, since the
partition stage is loosely coupled with the scheduling
process. For demonstration, we will present how to integrate
a greedy partitioning strategy into our Task-Level
Scoreboarding in Section 4.5.
(2) Monitoring and profiling: Our Task-Level Scoreboarding
monitors and collects the running information of processors, so it
is able to trace the whole execution process and locate
hotspots through profiling techniques. When applied to
reconfigurable platforms, our Task-Level Scoreboarding can use
the profiling results to guide the process of hardware
reconfiguration at runtime to achieve higher performance.
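To illustrate how a loosely coupled partition stage and the collected running information might fit together, the sketch below passes the partitioning strategy to the dispatch step as a plain function and keeps a per-processor dispatch count. Both strategies (first_free and least_used) and all names are hypothetical; they are not the greedy strategy of Section 4.5.

```python
from typing import Callable, Dict, List, Optional

# A strategy picks one processor from the free, capable candidates,
# optionally using per-processor dispatch counts gathered by monitoring.
Strategy = Callable[[List[str], Dict[str, int]], Optional[str]]


def first_free(candidates: List[str], counts: Dict[str, int]) -> Optional[str]:
    """Take the first candidate, ignoring the collected counts."""
    return candidates[0] if candidates else None


def least_used(candidates: List[str], counts: Dict[str, int]) -> Optional[str]:
    """Prefer the candidate that has received the fewest tasks so far."""
    if not candidates:
        return None
    return min(candidates, key=lambda p: counts.get(p, 0))


def dispatch(candidates: List[str], counts: Dict[str, int],
             strategy: Strategy) -> Optional[str]:
    """Choose a target with the given strategy and record the dispatch."""
    target = strategy(candidates, counts)
    if target is not None:
        counts[target] = counts.get(target, 0) + 1  # simple profiling counter
    return target


# Hypothetical profile: P0 has already run three tasks, P1 one.
profile = {"P0": 3, "P1": 1}
a = dispatch(["P0", "P1"], profile, first_free)
b = dispatch(["P0", "P1"], profile, least_used)
```

Swapping the strategy argument changes the partitioning behaviour without touching the rest of the dispatch logic, which is the kind of decoupling the first item describes.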
4. CPN models of Task-Level Scoreboarding
To facilitate the verification and evaluation of the proposed
Task-Level Scoreboarding scheme, we build CPN models. This section
concentrates on the structure and execution process of our model.
We discuss how the places and transitions in CPN are built, how
the inter-task dependences are identified, and finally how the par-
titioning strategies are modeled. Besides, taking a greedy algorithm
Fig. 1. Architecture of Task-Level Scoreboarding.
C. Wang et al. / Journal of Systems Architecture 60 (2014) 293–304