5kk03 Embedded Systems Laboratory
[Design of an Embedded JPEG Decoder on a multiprocessor platform]
Geert Kwintenberg
Hardware Expert
0614867
geert.k@zonnet.nl
Lennart de Graaf
Software Engineer
0612829
l.deGraaf@fontys.nl
Wei Tong
Embedded Engineer
0641310
w.tong@student.tue.nl
Manickavasagam Shanmugam Annamalai
Group Leader
0641287
s.a.manick@gmail.com
Feiteng Yang
Software Engineer
0638263
yftfly@hotmail.com
1. INTRODUCTION
This document describes the implementation of a JPEG Decoder
on a multiprocessor platform. To complete the assignment in the
required time, an already functional implementation of a JPEG
Decoder was used. The multiprocessor platform used in the
assignment is a Celoxica RC300 FPGA board. The FPGA on this
board contains three Silicon Hive VLIW¹ processors. The board
also contains an external memory and a framebuffer. All of these
components are connected through an NXP Network on Chip called
Æthereal. The assignment consists of the following goals: put
an embedded JPEG Decoder on the market at the end of Q2 2008,
port the application code to the embedded VLIW cores, efficiently
map the application to the platform, and optimize the system by
using performance metrics. From an organizational point of view,
each group member was assigned a specific role to smooth the
design process.
Section 2 explains how the design process was started. Section 3
gives an overview of the code optimizations made to make the
application more efficient. Sections 4, 5 and 6 describe the
three implementations used: data parallel, functional parallel
and hybrid, respectively. Section 7 presents the benchmark
results. Finally, conclusions are presented in section 8.
2. GETTING STARTED
To get started with programming the embedded Silicon Hive
cores, some small programs were written to perform basic
calculations. From this it became clear that programming the
cores comes with some limitations. For example, the cores do not
support doubles or floating point operations. The cores also do
not include a hardware divider, so divisions are done in
software. The following sections describe which tasks had to be
completed to implement the JPEG Decoder on a single core.

¹ Very Long Instruction Word: refers to a CPU architecture
designed to take advantage of instruction level parallelism.
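To illustrate what division without a hardware divider involves, the
following is a minimal sketch of unsigned division built only from
shifts, comparisons and subtractions (restoring division). The function
name soft_udiv32 is hypothetical and introduced here for illustration.
The routine actually emitted by the Silicon Hive toolchain is not
described in this document and may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the original decoder): unsigned
 * 32-bit division using only shifts, compares and subtractions.
 * Returns the quotient; stores the remainder if 'remainder' is set. */
uint32_t soft_udiv32(uint32_t dividend, uint32_t divisor,
                     uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint32_t rem = 0;
    int bit;

    assert(divisor != 0); /* division by zero is not defined */

    for (bit = 31; bit >= 0; bit--) {
        /* Shift the next dividend bit into the partial remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        /* If the divisor fits, subtract it and set the quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint32_t)1 << bit;
        }
    }
    if (remainder != 0)
        *remainder = rem;
    return quotient;
}
```

For example, soft_udiv32(100, 7, &r) returns 14 and leaves r equal to 2.
The loop always runs 32 iterations, which shows why divisions are
comparatively expensive on a core without a divider and why replacing
them by shifts or precomputed constants can pay off.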
2.1 Single core porting
The starting point of this project is a working JPEG decoder
solution that runs on the host system. To be able to try out
the benefits of using multiple cores to do the decoding in
parallel, two things need to be done first:
• Split up the code into a part that runs on the host and
a part that runs on the core(s);
• Port the source code that needs to run on the core(s).
Splitting up the code is fairly easy, since the original
code has a good functional separation between the initial
setup and the actual decoding. Basically, the split is
already present in the original code, because JpgToBmp.c does all
initialisations and then calls the decoder() function in
the file decoder.c. The most time-consuming part here was
getting familiar with the Silicon Hive environment and its
functions to control the cores (loading, starting and waiting)
and to exchange data between host and core. Unfamiliarity
with makefiles also took some time. Porting the
code to the core was tedious. The main issue is that
library functions that work on the host are not available on
the core. In most cases this is because the
actual hardware is different and functions like printf() and
fget() no longer make sense. The main changes are:
• Removing all calls to standard IO (printf functions).
To keep crun working, we used #ifdef constructions
to remove all functions that are not supported
when running on hyvesCC;
• Changing all calls to file functions. File access is not
supported from a core. Therefore the host is responsible
for reading the .JPEG file and creating the resulting
.bmp file. Information from these files is passed to