2 EURASIP Journal on Embedded Systems
integrated on a card. More recently, a trend has been to inte-
grate multiple processors on a single chip, creating SDR CMP
systems. The SDR Forum [1] defines five tiers of solutions.
Tier-0 is a traditional radio implementation in hardware.
Tier-1, software-controlled radio (SCR), implements the
control features for multiple hardware elements in software.
Tier-2, software-defined radio (SDR), implements modu-
lation and baseband processing in software but allows for
multiple frequency fixed function RF hardware. Tier-3, ideal
software radio (ISR), extends programmability through the
RF with analog conversion at the antenna. Tier-4, ultimate
software radio (USR), provides for fast (millisecond) transi-
tions between communications protocols in addition to dig-
ital processing capability.
The advantages of reconfigurable SDR solutions versus
hardware solutions are significant. First, reconfigurable so-
lutions are more flexible, allowing multiple communication
protocols to dynamically execute on the same transistors
thereby reducing hardware costs. Specific functions such as
filters, modulation schemes, and encoders/decoders can be re-
configured adaptively at run time. Second, several commu-
nication protocols can be efficiently stored in memory and
coexist or execute concurrently. This significantly reduces
the cost of the system for both the end user and the ser-
vice provider. Third, remote reconfiguration provides sim-
ple and inexpensive maintenance and feature upgrades. This
also allows service providers to differentiate products after
the product is deployed. Fourth, the development time of
new and existing communications protocols is significantly
reduced, providing an accelerated time to market. Develop-
ment cycles are not limited by long and laborious hardware
design cycles. With SDR, new protocols are quickly added as
soon as the software is available for deployment. Fifth, SDR
provides an attractive method of dealing with new standards
releases while assuring backward compatibility with existing
standards.
SDR enabling technologies also have significant advan-
tages from the consumer perspective. First, mobile terminal
independence with the ability to “choose” desired feature sets
is provided. As an example, the same terminal may be ca-
pable of supporting a superset of features, but the consumer
only pays for the features they are interested in using. Sec-
ond, global connectivity with the ability to roam across oper-
ators using different communications protocols can be pro-
vided. Third, future scalability and upgradeability provide
for longer handset lifetimes.
1.2. Processor background
In this section we define a number of terms and provide
background information on general purpose processors, dig-
ital signal processors, and some of the workload differences
between general purpose computers and real-time embed-
ded systems.
The architecture of a computer system is the minimal set
of properties that determine what programs will run and
what results they will produce [2]. It is the contract between
the programmer and the hardware. Every computer is an
interpreter of its machine language—that representation of
programs that resides in memory and is interpreted (exe-
cuted) directly by the (host) hardware.
The logical organization of a computer’s dataflow and
controls is called the implementation or microarchitecture.
The physical structure embodying the implementation is
called the realization. The architecture describes what hap-
pens while the implementation describes how it is made
to happen. Programs of the same architecture should run
unchanged on different implementations. An architectural
function is transparent if its implementation does not pro-
duce any architecturally visible side effects. An example of a
nontransparent function is the load delay slot made visible
due to pipeline effects. Generally, it is desirable to have trans-
parent implementations. Most DSP and VLIW implementa-
tions are not transparent and therefore the implementation
affects the architecture [3].
Execution predictability in DSP systems often precludes
the use of many general-purpose design techniques (e.g.,
speculation, branch prediction, and data caches). Instead,
classical DSP architectures have developed a unique set of
performance-enhancing techniques that are optimized for
their intended market. These techniques are characterized by
hardware that supports efficient filtering, such as the ability
to sustain three memory accesses per cycle (one instruction,
one coefficient, and one data access). Sophisticated address-
ing modes such as bit-reversed and modulo addressing may
also be provided. Multiple address units operate in parallel
with the datapath to sustain the execution of the inner kernel.
In classical DSP architectures, the execution pipelines
were visible to the programmer and necessarily shallow to
allow assembly language optimization. This programming
restriction encumbered implementations with tight timing
constraints for both arithmetic execution and memory ac-
cess. The key characteristic that separates modern DSP ar-
chitectures from classical DSP architectures is the focus on
compilability. Once the decision was made to focus the DSP
design on programmer productivity, other constraining de-
cisions could be relaxed. As a result, significantly longer
pipelines with multiple cycles to access memory and multi-
ple cycles to compute arithmetic operations could be utilized.
This has yielded higher clock frequencies and higher perfor-
mance DSPs.
In an attempt to exploit instruction level parallelism in-
herent in DSP applications, modern DSPs tend to use VLIW-
like execution packets. This is partly driven by real-time
constraints, which require the worst-case execution time to
be minimized. This is in contrast with general purpose CPUs
which tend to minimize average execution times. With long
pipelines and multiple instruction issue, the difficulties of
attempting assembly language programming become appar-
ent. Controlling dependencies between upwards of 100 in-
flight instructions is not an easy task for a programmer. This
is exactly the area where a compiler excels.
One challenge of using some VLIW processors is large
program executables (code bloat) that result from inde-
pendently specifying every operation with a single instruc-
tion. As an example, a VLIW processor with a 32-bit basic