Architectural Implications for SIMD Processors in
the Wireless Communication Domain
Yaohua Wang, Kai Zhang, Jianghua Wan, Sheng Liu, Xi Ning, Shuming Chen
School of Computer, National University of Defence Technology
410073 Changsha, P.R.China, smchen@163.com
Abstract—To further improve the performance of SIMD (Single Instruction Multiple Data) architectures, which are widely used in the wireless communication domain, we analyze the main components of the Long Term Evolution (LTE) protocol. A performance investigation is carried out on a cycle-accurate simulator that models the main characteristics of existing SIMD architectures. Based on this investigation, we propose three architectural implications: concurrent execution of scalar and parallel processing, a matrix register file in which multiple sub-matrices are accessible, and a bidirectional shuffle unit. Experimental results show that a SIMD architecture enhanced with the proposed implications achieves an average performance gain of 30%. The hardware cost of these implications is also discussed.
I. INTRODUCTION
The abundant parallelism in wireless communication applications has made SIMD (Single Instruction Multiple Data) the prevailing architectural scheme for wireless communication processing. Examples include stream processors such as Imagine[1]; signal processors and vector processors such as SODA[2] and AnySP[3] also employ this scheme. A SIMD architecture amortizes the control overhead across multiple SIMD lanes that share an identical control flow, which yields high power efficiency. Moreover, as VLSI technology has advanced, much wider SIMD architectures have been proposed, further improving performance.
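The amortization argument can be made concrete with a toy cost model. The Python sketch below is an illustrative assumption, not a model of any cited processor: it treats one instruction fetch/decode as one "control step" and counts the steps needed for n element-wise additions on a scalar machine versus a W-lane SIMD machine. The lane width `SIMD_WIDTH` and the function names are hypothetical.

```python
# Hypothetical lane count, chosen only for illustration.
SIMD_WIDTH = 8


def scalar_add(a, b):
    """Scalar model: each element costs one fetch/decode (one control step)."""
    out, control_steps = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        control_steps += 1
    return out, control_steps


def simd_add(a, b, width=SIMD_WIDTH):
    """SIMD model: one fetch/decode drives `width` lanes with identical control flow.

    The last group may be partial; the idle lanes are simply masked off.
    """
    out, control_steps = [], 0
    for base in range(0, len(a), width):
        control_steps += 1  # single instruction issue shared by every active lane
        out.extend(x + y for x, y in zip(a[base:base + width], b[base:base + width]))
    return out, control_steps


a = list(range(100))
b = [2 * i for i in range(100)]
s_out, s_steps = scalar_add(a, b)
v_out, v_steps = simd_add(a, b)
assert s_out == v_out  # same result, different control cost
print(s_steps, v_steps)  # 100 13
```

With 100 elements and 8 lanes, the control cost in this model drops from 100 steps to ceil(100/8) = 13; a wider SIMD width lowers it further, which is why wider SIMD designs improve efficiency.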
Although the high performance of SIMD architectures is attractive, significant challenges remain in current wireless communication processing. On the one hand, mobile signal processing platforms impose increasingly stringent power constraints. On the other hand, the evolution of wireless communication protocols brings a sharp increase in computational requirements. These challenges create an urgent demand for new techniques beyond simply scaling the existing resources of SIMD architectures.
To address these challenges and provide architectural implications for existing SIMD architectures, we carry out an in-depth investigation of the SIMD architecture. We choose the widely used LTE[4] wireless communication protocol as our target application and analyze the main characteristics of its key application kernels. Performance evaluations of these kernels are carried out on a cycle-accurate simulator that models the main characteristics of existing SIMD architectures. The evaluation reveals three problems in SIMD architectures: under-utilization, the lack of efficient support for communication, and data alignment overhead. Based on these observations, we propose three architectural implications: the Concurrent Execution of Scalar and Parallel processing (CEoSP), the Multiple Sub-matrixes accessible Matrix Register File (MS-MRF), and the bidirectional shuffle unit (BiShuffle). These implications are implemented in the simulator of the SIMD architecture, yielding an average performance gain of 30%. The approximate hardware overhead is also discussed.
Fig. 1. The main components in the physical layer of LTE.
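The data alignment overhead identified by the evaluation is commonly explained as follows: a SIMD load unit fetches only aligned blocks, so a vector that starts at an unaligned address must be assembled from two aligned loads and a shuffle. The Python sketch below illustrates that generic cost pattern under this assumption; it is not the paper's BiShuffle design, and `WIDTH`, `aligned_load` and `unaligned_load` are hypothetical names.

```python
# Hypothetical vector width; memory can only be read in aligned blocks of this size.
WIDTH = 4


def aligned_load(memory, block_index):
    """Return one aligned vector (one memory access)."""
    start = block_index * WIDTH
    return memory[start:start + WIDTH]


def unaligned_load(memory, addr):
    """Load WIDTH consecutive elements starting at an arbitrary addr.

    Returns (vector, memory_accesses, shuffles). When addr is not a multiple of
    WIDTH, the data straddles two aligned blocks, so two loads plus one shift
    (a shuffle) are needed instead of a single load. Assumes the next block
    exists (e.g. memory is padded).
    """
    block, offset = divmod(addr, WIDTH)
    if offset == 0:
        return aligned_load(memory, block), 1, 0
    lo = aligned_load(memory, block)
    hi = aligned_load(memory, block + 1)
    # Shuffle step: concatenate the two blocks and shift left by `offset` lanes.
    vector = (lo + hi)[offset:offset + WIDTH]
    return vector, 2, 1


memory = list(range(16))  # contents equal addresses, for readability
print(unaligned_load(memory, 4))  # ([4, 5, 6, 7], 1, 0)
print(unaligned_load(memory, 5))  # ([5, 6, 7, 8], 2, 1)
```

In this model, every misaligned vector access doubles the memory accesses and adds a shuffle, which is the kind of extra work that shuffle-unit support aims to reduce.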
II. EMBEDDED MOBILE SIGNAL PROCESSING OVERVIEW
Fig. 1 lists the major components of the physical layer in the LTE protocol. Channel encoding/decoding (Channel Enc/Dec) performs forward error correction. The modulation/demodulation (Mod/De-Mod) phase then converts between binary data sequences and complex-valued modulation symbols. After that, interleaving/de-interleaving (Inter/De-Inter) randomizes the order of the symbols. The MIMO (Multiple Input Multiple Output) encoding scheme then multiplexes the signals over multiple antennas. In the corresponding receiving process, the receiver requires an estimate of the channel conditions (Channel Est), which is obtained from the pilot signals. The estimated channel matrix is then used in the MIMO decoding phase to recover the transmitted data. For transmission over the physical channel, signals are mapped to, or de-mapped from, the resource grid (RE Map/De-Map), and the IFFT/FFT is then used to generate/recover the orthogonal frequency division multiplexing (OFDM) signals.
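The modulation and IFFT/FFT stages described above can be illustrated with a minimal transmit/receive round trip. The Python sketch below assumes QPSK, matching the LTE QPSK constellation in which each bit pair (b0, b1) maps to ((1-2b0) + j(1-2b1))/sqrt(2), together with a single antenna, an ideal noiseless channel, no cyclic prefix, and no channel coding, interleaving, or MIMO stages. Naive O(N^2) DFTs stand in for the IFFT/FFT, and all function names are hypothetical.

```python
import cmath
import math

SCALE = 1 / math.sqrt(2)


def qpsk_modulate(bits):
    """Map bit pairs to complex QPSK symbols: (b0, b1) -> ((1-2*b0) + j(1-2*b1)) / sqrt(2)."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * SCALE
            for i in range(0, len(bits), 2)]


def qpsk_demodulate(symbols):
    """Hard-decision demapping: the sign of each component recovers one bit."""
    bits = []
    for s in symbols:
        bits.append(0 if s.real > 0 else 1)
        bits.append(0 if s.imag > 0 else 1)
    return bits


def idft(freq):
    """Naive inverse DFT, standing in for the IFFT that builds an OFDM symbol."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dft(time):
    """Naive forward DFT, standing in for the receiver's FFT."""
    n = len(time)
    return [sum(time[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


bits = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1]
subcarriers = qpsk_modulate(bits)  # Mod: 16 bits -> 8 symbols, one per subcarrier
ofdm_symbol = idft(subcarriers)    # IFFT: frequency domain -> time-domain samples
received = dft(ofdm_symbol)        # FFT at the receiver (ideal channel assumed)
assert qpsk_demodulate(received) == bits  # De-Mod recovers the original bits
```

In a real receiver, the FFT output would first be equalized using the channel estimate (and, with multiple antennas, passed through MIMO decoding) before demapping; the ideal channel here skips those steps.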
As shown in Fig. 1, the processing before modulation belongs to bit-level processing[5], which is expected to be implemented in dedicated hardware. However, in the receiving process, demodulation does not belong to the bit-level
2012 IEEE 14th International Conference on High Performance Computing and Communications
978-0-7695-4749-7/12 $26.00 © 2012 IEEE
DOI 10.1109/HPCC.2012.176