IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2016 769
Fault Tolerant Parallel FFTs Using Error Correction Codes and Parseval Checks
Zhen Gao, Pedro Reviriego, Zhan Xu, Xin Su, Ming Zhao, Jing Wang, and Juan Antonio Maestro
Abstract— Soft errors pose a reliability threat to modern electronic
circuits. This makes protection against soft errors a requirement for
many applications. Communications and signal processing systems are
no exception to this trend. For some applications, an interesting option
is to use algorithm-based fault tolerance (ABFT) techniques that
exploit algorithmic properties to detect and correct errors. Signal
processing and communication applications are well suited for ABFT. One
example is fast Fourier transforms (FFTs), which are a key building block in
many systems. Several protection schemes have been proposed to detect
and correct errors in FFTs. Among those, the Parseval or sum of squares
check is probably the most widely known. In modern communication
systems, it is increasingly common to find several blocks operating
in parallel. Recently, a technique that exploits this fact to implement fault
tolerance on parallel filters has been proposed. In this brief, this technique
is first applied to protect FFTs. Then, two improved protection schemes
that combine the use of error correction codes and Parseval checks are
proposed and evaluated. The results show that the proposed schemes can
further reduce the implementation cost of protection.
Index Terms— Error correction codes (ECCs), fast Fourier
transforms (FFTs), soft errors.
I. INTRODUCTION
The complexity of communications and signal processing
circuits increases every year. This is made possible by CMOS
technology scaling that enables the integration of more and more
transistors on a single device. This increased complexity makes the
circuits more vulnerable to errors. At the same time, the scaling
means that transistors operate with lower voltages and are more
susceptible to errors caused by noise and manufacturing variations [1].
The importance of radiation-induced soft errors also increases as
technology scales [2]. Soft errors can change the logical value of
a circuit node, creating a temporary error that can affect the system
operation. To ensure that soft errors do not affect the operation of
a given circuit, a wide variety of techniques can be used [3]. These
include the use of special manufacturing processes for the integrated
circuits, such as silicon on insulator (SOI). Another option
is to design basic circuit blocks or complete design libraries to
minimize the probability of soft errors. Finally, it is also possible
to add redundancy at the system level to detect and correct errors.
Manuscript received September 10, 2014; revised November 24, 2014 and
February 12, 2015; accepted February 26, 2015. Date of publication March 11,
2015; date of current version January 19, 2016. This work was supported in
part by the China’s 863 Plan Program under Grant 2012AA01A502, in part
by the Beijing Natural Science Foundation under Grant 4110001, in
part by the National Basic Research Program of China under
Grant 2012CB316000, in part by the National Natural Science Foundation of
China under Grant 61402044, and in part by the Spanish Ministry of Science
and Education under Grant AYA2009-13300-C03.
Z. Gao is with the School of Electronic Information Engineering, Tianjin
University, Tianjin 300072, China (e-mail: zgao@tju.edu.cn).
P. Reviriego and J. A. Maestro are with the Universidad Antonio de Nebrija,
Madrid E-28040, Spain (e-mail: previrie@nebrija.es; jmaestro@nebrija.es).
Z. Xu is with the School of Information and Communication Engineering,
Beijing Information Science and Technology University, Beijing 100085,
China (e-mail: xuzhan@tsinghua.edu.cn).
X. Su, M. Zhao, and J. Wang are with the Tsinghua National
Laboratory for Information Science and Technology, Tsinghua University,
Beijing 100084, China (e-mail: suxin@tsinghua.edu.cn; zhaoming@
tsinghua.edu.cn; wangj@tsinghua.edu.cn).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2015.2408621
One classical example is the use of triple modular redundancy (TMR)
that triples a block and votes among the three outputs to detect
and correct errors. The main issue with these soft error mitigation
techniques is that they require a large overhead in terms of circuit
implementation. For example, for TMR, the overhead is >200%.
This is because the unprotected module is replicated three times
(which requires a 200% overhead versus the unprotected module),
and additionally, voters are needed to correct the errors making the
overhead >200%. This overhead is excessive for many applications.
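As a concrete illustration of the voting step, the following Python sketch shows a bitwise two-out-of-three majority voter, assuming each replica output is an integer bit pattern; the function name `tmr_vote` and the sample values are illustrative. In hardware the voter would be a per-bit majority gate, and it is this voter, added to the three replicas, that pushes the overhead above 200%.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three replica outputs (hypothetical helper)."""
    # A result bit is 1 when that bit is 1 in at least two replicas,
    # so an error confined to any single replica is outvoted.
    return (a & b) | (a & c) | (b & c)


correct = 0b1011_0010
upset = correct ^ 0b0000_1000  # single-bit soft error in one replica
print(tmr_vote(correct, upset, correct) == correct)  # True
print(tmr_vote(upset, correct, correct) == correct)  # True
```

The voter masks any error that affects only one replica, which matches the single-error model commonly assumed for soft errors.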
Another approach is to use the algorithmic properties of the
circuit to detect/correct errors. This is commonly referred to as
algorithm-based fault tolerance (ABFT) [4]. This strategy can reduce
the overhead required to protect a circuit.
Signal processing and communications circuits are well suited
for ABFT as they have regular structures and many algorithmic
properties [4]. Over the years, many ABFT techniques have been
proposed to protect the basic blocks that are commonly used in
those circuits. Several works have considered the protection of digital
filters [5], [6]. For example, replication using reduced-precision
copies of the filter has been proposed as a lower-cost alternative to
TMR [7]. Knowledge of the distribution
of the filter output has also been recently exploited to detect and
correct errors with lower overheads [8]. The protection of fast Fourier
transforms (FFTs) has also been widely studied [9], [10].
As signal-processing circuits become more complex, it is
common to find several filters or FFTs operating in parallel.
This occurs for example in filter banks [11] or in multiple-input
multiple-output (MIMO) communication systems [12]. In particular,
MIMO orthogonal frequency division multiplexing (MIMO-OFDM)
systems use parallel IFFTs/FFTs for modulation/demodulation [13].
MIMO-OFDM is used in Long-Term Evolution (LTE) mobile
systems [14] and also in WiMAX [15]. The presence of parallel
filters or FFTs creates an opportunity to implement ABFT techniques
for the entire group of parallel modules instead of for each one
independently. This was first studied for digital filters
in [16], where two filters were considered. More recently, a general
scheme based on the use of error correction codes (ECCs) has been
proposed [17]. In this technique, each filter plays the role of a bit
in an ECC, and the parity check bits are computed using addition.
This technique can be used for any operation in which the output
for the sum of several inputs equals the sum of the individual
outputs. This holds for any linear operation, for example, the
discrete Fourier transform (DFT).
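To make the ECC idea concrete, the sketch below protects four parallel transforms with three parity transforms. It assumes a Hamming(7,4)-style assignment of data modules to parity modules (`PARITY_SETS`); that assignment, the function names, and the injected error are illustrative assumptions, not necessarily the exact design of [17]. A direct O(N^2) DFT stands in for the FFT, since both compute the same outputs. Because the DFT is linear, each parity DFT of summed inputs should equal the sum of the corresponding data outputs. The pattern of mismatches (the syndrome) identifies a single faulty module, whose output is then rebuilt from a parity DFT and the healthy outputs.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; same outputs as an FFT, computed more slowly."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def vsum(vecs):
    """Element-wise sum of equal-length vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def differs(a, b, tol=1e-9):
    """True if any pair of elements of a and b differs by more than tol."""
    return any(abs(u - v) > tol for u, v in zip(a, b))


# Assumed Hamming(7,4)-style assignment: data modules feeding each parity DFT.
PARITY_SETS = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]


def protected_dfts(inputs, faulty=None):
    """Four data DFTs plus three parity DFTs; returns (outputs, flagged module)."""
    outs = [dft(x) for x in inputs]
    if faulty is not None:                  # inject an illustrative error
        outs[faulty][0] += 5.0
    # Linearity: dft(x_a + x_b + x_c) == dft(x_a) + dft(x_b) + dft(x_c).
    parity = [dft(vsum([inputs[i] for i in s])) for s in PARITY_SETS]
    # Syndrome bit j is set when parity DFT j disagrees with its data outputs.
    syndrome = tuple(differs(p, vsum([outs[i] for i in s]))
                     for p, s in zip(parity, PARITY_SETS))
    for i in range(len(outs)):
        # Module i is flagged when exactly the sets containing i mismatch.
        if syndrome == tuple(i in s for s in PARITY_SETS):
            j = next(j for j, s in enumerate(PARITY_SETS) if i in s)
            others = [outs[k] for k in PARITY_SETS[j] if k != i]
            # Rebuild the faulty output from parity DFT j and the healthy outputs.
            outs[i] = [p - o for p, o in zip(parity[j], vsum(others))]
            return outs, i
    # No mismatch, or a single mismatch (a faulty parity DFT): data is intact.
    return outs, None


data = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.5, 0.5, 0.5],
        [3.0, 0.0, 1.0, 0.0], [0.0, -2.0, 1.0, 4.0]]
fixed, flagged = protected_dfts(data, faulty=2)
print(flagged)                                        # 2
print(not differs(fixed[2], dft(data[2]), tol=1e-6))  # True
```

Under the single-error assumption, a syndrome with only one mismatch points to a faulty parity DFT, so the data outputs are returned unchanged. The comparison uses a tolerance because, in floating point, the DFT of a sum and the sum of DFTs agree only up to rounding.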
In this brief, the protection of parallel FFTs is studied. In particular,
it is assumed that there can only be a single error in the system at any
given point in time. This is a common assumption when considering
the protection against radiation-induced soft errors [3]. There are
three main contributions in this brief.
1) The evaluation of the ECC technique [17] for the protection
of parallel FFTs, showing its effectiveness in terms of both overhead
and error protection.
2) The proposal of a new technique based on the use of Parseval or
sum of squares (SOS) checks [4] combined with a parity FFT.
3) The proposal of a new technique in which the ECC is applied
to the SOS checks instead of to the FFTs.
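For the unnormalized DFT, Parseval's relation states that the sum of |X[k]|^2 over all bins equals N times the sum of |x[n]|^2 over all samples, so comparing the two sides gives an SOS check. The sketch below pairs a per-module SOS check (which locates a faulty transform) with a single parity DFT of the summed inputs (which supplies the data needed to rebuild it), as one possible reading of the second contribution. The function names, tolerance, and injected error are illustrative assumptions rather than the brief's actual design, and a direct DFT again stands in for the FFT.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; same outputs as an FFT, computed more slowly."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def sos_ok(x, X, rel_tol=1e-9):
    """Parseval/SOS check for the unnormalized DFT: sum|X|^2 == N * sum|x|^2."""
    lhs = sum(abs(v) ** 2 for v in X)
    rhs = len(x) * sum(abs(v) ** 2 for v in x)
    return abs(lhs - rhs) <= rel_tol * max(rhs, 1.0)


def parity_sos(inputs, faulty=None):
    """Hypothetical sketch: SOS checks locate the error, one parity DFT corrects it."""
    outs = [dft(x) for x in inputs]
    if faulty is not None:             # inject an illustrative error into one output
        outs[faulty][1] += 5.0
    # Parity DFT of the element-wise sum of all inputs (relies on DFT linearity).
    parity = dft([sum(vals) for vals in zip(*inputs)])
    for i, (x, X) in enumerate(zip(inputs, outs)):
        if not sos_ok(x, X):
            # Single-error assumption: every other output is correct,
            # so X_i = parity - (sum of the other outputs).
            others = [outs[k] for k in range(len(outs)) if k != i]
            outs[i] = [p - sum(vals) for p, *vals in zip(parity, *others)]
            return outs, i
    return outs, None


data = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.5, 0.5, 0.5],
        [3.0, 0.0, 1.0, 0.0], [0.0, -2.0, 1.0, 4.0]]
fixed, flagged = parity_sos(data, faulty=3)
print(flagged)  # 3
```

Compared with the ECC sketch, this trades two of the three parity transforms for per-module SOS computations. An error that happens to leave the sum of squares unchanged would pass the SOS check undetected, so the check alone does not give complete coverage.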
1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.