Fault-Tolerant Parallel FFT Techniques: Error Correction Codes and Parseval Checks

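This research paper examines the reliability problems that soft errors cause in modern electronic circuits and proposes fault-tolerant parallel fast Fourier transforms (FFTs) built from error correction codes and Parseval checks.

In electronic systems, and in very large scale integration (VLSI) systems in particular, soft errors have become a reliability threat that can no longer be ignored. Protection against soft errors is therefore a requirement for many applications, and communications and signal processing systems are no exception. Algorithm-based fault tolerance (ABFT) is an attractive option for this problem: it exploits properties of the algorithm itself to detect and correct errors. The FFT, a key block in many systems, is particularly well suited to ABFT.

Several schemes have been proposed to detect and correct errors in FFTs. The Parseval, or sum of squares, check is probably the best known. Parseval's theorem states that a signal has the same energy in the time domain and in the frequency domain (up to the scaling convention of the transform), so comparing input and output energy can verify that a transform was computed correctly.

Modern communication systems often contain several processing blocks operating in parallel. For this setting, the paper proposes fault-tolerant parallel FFT techniques that combine error correction codes (ECCs) with Parseval checks. ECCs, such as simple parity-based codes, can detect and correct errors. Each FFT unit performs its regular transform plus an extra check step, for example comparing the total input and output energy through Parseval's theorem. An energy mismatch indicates that an error may have occurred, and the redundancy provided by the ECC is then used to correct it. According to the paper, this reduces the implementation cost of protection, which makes it particularly valuable in resource-constrained embedded systems.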

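The Parseval check described in the summary can be sketched in a few lines of Python. This is a minimal illustration and not the paper's hardware design: the naive `dft` function stands in for an FFT unit, and the names `parseval_check`, `tol`, and the test vector are assumptions made for the example. With the unnormalized DFT used here, the check compares the input energy with the output energy divided by N.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, standing in for one FFT unit (illustrative only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def parseval_check(x, X, tol=1e-9):
    """Return True when sum|x|^2 equals (1/N) * sum|X|^2 within a tolerance."""
    n = len(x)
    e_in = sum(abs(v) ** 2 for v in x)
    e_out = sum(abs(v) ** 2 for v in X) / n
    return abs(e_in - e_out) <= tol * max(1.0, e_in)

x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]
X = dft(x)
ok_clean = parseval_check(x, X)          # fault-free transform passes

X_faulty = list(X)
X_faulty[3] += 4.0                       # simulated soft error on one output
ok_faulty = parseval_check(x, X_faulty)  # energy mismatch is detected
```

The check only detects an error; it does not say which output sample is wrong, and an error that happens to preserve the total energy would go unnoticed. The tolerance is needed because floating-point or fixed-point rounding makes the two energies differ slightly even without faults.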
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2016 769

Fault Tolerant Parallel FFTs Using Error Correction Codes and Parseval Checks

Zhen Gao, Pedro Reviriego, Zhan Xu, Xin Su, Ming Zhao, Jing Wang, and Juan Antonio Maestro

Abstract— Soft errors pose a reliability threat to modern electronic circuits. This makes protection against soft errors a requirement for many applications. Communications and signal processing systems are no exception to this trend. For some applications, an interesting option is to use algorithm-based fault tolerance (ABFT) techniques that try to exploit the algorithmic properties to detect and correct errors. Signal processing and communication applications are well suited for ABFT. One example is fast Fourier transforms (FFTs), which are a key building block in many systems. Several protection schemes have been proposed to detect and correct errors in FFTs. Among those, the Parseval or sum of squares check is probably the most widely known. In modern communication systems, it is increasingly common to find several blocks operating in parallel. Recently, a technique that exploits this fact to implement fault tolerance on parallel filters has been proposed. In this brief, this technique is first applied to protect FFTs. Then, two improved protection schemes that combine the use of error correction codes and Parseval checks are proposed and evaluated. The results show that the proposed schemes can further reduce the implementation cost of protection.

Index Terms— Error correction codes (ECCs), fast Fourier transforms (FFTs), soft errors.

Manuscript received September 10, 2014; revised November 24, 2014 and February 12, 2015; accepted February 26, 2015. Date of publication March 11, 2015; date of current version January 19, 2016. This work was supported in part by China's 863 Plan Program under Grant 2012AA01A502, in part by the Beijing Natural Science Foundation under Grant 4110001, in part by the National Basic Research Program of China under Grant 2012CB316000, in part by the National Natural Science Foundation of China under Grant 61402044, and in part by the Spanish Ministry of Science and Education under Grant AYA2009-13300-C03.

Z. Gao is with the School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: zgao@tju.edu.cn).

P. Reviriego and J. A. Maestro are with the Universidad Antonio de Nebrija, Madrid E-28040, Spain (e-mail: previrie@nebrija.es; jmaestro@nebrija.es).

Z. Xu is with the School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing 100085, China (e-mail: xuzhan@tsinghua.edu.cn).

X. Su, M. Zhao, and J. Wang are with the Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: suxin@tsinghua.edu.cn; zhaoming@tsinghua.edu.cn; wangj@tsinghua.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2015.2408621

I. INTRODUCTION

The complexity of communications and signal processing circuits increases every year. This is made possible by CMOS technology scaling, which enables the integration of more and more transistors on a single device. This increased complexity makes the circuits more vulnerable to errors. At the same time, scaling means that transistors operate at lower voltages and are more susceptible to errors caused by noise and manufacturing variations [1]. The importance of radiation-induced soft errors also increases as technology scales [2]. Soft errors can change the logical value of a circuit node, creating a temporary error that can affect system operation. To ensure that soft errors do not affect the operation of a given circuit, a wide variety of techniques can be used [3]. These include special manufacturing processes for the integrated circuits, such as silicon on insulator. Another option is to design basic circuit blocks or complete design libraries to minimize the probability of soft errors. Finally, it is also possible to add redundancy at the system level to detect and correct errors. One classical example is triple modular redundancy (TMR), which triples a block and votes among the three outputs to detect and correct errors. The main issue with these soft error mitigation techniques is that they require a large overhead in terms of circuit implementation. For TMR, for example, replicating the unprotected module three times already costs a 200% overhead relative to the unprotected module, and the voters needed to correct the errors push the total above 200%. This overhead is excessive for many applications. Another approach is to use the algorithmic properties of the circuit to detect and correct errors. This is commonly referred to as algorithm-based fault tolerance (ABFT) [4]. This strategy can reduce the overhead required to protect a circuit.
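To make the cost of TMR concrete, the following minimal Python sketch runs three copies of a module and takes a majority vote. It is a software illustration only: the `tmr` function, its `fault` injection hook, and the `square` module are hypothetical names introduced here, not part of the paper.

```python
from collections import Counter

def tmr(module, x, fault=None):
    """Run three copies of `module` on x and majority-vote the outputs.

    `fault` is a hypothetical test hook (copy_index, corrupt_fn) that
    corrupts the output of one copy to simulate a soft error.
    """
    outputs = [module(x) for _ in range(3)]   # three replicas: 200% overhead
    if fault is not None:
        idx, corrupt = fault
        outputs[idx] = corrupt(outputs[idx])
    value, votes = Counter(outputs).most_common(1)[0]   # the voter
    if votes < 2:
        raise RuntimeError("no majority: more than one copy failed")
    return value

def square(v):
    return v * v

result_clean = tmr(square, 7)
# Flip bit 2 in the output of copy 1; the other two copies outvote it.
result_faulty = tmr(square, 7, fault=(1, lambda y: y ^ 0b100))
```

Every call pays for three executions of the module plus the vote, which is the overhead of more than 200% that motivates cheaper, algorithm-specific alternatives.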

Signal processing and communications circuits are well suited for ABFT as they have regular structures and many algorithmic properties [4]. Over the years, many ABFT techniques have been proposed to protect the basic blocks that are commonly used in those circuits. Several works have considered the protection of digital filters [5], [6]. For example, the use of replication using reduced precision copies of the filter has been proposed as an alternative to TMR but with a lower cost [7]. The knowledge of the distribution of the filter output has also been recently exploited to detect and correct errors with lower overheads [8]. The protection of fast Fourier transforms (FFTs) has also been widely studied [9], [10].

As signal-processing circuits become more complex, it is common to find several filters or FFTs operating in parallel. This occurs, for example, in filter banks [11] or in multiple-input multiple-output (MIMO) communication systems [12]. In particular, MIMO orthogonal frequency division multiplexing (MIMO-OFDM) systems use parallel iFFTs/FFTs for modulation/demodulation [13]. MIMO-OFDM is implemented in Long-Term Evolution (LTE) mobile systems [14] and also in WiMAX [15]. The presence of parallel filters or FFTs creates an opportunity to implement ABFT techniques for the entire group of parallel modules instead of for each one independently. This was studied initially for digital filters in [16], where two filters were considered. More recently, a general scheme based on the use of error correction codes (ECCs) has been proposed [17]. In this technique, each filter plays the role of a bit in an ECC, and the parity check bits are computed using addition. The technique can be used for any operation in which the output for the sum of several inputs is the sum of the individual outputs. This is true for any linear operation, such as the discrete Fourier transform (DFT).
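The idea of treating each parallel unit as a code bit can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme from [17] verbatim: it uses four data units and three parity units with Hamming-like parity groups chosen here for the example (`GROUPS`), a naive `dft` in place of hardware FFTs, and hypothetical helper names (`vadd`, `close`, `run_protected`). Because the DFT is linear, each parity unit's output should equal the sum of the outputs of the data units in its group; the pattern of failed checks identifies the faulty unit.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT; a stand-in for one hardware FFT unit."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def vadd(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]

def close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol for p, q in zip(a, b))

# Assumed parity groups (illustrative): parity unit j transforms the sum of
# the inputs listed in GROUPS[j]. Units 0-3 are data, units 4-6 are parity.
GROUPS = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]

# Map each pattern of failed checks to the single unit that explains it.
SYNDROME_TO_UNIT = {tuple(u in g for g in GROUPS): u for u in range(4)}
SYNDROME_TO_UNIT.update({tuple(k == j for k in range(3)): 4 + j for j in range(3)})

def run_protected(inputs, faulty_unit=None):
    """Run 4 data DFTs and 3 parity DFTs; locate and correct a single fault."""
    outs = [dft(x) for x in inputs]
    outs += [dft(vadd(*(inputs[i] for i in g))) for g in GROUPS]
    if faulty_unit is not None:
        outs[faulty_unit][1] += 5.0            # simulated soft error
    # Check j fails when its data outputs do not sum to parity output j.
    syndrome = tuple(not close(vadd(*(outs[i] for i in g)), outs[4 + j])
                     for j, g in enumerate(GROUPS))
    located = SYNDROME_TO_UNIT.get(syndrome)   # None when all checks pass
    if located is not None and located < 4:
        # Rebuild the faulty data output from one group that contains it.
        j, g = next((j, g) for j, g in enumerate(GROUPS) if located in g)
        others = vadd(*(outs[i] for i in g if i != located))
        outs[located] = [p - o for p, o in zip(outs[4 + j], others)]
    return outs[:4], located

inputs = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0],
          [2.0, 0.0, -1.0, 1.0], [-1.0, 1.5, 0.5, 2.0]]
clean = [dft(x) for x in inputs]
results = [run_protected(inputs, faulty_unit=u) for u in range(7)]
```

In this sketch three redundant units protect four data units, compared with eight extra units for TMR on the same four. A fault in a parity unit is located but needs no correction, because the data outputs are already correct.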

In this brief, the protection of parallel FFTs is studied. In particular, it is assumed that there can only be a single error in the system at any given point in time. This is a common assumption when considering the protection against radiation-induced soft errors [3]. There are three main contributions in this brief.

1) The evaluation of the ECC technique [17] for the protection of parallel FFTs, showing its overhead and its protection effectiveness.
2) The proposal of a new technique based on the use of Parseval or sum of squares (SOS) checks [4] combined with a parity FFT.
3) The proposal of a new technique in which the ECC is applied to the SOS checks instead of to the FFTs.
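One plausible reading of the second contribution is sketched below, under the single-error assumption stated above. The details are assumptions made for illustration, since this excerpt does not describe the architecture: each data FFT gets its own SOS check to identify which unit failed, and a single parity FFT, fed with the sum of all inputs, supplies the redundancy needed to rebuild the faulty output. The names `sos_ok`, `run_parity_sos`, and the naive `dft` are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT; a stand-in for one hardware FFT unit."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def energy(v):
    return sum(abs(c) ** 2 for c in v)

def sos_ok(x, X, tol=1e-9):
    """Parseval (SOS) check: sum|x|^2 must equal (1/N) * sum|X|^2."""
    e_in = energy(x)
    return abs(e_in - energy(X) / len(x)) <= tol * max(1.0, e_in)

def close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol for p, q in zip(a, b))

def run_parity_sos(inputs, faulty_unit=None):
    """Detect the faulty FFT with SOS checks, then correct it with a parity FFT."""
    outs = [dft(x) for x in inputs]
    parity_out = dft([sum(col) for col in zip(*inputs)])   # parity FFT
    if faulty_unit is not None:
        outs[faulty_unit][2] += 5.0                        # simulated soft error
    flagged = [i for i, (x, X) in enumerate(zip(inputs, outs))
               if not sos_ok(x, X)]
    if len(flagged) == 1:
        i = flagged[0]
        # By linearity, the faulty output is the parity output minus the rest.
        rest = [sum(col) for col in zip(*(outs[k] for k in range(len(outs))
                                          if k != i))]
        outs[i] = [p - r for p, r in zip(parity_out, rest)]
    return outs, flagged

inputs = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0],
          [2.0, 0.0, -1.0, 1.0], [-1.0, 1.5, 0.5, 2.0]]
clean = [dft(x) for x in inputs]
results = [run_parity_sos(inputs, faulty_unit=u) for u in range(4)]
```

Compared with a code-based arrangement that needs several redundant FFTs, this sketch uses only one, because the SOS checks take over the job of locating the fault. A fault in the parity FFT itself leaves the data outputs untouched, so under the single-error assumption nothing needs correcting in that case.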
