FFT Algorithms
Brian Gough, bjg@network-theory.co.uk
May 1997
1 Introduction
Fast Fourier Transforms (FFTs) are efficient algorithms for calculating the discrete Fourier transform (DFT),

h_a = \mathrm{DFT}(g_b)   (1)
    = \sum_{b=0}^{N-1} g_b \exp(-2\pi i a b / N) \qquad 0 \le a \le N-1   (2)
    = \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N = \exp(-2\pi i / N)   (3)
The DFT usually arises as an approximation to the continuous Fourier transform when functions are sampled at discrete intervals in space or time. The naive evaluation of the discrete Fourier transform is a matrix-vector multiplication W \vec{g}, and would take O(N^2) operations for N data-points. The general principle of the Fast Fourier Transform algorithms is to use a divide-and-conquer strategy to factorize the matrix W into smaller sub-matrices, typically reducing the operation count to O(N \sum f_i) if N can be factorized into smaller integers, N = f_1 f_2 \ldots f_n.
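The matrix-vector definition above translates directly into a short program. Here is a minimal sketch (in Python, used purely for illustration; it is not the GSL implementation) of the naive O(N^2) evaluation that the FFT improves upon:

```python
import cmath

def naive_dft(g):
    """Evaluate h_a = sum_b g_b exp(-2*pi*i*a*b/N) by direct summation."""
    N = len(g)
    return [sum(g[b] * cmath.exp(-2j * cmath.pi * a * b / N) for b in range(N))
            for a in range(N)]
```

Each of the N outputs is a sum over all N inputs, which is where the O(N^2) operation count comes from.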
This chapter explains the algorithms used in the GSL FFT routines and provides some information on how to extend them. To learn more about the FFT you should read the review article
Fast Fourier Transforms: A Tutorial Review and A State of the Art by Duhamel and Vetterli [1].
There are several introductory books on the FFT with example programs, such as The Fast Fourier
Transform by Brigham [2] and DFT/FFT and Convolution Algorithms by Burrus and Parks [3]. In
1979 the IEEE published a compendium of carefully-reviewed Fortran FFT programs in Programs
for Digital Signal Processing [4] which is a useful reference for implementations of many different
FFT algorithms. If you are interested in using DSPs then the Handbook of Real-Time Fast Fourier
Transforms [5] provides detailed information on the algorithms and hardware needed to design,
build and test DSP applications. Many FFT algorithms rely on results from number theory. These
results are covered in the books Fast transforms: algorithms, analyses, applications, by Elliott and
Rao [6], Fast Algorithms for Digital Signal Processing by Blahut [7] and Number Theory in Digital
Signal Processing by McClellan and Rader [8]. There is also an annotated bibliography of papers
on the FFT and related topics by Burrus [9].
2 Families of FFT algorithms
There are two main families of FFT algorithms: the Cooley-Tukey algorithm and the Prime Factor
algorithm. These differ in the way they map the full FFT into smaller sub-transforms. Of the
Cooley-Tukey algorithms there are two types of routine in common use: mixed-radix (general-N)
algorithms and radix-2 (power of 2) algorithms. Each type of algorithm can be further classified
by additional characteristics, such as whether it operates in-place or uses additional scratch space,
whether its output is in a sorted or scrambled order, and whether it uses decimation-in-time or
-frequency iterations.
Mixed-radix algorithms work by factorizing the data vector into shorter lengths. These can then
be transformed by small-N FFTs. Typical programs include FFTs for small prime factors, such as
2, 3, 5, . . . which are highly optimized. The small-N FFT modules act as building blocks and can
be multiplied together to make longer transforms. By combining a reasonable set of modules it is
possible to compute FFTs of many different lengths. If the small-N modules are supplemented by an O(N^2) general-N module then an FFT of any length can be computed, in principle. Of course, any lengths which contain large prime factors would perform only as O(N^2).
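The way a mixed-radix routine picks its sub-transforms can be sketched as a factorization step: pull out the optimized small-N module sizes first, and leave any remainder for the slow general-N module. The module set {7, 5, 3, 2} below is an illustrative assumption, not GSL's actual list:

```python
def choose_factors(N, modules=(7, 5, 3, 2)):
    """Split N into small-N module sizes, largest modules first.
    Any leftover factor would have to be handled by an O(N^2)
    general-N module."""
    factors = []
    for f in modules:
        while N % f == 0:
            factors.append(f)
            N //= f
    if N > 1:
        factors.append(N)   # leftover containing large prime factors
    return factors
```

For example, a length-30 transform decomposes entirely into optimized modules, while a length-22 transform leaves a prime factor of 11 for the general-N fallback.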
Radix-2 algorithms, or “power of two” algorithms, are simplified versions of the mixed-radix
algorithm. They are restricted to lengths which are a power of two. The basic radix-2 FFT module
only involves addition and subtraction, so the algorithms are very simple. Radix-2 algorithms have
been the subject of much research into optimizing the FFT. Many of the most efficient radix-2
routines are based on the “split-radix” algorithm. This is actually a hybrid which combines the
best parts of both radix-2 and radix-4 (“power of 4”) algorithms [10, 11].
The prime factor algorithm (PFA) is an alternative form of general-N algorithm based on a
different way of recombining small-N FFT modules [12, 13]. It has a very simple indexing scheme
which makes it attractive. However it only works in the case where all factors are mutually prime.
This requirement makes it more suitable as a specialized algorithm for given lengths.
2.1 FFTs of prime lengths
Large prime lengths cannot be handled efficiently by any of these algorithms. However it may still be possible to compute a DFT, by using results from number theory. Rader showed that it is possible
to convert a length-p FFT (where p is prime) into a convolution of length-(p − 1). There is a
simple identity between the convolution of length N and the FFT of the same length, so if p − 1
is easily factorizable this allows the convolution to be computed efficiently via the FFT. The idea
is presented in the original paper by Rader [14] (also reprinted in [8]), but for more details see the
theoretical books mentioned earlier.
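Rader's idea can be checked numerically for a small prime. The sketch below (pure Python, for illustration; the primitive root g is supplied by hand rather than searched for) re-indexes the non-zero terms of a length-p DFT through powers of a primitive root modulo p, so that they become a cyclic convolution of length p − 1, and compares the result with direct summation:

```python
import cmath

def naive_dft(x):
    N = len(x)
    return [sum(x[b] * cmath.exp(-2j * cmath.pi * a * b / N) for b in range(N))
            for a in range(N)]

def rader_dft(x, g):
    """Length-p DFT (p prime) via a cyclic convolution of length p - 1.
    g must be a primitive root modulo p (assumed, not verified here)."""
    p = len(x)
    M = p - 1
    ginv = pow(g, -1, p)
    # Permute inputs and twiddles by powers of the primitive root:
    # a_q = x_{g^{-q} mod p},  b_m = W_p^{g^m mod p}.
    a = [x[pow(ginv, q, p)] for q in range(M)]
    b = [cmath.exp(-2j * cmath.pi * pow(g, m, p) / p) for m in range(M)]
    # Cyclic convolution, done directly here; Rader's point is that this
    # convolution can itself be computed with a length-(p-1) FFT when
    # p - 1 factorizes nicely.
    c = [sum(a[q] * b[(m - q) % M] for q in range(M)) for m in range(M)]
    X = [complex(sum(x))] + [0j] * M      # X_0 is just the sum of the inputs
    for m in range(M):
        X[pow(g, m, p)] = x[0] + c[m]
    return X
```

With p = 7 and primitive root g = 3 the permuted convolution reproduces the direct DFT to rounding error.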
2.2 Optimization
There is no such thing as the single fastest FFT algorithm. FFT algorithms involve a mixture of
floating point calculations, integer arithmetic and memory access. Each of these operations will
have different relative speeds on different platforms. The performance of an algorithm is a function
of the hardware it is implemented on. The goal of optimization is thus to choose the algorithm best
suited to the characteristics of a given hardware platform.
For example, the Winograd Fourier Transform (WFTA) is an algorithm which is designed to
reduce the number of floating point multiplications in the FFT. However, it does this at the expense
of using many more additions and data transfers than other algorithms. As a consequence the
WFTA might be a good candidate algorithm for machines where data transfers occupy a negligible
time relative to floating point arithmetic. However on most modern machines, where the speed of
data transfers is comparable to or slower than floating point operations, it would be outperformed
by an algorithm which used a better mix of operations (i.e. more floating point operations but
fewer data transfers).
For a detailed study of this sort of effect, comparing the different algorithms on different platforms, consult the paper Effects of Architecture Implementation on DFT Algorithm Performance by Mehalic, Rustan and Route [15]. The paper was written in the early 1980s and has data for super- and mini-computers which you are unlikely to see today, except in a museum. However, the methodology is still valid and it would be interesting to see similar results for present-day computers.
3 FFT Concepts
Factorization is the key principle of the mixed-radix FFT divide-and-conquer strategy. If N can be factorized into a product of n_f integers,

N = f_1 f_2 \ldots f_{n_f},   (4)

then the FFT itself can be divided into smaller FFTs for each factor. More precisely, an FFT of length N can be broken up into,

(N/f_1) FFTs of length f_1,
(N/f_2) FFTs of length f_2,
\ldots
(N/f_{n_f}) FFTs of length f_{n_f}.
The total number of operations for these sub-operations will be O(N(f_1 + f_2 + \ldots + f_{n_f})). When the factors of N are all small integers this will be substantially less than O(N^2). For example, when N is a power of 2 an FFT of length N = 2^m can be reduced to mN/2 FFTs of length 2, or O(N \log_2 N) operations. Here is a demonstration which shows this:
We start with the full DFT,
h_a = \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N = \exp(-2\pi i / N)   (5)
and split the sum into even and odd terms,
= \sum_{b=0}^{N/2-1} g_{2b} W_N^{a(2b)} + \sum_{b=0}^{N/2-1} g_{2b+1} W_N^{a(2b+1)}.   (6)
This converts the original DFT of length N into two DFTs of length N/2,
h_a = \sum_{b=0}^{N/2-1} g_{2b} W_{N/2}^{ab} + W_N^a \sum_{b=0}^{N/2-1} g_{2b+1} W_{N/2}^{ab}   (7)
The first term is a DFT of the even elements of g. The second term is a DFT of the odd elements of g, premultiplied by an exponential factor W_N^k (known as a twiddle factor).

\mathrm{DFT}(h) = \mathrm{DFT}(g_{\mathrm{even}}) + W_N^k \, \mathrm{DFT}(g_{\mathrm{odd}})   (8)
By splitting the DFT into its even and odd parts we have reduced the operation count from N^2 (for a DFT of length N) to 2(N/2)^2 (for two DFTs of length N/2). The cost of the splitting is that we need an additional O(N) operations to multiply by the twiddle factor W_N^k and recombine the two sums.
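Applied recursively, the even/odd splitting is already a complete radix-2 FFT. The following sketch (plain Python, unoptimized, power-of-two lengths only; an illustration of the idea rather than the GSL routine) follows the splitting identity literally:

```python
import cmath

def fft2(g):
    """Recursive radix-2 decimation-in-time FFT; len(g) must be a power of 2."""
    N = len(g)
    if N == 1:
        return list(g)       # the DFT of a single value is the identity
    even = fft2(g[0::2])     # DFT of the even-indexed elements
    odd = fft2(g[1::2])      # DFT of the odd-indexed elements
    h = [0j] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        h[k] = even[k] + w * odd[k]
        h[k + N // 2] = even[k] - w * odd[k]
    return h
```

Note that each output in the second half reuses the same two half-length sub-transforms, since W_N^{k+N/2} = -W_N^k; this reuse is exactly where the savings come from.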
We can repeat the splitting procedure recursively \log_2 N times until the full DFT is reduced to DFTs of single terms. The DFT of a single value is just the identity operation, which costs nothing. However since O(N) operations were needed at each stage to recombine the even and odd parts the total number of operations to obtain the full DFT is O(N \log_2 N). If we had used a length which was a product of factors f_1, f_2, \ldots we could have split the sum in a similar way. First we would split terms corresponding to the factor f_1, instead of the even and odd terms corresponding to a factor of two. Then we would repeat this procedure for the subsequent factors. This would lead to a final operation count of O(N \sum f_i).
This procedure gives some motivation for why the number of operations in a DFT can in principle be reduced from O(N^2) to O(N \sum f_i). It does not give a good explanation of how to implement the algorithm in practice, which is what we shall do in the next section.
4 Radix-2 Algorithms
For radix-2 FFTs it is natural to write array indices in binary form because the length of the data
is a power of two. This is nicely explained in the article The FFT: Fourier Transforming One Bit at
a Time by P.B. Visscher [16]. A binary representation for indices is the key to deriving the simplest
efficient radix-2 algorithms.
We can write an index b (0 \le b < 2^n) in binary representation like this,

b = [b_{n-1} \ldots b_1 b_0] = 2^{n-1} b_{n-1} + \ldots + 2 b_1 + b_0.   (9)

Each of the b_0, b_1, \ldots, b_{n-1} are the bits (either 0 or 1) of b.
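In code the bits b_j are just shifts and masks; a quick sketch (Python, illustrative only) confirming the decomposition above:

```python
def bits(b, n):
    """Return [b_0, b_1, ..., b_{n-1}], the n bits of the index b."""
    return [(b >> j) & 1 for j in range(n)]

def from_bits(bit_list):
    """Reassemble b = 2^{n-1} b_{n-1} + ... + 2 b_1 + b_0."""
    return sum(bj << j for j, bj in enumerate(bit_list))
```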
Using this notation the original definition of the DFT can be rewritten as a sum over the bits
of b,
h(a) = \sum_{b=0}^{N-1} g_b \exp(-2\pi i a b / N)   (10)
to give an equivalent summation like this,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-1}=0}^{1} g([b_{n-1} \ldots b_1 b_0]) W_N^{ab}   (11)

where the bits of a are a = [a_{n-1} \ldots a_1 a_0].
To reduce the number of operations in the sum we will use the periodicity of the exponential
term,
W_N^{x+N} = W_N^x.   (12)

Most of the products ab in W_N^{ab} are greater than N. By making use of this periodicity they can all be collapsed down into the range 0 \ldots N-1. This allows us to reduce the number of operations by combining common terms, modulo N. Using this idea we can derive decimation-in-time or decimation-in-frequency algorithms, depending on how we break the DFT summation down into common terms. We'll first consider the decimation-in-time algorithm.
4.1 Radix-2 Decimation-in-Time (DIT)
To derive the decimation-in-time algorithm we start by separating out the most significant bit of the index b,

[b_{n-1} \ldots b_1 b_0] = 2^{n-1} b_{n-1} + [b_{n-2} \ldots b_1 b_0]   (13)
Now we can evaluate the innermost sum of the DFT without any dependence on the remaining bits
of b in the exponential,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-1}=0}^{1} g(b) W_N^{a(2^{n-1} b_{n-1} + [b_{n-2} \ldots b_1 b_0])}   (14)

= \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-2}=0}^{1} W_N^{a[b_{n-2} \ldots b_1 b_0]} \sum_{b_{n-1}=0}^{1} g(b) W_N^{a(2^{n-1} b_{n-1})}   (15)
Looking at the term W_N^{a(2^{n-1} b_{n-1})} we see that we can also remove most of the dependence on a as well, by using the periodicity of the exponential,

W_N^{a(2^{n-1} b_{n-1})} = \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] 2^{n-1} b_{n-1} / 2^n)   (16)
= \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] b_{n-1} / 2)   (17)
= \exp(-2\pi i (2^{n-2} a_{n-1} + \ldots + a_1 + (a_0/2)) b_{n-1})   (18)
= \exp(-2\pi i a_0 b_{n-1} / 2)   (19)
= W_2^{a_0 b_{n-1}}   (20)
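The collapse of W_N^{a(2^{n-1} b_{n-1})} down to W_2^{a_0 b_{n-1}} is easy to confirm numerically. The check below (Python, illustrative; the loop over every a and over b_{n-1} in {0, 1} is exhaustive for a given n) compares the two expressions directly:

```python
import cmath

def W(k, N):
    """The N-th root of unity raised to the power k: exp(-2*pi*i*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)

def check_twiddle_collapse(n):
    """Verify W_N^{a * 2^{n-1} * b} == W_2^{a_0 * b} for all a, and b in {0,1},
    where N = 2^n and a_0 is the lowest-order bit of a."""
    N = 1 << n
    return all(
        abs(W(a * (1 << (n - 1)) * b, N) - W((a & 1) * b, 2)) < 1e-12
        for a in range(N) for b in (0, 1)
    )
```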
Thus the innermost exponential term simplifies so that it only involves the highest order bit of b
and the lowest order bit of a, and the sum can be reduced to,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-2}=0}^{1} W_N^{a[b_{n-2} \ldots b_1 b_0]} \sum_{b_{n-1}=0}^{1} g(b) W_2^{a_0 b_{n-1}}.   (21)
We can repeat this procedure for the next most significant bit of b, b_{n-2}, using a similar identity,

W_N^{a(2^{n-2} b_{n-2})} = \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] 2^{n-2} b_{n-2} / 2^n)   (22)
= W_4^{[a_1 a_0] b_{n-2}}.   (23)
to give a formula with even less dependence on the bits of a,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-3}=0}^{1} W_N^{a[b_{n-3} \ldots b_1 b_0]} \sum_{b_{n-2}=0}^{1} W_4^{[a_1 a_0] b_{n-2}} \sum_{b_{n-1}=0}^{1} g(b) W_2^{a_0 b_{n-1}}.   (24)
If we repeat the process for all the remaining bits we obtain a simplified DFT formula which is the
basis of the radix-2 decimation-in-time algorithm,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} W_N^{[a_{n-1} \ldots a_1 a_0] b_0} \ldots \sum_{b_{n-2}=0}^{1} W_4^{[a_1 a_0] b_{n-2}} \sum_{b_{n-1}=0}^{1} W_2^{a_0 b_{n-1}} g(b)   (25)
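The reduced-twiddle formula can be evaluated literally for a small power of two: bit b_j of the input index pairs with a twiddle factor W_{2^{n-j}} raised to the power (a mod 2^{n-j}) b_j. The brute-force sketch below (Python, still O(N^2); the point here is the correctness of the reduced twiddles rather than speed) evaluates the sum in exactly this form:

```python
import cmath

def dft_reduced_twiddles(g):
    """Evaluate h(a) using the reduced twiddle factors of the DIT derivation:
    bit b_j of the index b contributes W_{2^{n-j}}^{(a mod 2^{n-j}) b_j}."""
    N = len(g)
    n = N.bit_length() - 1        # N = 2^n assumed
    h = []
    for a in range(N):
        total = 0j
        for b in range(N):
            w = 1 + 0j
            for j in range(n):
                if (b >> j) & 1:              # bit b_j of the input index
                    m = 1 << (n - j)          # modulus 2^{n-j}
                    w *= cmath.exp(-2j * cmath.pi * (a % m) / m)
            total += w * g[b]
        h.append(total)
    return h
```

For N = 4 this reproduces the direct DFT exactly, confirming that collapsing each twiddle modulo 2^{n-j} loses nothing.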