FFT Algorithms
Brian Gough, bjg@network-theory.co.uk
May 1997
1 Introduction
Fast Fourier Transforms (FFTs) are efficient algorithms for calculating the discrete Fourier transform (DFT),

h_a = \mathrm{DFT}(g_b)   (1)
    = \sum_{b=0}^{N-1} g_b \exp(-2\pi i a b / N) \qquad 0 \le a \le N-1   (2)
    = \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N = \exp(-2\pi i / N)   (3)
The DFT usually arises as an approximation to the continuous Fourier transform when functions are sampled at discrete intervals in space or time. The naive evaluation of the discrete Fourier transform is a matrix-vector multiplication W \vec{g}, and would take O(N^2) operations for N data-points. The general principle of the Fast Fourier Transform algorithms is to use a divide-and-conquer strategy to factorize the matrix W into smaller sub-matrices, typically reducing the operation count to O(N \sum f_i) if N can be factorized into smaller integers, N = f_1 f_2 \ldots f_n.
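The matrix-vector definition above translates directly into a short program. Here is a minimal sketch (in Python, used purely for illustration; it is not the GSL implementation) of the naive O(N^2) evaluation that the FFT improves upon:

```python
import cmath

def naive_dft(g):
    """Evaluate h_a = sum_b g_b exp(-2*pi*i*a*b/N) by direct summation."""
    N = len(g)
    return [sum(g[b] * cmath.exp(-2j * cmath.pi * a * b / N) for b in range(N))
            for a in range(N)]
```

Each of the N outputs is a sum over all N inputs, which is where the O(N^2) operation count comes from.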
This chapter explains the algorithms used in the GSL FFT routines and provides some information on how to extend them. To learn more about the FFT you should read the review article
Fast Fourier Transforms: A Tutorial Review and A State of the Art by Duhamel and Vetterli [1].
There are several introductory books on the FFT with example programs, such as The Fast Fourier
Transform by Brigham [2] and DFT/FFT and Convolution Algorithms by Burrus and Parks [3]. In
1979 the IEEE published a compendium of carefully-reviewed Fortran FFT programs in Programs
for Digital Signal Processing [4] which is a useful reference for implementations of many different
FFT algorithms. If you are interested in using DSPs then the Handbook of Real-Time Fast Fourier
Transforms [5] provides detailed information on the algorithms and hardware needed to design,
build and test DSP applications. Many FFT algorithms rely on results from number theory. These
results are covered in the books Fast transforms: algorithms, analyses, applications, by Elliott and
Rao [6], Fast Algorithms for Digital Signal Processing by Blahut [7] and Number Theory in Digital
Signal Processing by McClellan and Rader [8]. There is also an annotated bibliography of papers
on the FFT and related topics by Burrus [9].
2 Families of FFT algorithms
There are two main families of FFT algorithms: the Cooley-Tukey algorithm and the Prime Factor
algorithm. These differ in the way they map the full FFT into smaller sub-transforms. Of the
Cooley-Tukey algorithms there are two types of routine in common use: mixed-radix (general-N)
algorithms and radix-2 (power of 2) algorithms. Each type of algorithm can be further classified
by additional characteristics, such as whether it operates in-place or uses additional scratch space,
whether its output is in a sorted or scrambled order, and whether it uses decimation-in-time or
-frequency iterations.
Mixed-radix algorithms work by factorizing the data vector into shorter lengths. These can then
be transformed by small-N FFTs. Typical programs include FFTs for small prime factors, such as
2, 3, 5, . . . which are highly optimized. The small-N FFT modules act as building blocks and can
be multiplied together to make longer transforms. By combining a reasonable set of modules it is
possible to compute FFTs of many different lengths. If the small-N modules are supplemented by an O(N^2) general-N module then an FFT of any length can be computed, in principle. Of course, any lengths which contain large prime factors would perform only as O(N^2).
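The way a mixed-radix routine picks its sub-transforms can be sketched as a factorization step: pull out the optimized small-N module sizes first, and leave any remainder for the slow general-N module. The module set {7, 5, 3, 2} below is an illustrative assumption, not GSL's actual list:

```python
def choose_factors(N, modules=(7, 5, 3, 2)):
    """Split N into small-N module sizes, largest modules first.
    Any leftover factor would have to be handled by an O(N^2)
    general-N module."""
    factors = []
    for f in modules:
        while N % f == 0:
            factors.append(f)
            N //= f
    if N > 1:
        factors.append(N)   # leftover containing large prime factors
    return factors
```

For example, a length-30 transform decomposes entirely into optimized modules, while a length-22 transform leaves a prime factor of 11 for the general-N fallback.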
Radix-2 algorithms, or “power of two” algorithms, are simplified versions of the mixed-radix
algorithm. They are restricted to lengths which are a power of two. The basic radix-2 FFT module
only involves addition and subtraction, so the algorithms are very simple. Radix-2 algorithms have
been the subject of much research into optimizing the FFT. Many of the most efficient radix-2
routines are based on the “split-radix” algorithm. This is actually a hybrid which combines the
best parts of both radix-2 and radix-4 (“power of 4”) algorithms [10, 11].
The prime factor algorithm (PFA) is an alternative form of general-N algorithm based on a
different way of recombining small-N FFT modules [12, 13]. It has a very simple indexing scheme
which makes it attractive. However it only works in the case where all factors are mutually prime.
This requirement makes it more suitable as a specialized algorithm for given lengths.
2.1 FFTs of prime lengths
Large prime lengths cannot be handled efficiently by any of these algorithms. However it may still be possible to compute a DFT, by using results from number theory. Rader showed that it is possible
to convert a length-p FFT (where p is prime) into a convolution of length-(p − 1). There is a
simple identity between the convolution of length N and the FFT of the same length, so if p − 1
is easily factorizable this allows the convolution to be computed efficiently via the FFT. The idea
is presented in the original paper by Rader [14] (also reprinted in [8]), but for more details see the
theoretical books mentioned earlier.
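Rader's idea can be checked numerically for a small prime. The sketch below (pure Python, for illustration; the primitive root g is supplied by hand rather than searched for) re-indexes the non-zero terms of a length-p DFT through powers of a primitive root modulo p, so that they become a cyclic convolution of length p − 1, and compares the result with direct summation:

```python
import cmath

def naive_dft(x):
    N = len(x)
    return [sum(x[b] * cmath.exp(-2j * cmath.pi * a * b / N) for b in range(N))
            for a in range(N)]

def rader_dft(x, g):
    """Length-p DFT (p prime) via a cyclic convolution of length p - 1.
    g must be a primitive root modulo p (assumed, not verified here)."""
    p = len(x)
    M = p - 1
    ginv = pow(g, -1, p)
    # Permute inputs and twiddles by powers of the primitive root:
    # a_q = x_{g^{-q} mod p},  b_m = W_p^{g^m mod p}.
    a = [x[pow(ginv, q, p)] for q in range(M)]
    b = [cmath.exp(-2j * cmath.pi * pow(g, m, p) / p) for m in range(M)]
    # Cyclic convolution, done directly here; Rader's point is that this
    # convolution can itself be computed with a length-(p-1) FFT when
    # p - 1 factorizes nicely.
    c = [sum(a[q] * b[(m - q) % M] for q in range(M)) for m in range(M)]
    X = [complex(sum(x))] + [0j] * M      # X_0 is just the sum of the inputs
    for m in range(M):
        X[pow(g, m, p)] = x[0] + c[m]
    return X
```

With p = 7 and primitive root g = 3 the permuted convolution reproduces the direct DFT to rounding error.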
2.2 Optimization
There is no such thing as the single fastest FFT algorithm. FFT algorithms involve a mixture of
floating point calculations, integer arithmetic and memory access. Each of these operations will
have different relative speeds on different platforms. The performance of an algorithm is a function
of the hardware it is implemented on. The goal of optimization is thus to choose the algorithm best
suited to the characteristics of a given hardware platform.
For example, the Winograd Fourier Transform (WFTA) is an algorithm which is designed to
reduce the number of floating point multiplications in the FFT. However, it does this at the expense
of using many more additions and data transfers than other algorithms. As a consequence the
WFTA might be a good candidate algorithm for machines where data transfers occupy a negligible
time relative to floating point arithmetic. However on most modern machines, where the speed of
data transfers is comparable to or slower than floating point operations, it would be outperformed
by an algorithm which used a better mix of operations (i.e. more floating point operations but
fewer data transfers).
For a detailed study of this sort of effect, comparing the different algorithms on different platforms, consult the paper Effects of Architecture Implementation on DFT Algorithm Performance by Mehalic, Rustan and Route [15]. The paper was written in the early 1980s and has data for super- and mini-computers which you are unlikely to see today, except in a museum. However, the methodology is still valid and it would be interesting to see similar results for present-day computers.
3 FFT Concepts
Factorization is the key principle of the mixed-radix FFT divide-and-conquer strategy. If N can be factorized into a product of n_f integers,

N = f_1 f_2 \ldots f_{n_f},   (4)

then the FFT itself can be divided into smaller FFTs for each factor. More precisely, an FFT of length N can be broken up into,

(N/f_1) FFTs of length f_1,
(N/f_2) FFTs of length f_2,
\ldots
(N/f_{n_f}) FFTs of length f_{n_f}.
The total number of operations for these sub-operations will be O(N(f_1 + f_2 + \ldots + f_{n_f})). When the factors of N are all small integers this will be substantially less than O(N^2). For example, when N is a power of 2 an FFT of length N = 2^m can be reduced to mN/2 FFTs of length 2, or O(N \log_2 N) operations. Here is a demonstration which shows this:
We start with the full DFT,
h_a = \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N = \exp(-2\pi i / N)   (5)
and split the sum into even and odd terms,
= \sum_{b=0}^{N/2-1} g_{2b} W_N^{a(2b)} + \sum_{b=0}^{N/2-1} g_{2b+1} W_N^{a(2b+1)}.   (6)
This converts the original DFT of length N into two DFTs of length N/2,
h_a = \sum_{b=0}^{N/2-1} g_{2b} W_{N/2}^{ab} + W_N^a \sum_{b=0}^{N/2-1} g_{2b+1} W_{N/2}^{ab}   (7)
The first term is a DFT of the even elements of g. The second term is a DFT of the odd elements of g, premultiplied by an exponential factor W_N^k (known as a twiddle factor).

\mathrm{DFT}(h) = \mathrm{DFT}(g_{\mathrm{even}}) + W_N^k \, \mathrm{DFT}(g_{\mathrm{odd}})   (8)
By splitting the DFT into its even and odd parts we have reduced the operation count from N^2 (for a DFT of length N) to 2(N/2)^2 (for two DFTs of length N/2). The cost of the splitting is that we need an additional O(N) operations to multiply by the twiddle factor W_N^k and recombine the two sums.
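Applied recursively, the even/odd splitting is already a complete radix-2 FFT. The following sketch (plain Python, unoptimized, power-of-two lengths only; an illustration of the idea rather than the GSL routine) follows the splitting identity literally:

```python
import cmath

def fft2(g):
    """Recursive radix-2 decimation-in-time FFT; len(g) must be a power of 2."""
    N = len(g)
    if N == 1:
        return list(g)       # the DFT of a single value is the identity
    even = fft2(g[0::2])     # DFT of the even-indexed elements
    odd = fft2(g[1::2])      # DFT of the odd-indexed elements
    h = [0j] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        h[k] = even[k] + w * odd[k]
        h[k + N // 2] = even[k] - w * odd[k]
    return h
```

Note that each output in the second half reuses the same two half-length sub-transforms, since W_N^{k+N/2} = -W_N^k; this reuse is exactly where the savings come from.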
We can repeat the splitting procedure recursively \log_2 N times until the full DFT is reduced to DFTs of single terms. The DFT of a single value is just the identity operation, which costs nothing. However since O(N) operations were needed at each stage to recombine the even and odd parts the total number of operations to obtain the full DFT is O(N \log_2 N). If we had used a length which was a product of factors f_1, f_2, \ldots we could have split the sum in a similar way. First we would split terms corresponding to the factor f_1, instead of the even and odd terms corresponding to a factor of two. Then we would repeat this procedure for the subsequent factors. This would lead to a final operation count of O(N \sum f_i).
This procedure gives some motivation for why the number of operations in a DFT can in principle be reduced from O(N^2) to O(N \sum f_i). It does not give a good explanation of how to implement the algorithm in practice, which is what we shall do in the next section.
4 Radix-2 Algorithms
For radix-2 FFTs it is natural to write array indices in binary form because the length of the data
is a power of two. This is nicely explained in the article The FFT: Fourier Transforming One Bit at
a Time by P.B. Visscher [16]. A binary representation for indices is the key to deriving the simplest
efficient radix-2 algorithms.
We can write an index b (0 \le b < 2^n) in binary representation like this,

b = [b_{n-1} \ldots b_1 b_0] = 2^{n-1} b_{n-1} + \ldots + 2 b_1 + b_0.   (9)

Each of the b_0, b_1, \ldots, b_{n-1} are the bits (either 0 or 1) of b.
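In code the bits b_j are just shifts and masks; a quick sketch (Python, illustrative only) confirming the decomposition above:

```python
def bits(b, n):
    """Return [b_0, b_1, ..., b_{n-1}], the n bits of the index b."""
    return [(b >> j) & 1 for j in range(n)]

def from_bits(bit_list):
    """Reassemble b = 2^{n-1} b_{n-1} + ... + 2 b_1 + b_0."""
    return sum(bj << j for j, bj in enumerate(bit_list))
```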
Using this notation the original definition of the DFT can be rewritten as a sum over the bits
of b,
h(a) = \sum_{b=0}^{N-1} g_b \exp(-2\pi i a b / N)   (10)
to give an equivalent summation like this,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-1}=0}^{1} g([b_{n-1} \ldots b_1 b_0]) W_N^{ab}   (11)

where the bits of a are a = [a_{n-1} \ldots a_1 a_0].
To reduce the number of operations in the sum we will use the periodicity of the exponential
term,
W_N^{x+N} = W_N^x.   (12)

Most of the products ab in W_N^{ab} are greater than N. By making use of this periodicity they can all be collapsed down into the range 0 \ldots N-1. This allows us to reduce the number of operations by combining common terms, modulo N. Using this idea we can derive decimation-in-time or decimation-in-frequency algorithms, depending on how we break the DFT summation down into common terms. We'll first consider the decimation-in-time algorithm.
4.1 Radix-2 Decimation-in-Time (DIT)
To derive the decimation-in-time algorithm we start by separating out the most significant bit of the index b,

[b_{n-1} \ldots b_1 b_0] = 2^{n-1} b_{n-1} + [b_{n-2} \ldots b_1 b_0]   (13)
Now we can evaluate the innermost sum of the DFT without any dependence on the remaining bits
of b in the exponential,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-1}=0}^{1} g(b) W_N^{a(2^{n-1} b_{n-1} + [b_{n-2} \ldots b_1 b_0])}   (14)

= \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-2}=0}^{1} W_N^{a[b_{n-2} \ldots b_1 b_0]} \sum_{b_{n-1}=0}^{1} g(b) W_N^{a(2^{n-1} b_{n-1})}   (15)
Looking at the term W_N^{a(2^{n-1} b_{n-1})} we see that we can also remove most of the dependence on a as well, by using the periodicity of the exponential,

W_N^{a(2^{n-1} b_{n-1})} = \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] 2^{n-1} b_{n-1} / 2^n)   (16)
= \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] b_{n-1} / 2)   (17)
= \exp(-2\pi i (2^{n-2} a_{n-1} + \ldots + a_1 + (a_0/2)) b_{n-1})   (18)
= \exp(-2\pi i a_0 b_{n-1} / 2)   (19)
= W_2^{a_0 b_{n-1}}   (20)
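The collapse of W_N^{a(2^{n-1} b_{n-1})} down to W_2^{a_0 b_{n-1}} is easy to confirm numerically. The check below (Python, illustrative; the loop over every a and over b_{n-1} in {0, 1} is exhaustive for a given n) compares the two expressions directly:

```python
import cmath

def W(k, N):
    """The N-th root of unity raised to the power k: exp(-2*pi*i*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)

def check_twiddle_collapse(n):
    """Verify W_N^{a * 2^{n-1} * b} == W_2^{a_0 * b} for all a, and b in {0,1},
    where N = 2^n and a_0 is the lowest-order bit of a."""
    N = 1 << n
    return all(
        abs(W(a * (1 << (n - 1)) * b, N) - W((a & 1) * b, 2)) < 1e-12
        for a in range(N) for b in (0, 1)
    )
```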
Thus the innermost exponential term simplifies so that it only involves the highest order bit of b
and the lowest order bit of a, and the sum can be reduced to,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-2}=0}^{1} W_N^{a[b_{n-2} \ldots b_1 b_0]} \sum_{b_{n-1}=0}^{1} g(b) W_2^{a_0 b_{n-1}}.   (21)
We can repeat this procedure for the next most significant bit of b, b_{n-2}, using a similar identity,

W_N^{a(2^{n-2} b_{n-2})} = \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] 2^{n-2} b_{n-2} / 2^n)   (22)
= W_4^{[a_1 a_0] b_{n-2}}.   (23)
to give a formula with even less dependence on the bits of a,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \ldots \sum_{b_{n-3}=0}^{1} W_N^{a[b_{n-3} \ldots b_1 b_0]} \sum_{b_{n-2}=0}^{1} W_4^{[a_1 a_0] b_{n-2}} \sum_{b_{n-1}=0}^{1} g(b) W_2^{a_0 b_{n-1}}.   (24)
If we repeat the process for all the remaining bits we obtain a simplified DFT formula which is the
basis of the radix-2 decimation-in-time algorithm,
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} W_N^{[a_{n-1} \ldots a_1 a_0] b_0} \ldots \sum_{b_{n-2}=0}^{1} W_4^{[a_1 a_0] b_{n-2}} \sum_{b_{n-1}=0}^{1} W_2^{a_0 b_{n-1}} g(b)   (25)
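The reduced-twiddle formula can be evaluated literally for a small power of two: bit b_j of the input index pairs with a twiddle factor W_{2^{n-j}} raised to the power (a mod 2^{n-j}) b_j. The brute-force sketch below (Python, still O(N^2); the point here is the correctness of the reduced twiddles rather than speed) evaluates the sum in exactly this form:

```python
import cmath

def dft_reduced_twiddles(g):
    """Evaluate h(a) using the reduced twiddle factors of the DIT derivation:
    bit b_j of the index b contributes W_{2^{n-j}}^{(a mod 2^{n-j}) b_j}."""
    N = len(g)
    n = N.bit_length() - 1        # N = 2^n assumed
    h = []
    for a in range(N):
        total = 0j
        for b in range(N):
            w = 1 + 0j
            for j in range(n):
                if (b >> j) & 1:              # bit b_j of the input index
                    m = 1 << (n - j)          # modulus 2^{n-j}
                    w *= cmath.exp(-2j * cmath.pi * (a % m) / m)
            total += w * g[b]
        h.append(total)
    return h
```

For N = 4 this reproduces the direct DFT exactly, confirming that collapsing each twiddle modulo 2^{n-j} loses nothing.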