FFT Algorithms
Brian Gough, bjg@network-theory.co.uk
May 1997
1 Introduction
Fast Fourier Transforms (FFTs) are efficient algorithms for calculating the discrete Fourier transform (DFT),
\begin{align}
h_a &= \mathrm{DFT}(g_b) \tag{1}\\
&= \sum_{b=0}^{N-1} g_b \exp(-2\pi i ab/N) \qquad 0 \le a \le N-1 \tag{2}\\
&= \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N = \exp(-2\pi i/N) \tag{3}
\end{align}
The DFT usually arises as an approximation to the continuous Fourier transform when functions are sampled at discrete intervals in space or time. The naive evaluation of the discrete Fourier transform is a matrix-vector multiplication $W\vec{g}$, and would take $O(N^2)$ operations for $N$ data-points. The general principle of the Fast Fourier Transform algorithms is to use a divide-and-conquer strategy to factorize the matrix $W$ into smaller sub-matrices, typically reducing the operation count to $O(N \sum f_i)$ if $N$ can be factorized into smaller integers, $N = f_1 f_2 \ldots f_n$.
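To make the cost concrete, here is a minimal sketch of the naive evaluation of definition (2); the function name is ours and this is a reference sketch, not GSL's implementation:

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Naive O(N^2) DFT: h[a] = sum_b g[b] exp(-2 pi i a b / N).
   A reference sketch only, not GSL's implementation. */
void dft_naive(const double complex *g, double complex *h, size_t N)
{
  for (size_t a = 0; a < N; a++)
    {
      double complex sum = 0.0;
      for (size_t b = 0; b < N; b++)
        {
          double angle = -2.0 * M_PI * (double) a * (double) b / (double) N;
          sum += g[b] * cexp(I * angle);   /* g_b W_N^{ab} */
        }
      h[a] = sum;
    }
}
```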
This chapter explains the algorithms used in the GSL FFT routines and provides some information on how to extend them. To learn more about the FFT you should read the review article
Fast Fourier Transforms: A Tutorial Review and A State of the Art by Duhamel and Vetterli [1].
There are several introductory books on the FFT with example programs, such as The Fast Fourier
Transform by Brigham [2] and DFT/FFT and Convolution Algorithms by Burrus and Parks [3]. In
1979 the IEEE published a compendium of carefully-reviewed Fortran FFT programs in Programs
for Digital Signal Processing [4] which is a useful reference for implementations of many different
FFT algorithms. If you are interested in using DSPs then the Handbook of Real-Time Fast Fourier
Transforms [5] provides detailed information on the algorithms and hardware needed to design,
build and test DSP applications. Many FFT algorithms rely on results from number theory. These
results are covered in the books Fast transforms: algorithms, analyses, applications, by Elliott and
Rao [6], Fast Algorithms for Digital Signal Processing by Blahut [7] and Number Theory in Digital
Signal Processing by McClellan and Rader [8]. There is also an annotated bibliography of papers
on the FFT and related topics by Burrus [9].
2 Families of FFT algorithms
There are two main families of FFT algorithms: the Cooley-Tukey algorithm and the Prime Factor
algorithm. These differ in the way they map the full FFT into smaller sub-transforms. Of the
Cooley-Tukey algorithms there are two types of routine in common use: mixed-radix (general-N)
algorithms and radix-2 (power of 2) algorithms. Each type of algorithm can be further classified
by additional characteristics, such as whether it operates in-place or uses additional scratch space,
whether its output is in a sorted or scrambled order, and whether it uses decimation-in-time or
-frequency iterations.
Mixed-radix algorithms work by factorizing the data vector into shorter lengths. These can then
be transformed by small-N FFTs. Typical programs include FFTs for small prime factors, such as
2, 3, 5, . . . which are highly optimized. The small-N FFT modules act as building blocks and can
be multiplied together to make longer transforms. By combining a reasonable set of modules it is
possible to compute FFTs of many different lengths. If the small-N modules are supplemented by
an $O(N^2)$ general-N module then an FFT of any length can be computed, in principle. Of course, any lengths which contain large prime factors would perform only as $O(N^2)$.
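As an illustration of this strategy, here is a hypothetical factorization routine (the module list and the function name are our own choices, not GSL's actual code) which peels off the small factors that have optimized modules and leaves any large remainder for a general-N module:

```c
#include <stddef.h>

/* Split n into factors drawn from a list of implemented small modules,
   leaving any remainder as a final factor for a general-N module.
   Returns the number of factors written to f.  The module list and the
   function name are illustrative choices, not GSL's actual code. */
size_t fft_factorize(size_t n, size_t *f)
{
  static const size_t module[] = { 2, 3, 5, 7 };  /* small-N FFTs available */
  size_t nf = 0;

  for (size_t i = 0; i < sizeof module / sizeof module[0]; i++)
    {
      while (n % module[i] == 0)
        {
          f[nf++] = module[i];
          n /= module[i];
        }
    }

  if (n > 1)          /* large prime factor: handled by an O(n^2) module */
    f[nf++] = n;

  return nf;
}
```

For example, n = 60 yields the factors 2, 2, 3, 5, so a length-60 transform would be built entirely from the length-2, length-3 and length-5 modules.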
Radix-2 algorithms, or “power of two” algorithms, are simplified versions of the mixed-radix
algorithm. They are restricted to lengths which are a power of two. The basic radix-2 FFT module
only involves addition and subtraction, so the algorithms are very simple. Radix-2 algorithms have
been the subject of much research into optimizing the FFT. Many of the most efficient radix-2
routines are based on the “split-radix” algorithm. This is actually a hybrid which combines the
best parts of both radix-2 and radix-4 (“power of 4”) algorithms [10, 11].
The prime factor algorithm (PFA) is an alternative form of general-N algorithm based on a
different way of recombining small-N FFT modules [12, 13]. It has a very simple indexing scheme
which makes it attractive. However it only works in the case where all factors are mutually prime.
This requirement makes it more suitable as a specialized algorithm for given lengths.
2.1 FFTs of prime lengths
Large prime lengths cannot be handled efficiently by any of these algorithms. However, it may still be possible to compute a DFT by using results from number theory. Rader showed that it is possible to convert a length-p FFT (where p is prime) into a cyclic convolution of length p − 1. There is a simple identity between the convolution of length N and the FFT of the same length, so if p − 1 is easily factorizable this allows the convolution to be computed efficiently via the FFT. The idea is presented in the original paper by Rader [14] (also reprinted in [8]), but for more details see the theoretical books mentioned earlier.
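To make the index mapping concrete, here is a minimal sketch of Rader's scheme (our own, not from the paper): the function name, the fixed array bound, and the requirement that the caller supply a primitive root $g$ modulo $p$ are all assumptions of this sketch, and the length-$(p-1)$ cyclic convolution is done naively for clarity where a real implementation would use FFTs of length $p-1$:

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* b^e mod p by repeated multiplication (p is small in this sketch). */
static size_t powmod(size_t b, size_t e, size_t p)
{
  size_t r = 1;
  while (e--) r = (r * b) % p;
  return r;
}

/* Length-p DFT (p prime) via Rader's mapping.  The caller supplies a
   primitive root g modulo p (e.g. g = 3 for p = 7); finding one is
   omitted here. */
void dft_rader(const double complex *x, double complex *X, size_t p, size_t g)
{
  enum { PMAX = 64 };              /* fixed bound for this sketch only */
  double complex a[PMAX], b[PMAX];
  size_t q, m;

  /* Permuted input a[q] = x[g^q mod p] and kernel b[r] = W^(g^(-r) mod p),
     where W = exp(-2 pi i / p). */
  for (q = 0; q < p - 1; q++)
    {
      a[q] = x[powmod(g, q, p)];
      b[q] = cexp(-2.0 * M_PI * I
                  * (double) powmod(g, (p - 1 - q) % (p - 1), p) / (double) p);
    }

  X[0] = 0;                        /* X[0] is the plain sum of the inputs */
  for (q = 0; q < p; q++) X[0] += x[q];

  /* X[g^(-m) mod p] = x[0] + (a conv b)[m], a cyclic convolution of
     length p-1 over the index m = 0, ..., p-2. */
  for (m = 0; m < p - 1; m++)
    {
      double complex c = 0;
      for (q = 0; q < p - 1; q++)
        c += a[q] * b[(m + (p - 1) - q) % (p - 1)];
      X[powmod(g, (p - 1 - m) % (p - 1), p)] = x[0] + c;
    }
}
```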
2.2 Optimization
There is no such thing as the single fastest FFT algorithm. FFT algorithms involve a mixture of
floating point calculations, integer arithmetic and memory access. Each of these operations will
have different relative speeds on different platforms. The performance of an algorithm is a function
of the hardware it is implemented on. The goal of optimization is thus to choose the algorithm best
suited to the characteristics of a given hardware platform.
For example, the Winograd Fourier Transform (WFTA) is an algorithm which is designed to
reduce the number of floating point multiplications in the FFT. However, it does this at the expense
of using many more additions and data transfers than other algorithms. As a consequence the
WFTA might be a good candidate algorithm for machines where data transfers occupy a negligible
time relative to floating point arithmetic. However on most modern machines, where the speed of
data transfers is comparable to or slower than floating point operations, it would be outperformed
by an algorithm which used a better mix of operations (i.e. more floating point operations but
fewer data transfers).
For a detailed study of this sort of effect, comparing the different algorithms on different platforms, consult the paper Effects of Architecture Implementation on DFT Algorithm Performance by Mehalic, Rustan and Route [15]. The paper was written in the early 1980s and has data for super- and mini-computers which you are unlikely to see today, except in a museum. However, the methodology is still valid and it would be interesting to see similar results for present-day computers.
3 FFT Concepts
Factorization is the key principle of the mixed-radix FFT divide-and-conquer strategy. If $N$ can be factorized into a product of $n_f$ integers,
\[
N = f_1 f_2 \ldots f_{n_f}, \tag{4}
\]
then the FFT itself can be divided into smaller FFTs for each factor. More precisely, an FFT of
length N can be broken up into,
$(N/f_1)$ FFTs of length $f_1$,
$(N/f_2)$ FFTs of length $f_2$,
$\ldots$
$(N/f_{n_f})$ FFTs of length $f_{n_f}$.
The total number of operations for these sub-operations will be $O(N(f_1 + f_2 + \cdots + f_{n_f}))$. When the factors of $N$ are all small integers this will be substantially less than $O(N^2)$. For example, when $N$ is a power of 2 an FFT of length $N = 2^m$ can be reduced to $mN/2$ FFTs of length 2, or $O(N \log_2 N)$ operations. Here is a demonstration which shows this:
We start with the full DFT,
\[
h_a = \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N = \exp(-2\pi i/N) \tag{5}
\]
and split the sum into even and odd terms,
\[
= \sum_{b=0}^{N/2-1} g_{2b} W_N^{a(2b)} + \sum_{b=0}^{N/2-1} g_{2b+1} W_N^{a(2b+1)}. \tag{6}
\]
This converts the original DFT of length N into two DFTs of length N/2,
\[
h_a = \sum_{b=0}^{N/2-1} g_{2b} W_{N/2}^{ab} + W_N^a \sum_{b=0}^{N/2-1} g_{2b+1} W_{N/2}^{ab} \tag{7}
\]
The first term is a DFT of the even elements of g. The second term is a DFT of the odd elements
of $g$, premultiplied by an exponential factor $W_N^k$ (known as a twiddle factor).
\[
\mathrm{DFT}(h) = \mathrm{DFT}(g_{\text{even}}) + W_N^k \, \mathrm{DFT}(g_{\text{odd}}) \tag{8}
\]
By splitting the DFT into its even and odd parts we have reduced the operation count from $N^2$ (for a DFT of length $N$) to $2(N/2)^2$ (for two DFTs of length $N/2$). The cost of the splitting is that we need an additional $O(N)$ operations to multiply by the twiddle factor $W_N^k$ and recombine the two sums.
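As a concrete check of this count, take $N = 8$: a single split replaces
\[
N^2 = 64 \quad\text{operations by}\quad 2(N/2)^2 + O(N) = 32 + O(8),
\]
and splitting each half again keeps halving the quadratic term.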
We can repeat the splitting procedure recursively $\log_2 N$ times until the full DFT is reduced to DFTs of single terms. The DFT of a single value is just the identity operation, which costs nothing. However, since $O(N)$ operations were needed at each stage to recombine the even and odd parts, the total number of operations to obtain the full DFT is $O(N \log_2 N)$. If we had used a length which was a product of factors $f_1, f_2, \ldots$ we could have split the sum in a similar way. First we would split terms corresponding to the factor $f_1$, instead of the even and odd terms corresponding to a factor of two. Then we would repeat this procedure for the subsequent factors. This would lead to a final operation count of $O(N \sum f_i)$.
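The recursive splitting just described can be written down directly. The following sketch (our own naming, and an out-of-place formulation chosen for clarity rather than GSL's in-place one) computes a radix-2 DIT FFT for $N$ a power of two:

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Recursive radix-2 decimation-in-time FFT (N a power of two).
   Call as fft2_recursive(g, h, N, 1); the result appears in h.
   'stride' addresses the even/odd sub-sequences without copying. */
static void fft2_recursive(const double complex *g, double complex *h,
                           size_t N, size_t stride)
{
  if (N == 1)
    {
      h[0] = g[0];   /* the DFT of a single value is the identity */
      return;
    }

  /* Two half-length DFTs: the even and the odd elements of g. */
  fft2_recursive(g, h, N / 2, 2 * stride);
  fft2_recursive(g + stride, h + N / 2, N / 2, 2 * stride);

  /* Recombine with the twiddle factors W_N^a, as in equation (7). */
  for (size_t a = 0; a < N / 2; a++)
    {
      double complex w    = cexp(-2.0 * M_PI * I * (double) a / (double) N);
      double complex even = h[a];
      double complex odd  = w * h[a + N / 2];
      h[a]         = even + odd;   /* h_a         */
      h[a + N / 2] = even - odd;   /* h_{a + N/2} */
    }
}
```

Each level of the recursion performs $N/2$ butterflies and there are $\log_2 N$ levels, reproducing the $O(N \log_2 N)$ count derived above.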
This procedure gives some motivation for why the number of operations in a DFT can in principle be reduced from $O(N^2)$ to $O(N \sum f_i)$. It does not, however, explain how to implement the algorithm in practice; that is what we shall do in the next section.
4 Radix-2 Algorithms
For radix-2 FFTs it is natural to write array indices in binary form because the length of the data
is a power of two. This is nicely explained in the article The FFT: Fourier Transforming One Bit at
a Time by P.B. Visscher [16]. A binary representation for indices is the key to deriving the simplest
efficient radix-2 algorithms.
We can write an index $b$ ($0 \le b < 2^n$) in binary representation like this,
\[
b = [b_{n-1} \ldots b_1 b_0] = 2^{n-1} b_{n-1} + \cdots + 2 b_1 + b_0. \tag{9}
\]
Each of $b_0, b_1, \ldots, b_{n-1}$ is a bit (either 0 or 1) of $b$.
Using this notation the original definition of the DFT can be rewritten as a sum over the bits
of b,
\[
h(a) = \sum_{b=0}^{N-1} g_b \exp(-2\pi i ab/N) \tag{10}
\]
to give an equivalent summation like this,
\[
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \cdots \sum_{b_{n-1}=0}^{1} g([b_{n-1} \ldots b_1 b_0]) \, W_N^{ab} \tag{11}
\]
where the bits of $a$ are $a = [a_{n-1} \ldots a_1 a_0]$.
To reduce the number of operations in the sum we will use the periodicity of the exponential
term,
\[
W_N^{x+N} = W_N^x. \tag{12}
\]
Most of the products $ab$ in $W_N^{ab}$ are greater than $N$. By making use of this periodicity they can all be collapsed down into the range $0 \ldots N-1$. This allows us to reduce the number of operations
by combining common terms, modulo N. Using this idea we can derive decimation-in-time or
decimation-in-frequency algorithms, depending on how we break the DFT summation down into
common terms. We’ll first consider the decimation-in-time algorithm.
4.1 Radix-2 Decimation-in-Time (DIT)
To derive the decimation-in-time algorithm we start by separating out the most significant bit of the index $b$,
\[
[b_{n-1} \ldots b_1 b_0] = 2^{n-1} b_{n-1} + [b_{n-2} \ldots b_1 b_0] \tag{13}
\]
Now we can evaluate the innermost sum of the DFT without any dependence on the remaining bits
of b in the exponential,
\[
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \cdots \sum_{b_{n-1}=0}^{1} g(b) \, W_N^{a(2^{n-1} b_{n-1} + [b_{n-2} \ldots b_1 b_0])} \tag{14}
\]
\[
= \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \cdots \sum_{b_{n-2}=0}^{1} W_N^{a[b_{n-2} \ldots b_1 b_0]} \sum_{b_{n-1}=0}^{1} g(b) \, W_N^{a(2^{n-1} b_{n-1})} \tag{15}
\]
Looking at the term $W_N^{a(2^{n-1} b_{n-1})}$ we see that we can remove most of the dependence on $a$ as well, by using the periodicity of the exponential,
\begin{align}
W_N^{a(2^{n-1} b_{n-1})} &= \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] 2^{n-1} b_{n-1}/2^n) \tag{16}\\
&= \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] b_{n-1}/2) \tag{17}\\
&= \exp(-2\pi i (2^{n-2} a_{n-1} + \cdots + a_1 + (a_0/2)) b_{n-1}) \tag{18}\\
&= \exp(-2\pi i a_0 b_{n-1}/2) \tag{19}\\
&= W_2^{a_0 b_{n-1}} \tag{20}
\end{align}
Thus the innermost exponential term simplifies so that it only involves the highest order bit of b
and the lowest order bit of a, and the sum can be reduced to,
\[
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \cdots \sum_{b_{n-2}=0}^{1} W_N^{a[b_{n-2} \ldots b_1 b_0]} \sum_{b_{n-1}=0}^{1} g(b) \, W_2^{a_0 b_{n-1}}. \tag{21}
\]
We can repeat this procedure for the next most significant bit of $b$, $b_{n-2}$, using a similar identity,
\begin{align}
W_N^{a(2^{n-2} b_{n-2})} &= \exp(-2\pi i [a_{n-1} \ldots a_1 a_0] 2^{n-2} b_{n-2}/2^n) \tag{22}\\
&= W_4^{[a_1 a_0] b_{n-2}}. \tag{23}
\end{align}
to give a formula with even less dependence on the bits of a,
\[
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} \sum_{b_1=0}^{1} \cdots \sum_{b_{n-3}=0}^{1} W_N^{a[b_{n-3} \ldots b_1 b_0]} \sum_{b_{n-2}=0}^{1} W_4^{[a_1 a_0] b_{n-2}} \sum_{b_{n-1}=0}^{1} g(b) \, W_2^{a_0 b_{n-1}}. \tag{24}
\]
If we repeat the process for all the remaining bits we obtain a simplified DFT formula which is the
basis of the radix-2 decimation-in-time algorithm,
\[
h([a_{n-1} \ldots a_1 a_0]) = \sum_{b_0=0}^{1} W_N^{[a_{n-1} \ldots a_1 a_0] b_0} \cdots \sum_{b_{n-2}=0}^{1} W_4^{[a_1 a_0] b_{n-2}} \sum_{b_{n-1}=0}^{1} W_2^{a_0 b_{n-1}} g(b) \tag{25}
\]
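Reading equation (25) from the innermost sum outwards gives $\log_2 N$ passes of two-point butterflies with twiddle factors $W_2, W_4, \ldots, W_N$, with the output index bits appearing in reversed order. The following textbook-style sketch (our own naming, not the GSL routine) handles the reordering by permuting the input into bit-reversed order first and then performing the passes in place:

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Iterative in-place radix-2 DIT FFT for N a power of two.  A sketch of
   the scheme behind equation (25): a bit-reversal permutation followed
   by log2(N) butterfly passes. */
void fft2_iterative(double complex *g, size_t N)
{
  /* Permute the input into bit-reversed order using a reversed counter. */
  for (size_t i = 0, rev = 0; i < N; i++)
    {
      if (i < rev)
        {
          double complex t = g[i];
          g[i] = g[rev];
          g[rev] = t;
        }
      size_t mask = N >> 1;      /* advance rev as a bit-reversed count */
      while (rev & mask)
        {
          rev ^= mask;
          mask >>= 1;
        }
      rev |= mask;
    }

  /* One pass per factor of two: sub-transforms of length 2, 4, ..., N,
     applying the twiddle factors W_2, W_4, ..., W_N of equation (25). */
  for (size_t len = 2; len <= N; len <<= 1)
    {
      double complex wlen = cexp(-2.0 * M_PI * I / (double) len);
      for (size_t start = 0; start < N; start += len)
        {
          double complex w = 1.0;
          for (size_t a = 0; a < len / 2; a++)
            {
              double complex even = g[start + a];
              double complex odd  = w * g[start + a + len / 2];
              g[start + a]           = even + odd;
              g[start + a + len / 2] = even - odd;
              w *= wlen;
            }
        }
    }
}
```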