Low-Latency, Low-Error Floating-Point TCORDIC Algorithm: Efficient Implementation of Sine/Cosine Functions


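This article discusses a research paper on a low-latency, low-error floating-point sine/cosine function based on the TCORDIC algorithm, published in IEEE Transactions on Circuits and Systems-I: Regular Papers, Vol. 64, No. 4, April 2017. CORDIC (COordinate Rotation DIgital Computer) is an efficient algorithm widely used in signal processing and digital signal processors (DSPs) to compute sine and cosine. Traditional CORDIC, however, has high latency, and its accuracy degrades markedly when the input angle is close to 0 or π/2. To address both problems, the paper proposes TCORDIC, a method that combines a low-latency CORDIC with a Taylor algorithm.

The low-latency CORDIC part relies on three techniques. Sign prediction removes the data dependence in the Z path, compressive iterations remove the carry-propagation delay in the X and Y paths, and parallel iterations reduce the number of iterations, so the execution time is shortened without giving up precision.

The core contribution of the paper is a calculation boundary N that decides when to switch to the Taylor algorithm. This parameter is chosen to balance area against latency, so that the design performs better within limited hardware resources. The paper also uses truncated multipliers, which reduce circuit area and the complexity of the hardware implementation.

Overall, the paper analyzes the limitations of CORDIC and offers a way to compute floating-point sine/cosine with low latency and low error. This is of practical value for real-time signal processing, high-performance DSP design, and mathematical operations in embedded systems.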

894 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS, VOL. 64, NO. 4, APRIL 2017

Fig. 1. Iterative structure based on traditional CORDIC.

… $T_{LUT}$ and $T_{bshift(u,v)}$ refer to the delay of the look-up table and of a $u$-bit barrel shifter with $v$ control signals, respectively.

The delay can be decreased significantly by reducing the number of iterations, and the main delay of each iteration derives from the CLA. Because CORDIC converges linearly, the bit-width of the operands and the number of iterations both increase linearly with precision. A high-precision requirement therefore causes a large carry-propagation delay.
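The linear convergence mentioned above can be seen in a minimal Python sketch of the traditional rotation-mode CORDIC iteration. This is an illustration only, not the paper's hardware structure: the function name `cordic_sincos`, the use of floating-point arithmetic, and applying the scale factor up front are all assumptions made here.

```python
import math

def cordic_sincos(theta, n):
    """Rotation-mode CORDIC with n iterations; returns (sin, cos) of theta.

    Valid for |theta| within the CORDIC convergence range (about 1.74 rad).
    """
    # Scale factor K = prod 1/sqrt(1 + 2^-2i), folded into the start vector.
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i in range(n):
        # Data dependence: the sign of the residual angle picks the direction.
        sigma = 1.0 if z >= 0 else -1.0
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    return y, x

for n in (8, 16, 24):
    s, _ = cordic_sincos(0.6, n)
    print(n, abs(s - math.sin(0.6)))
```

Each extra iteration gains roughly one bit of accuracy, which is why both the iteration count and the adder width grow with the target precision.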

2) Sign Prediction Scheme: The low-latency CORDIC algorithm eliminates the first data dependence by using a sign prediction technique in the Z path. The binary expression of $Z$ is

$$Z = \sum_{j=0} b_j \times 2^{-j}, \quad b_j \in \{0, 1\}.$$

If $Z = b_0, b_1, \ldots, b_{j-1}, \ldots, b_k$ and $b_0 = b_1 = \cdots = b_{j-1}$, the transformation rule of sign prediction between the $j$th and $k$th bits is defined as follows: if $Z$ is positive, or in other words $b_{j-1} = 0$, $\sigma_j$ is equal to 1; otherwise, $\sigma_j$ is equal to -1. Since $i > j - 1$, $\sigma_{i+1}$ is equal to -1 if $b_i = 0$, and $\sigma_{i+1}$ is equal to 1 if $b_i = 1$.

The angle approximation error of each iteration under this rule is $2^{-i} - \alpha_i$, and the cumulative error of $k - i + 1$ iterations must be less than $2^{-n}$ to ensure convergence. So, $k \le 3i + 1$ must be satisfied. With index $i \ge (n - \log_2 3)/3$, it follows that $2^{-i} - \alpha_i < 2^{-n}$. Thus, $\alpha_i$ can be replaced with $2^{-i}$, so the last 2/3 of the iterations adopt the transformation rule of sign prediction directly. With index $i < (n - \log_2 3)/3$, correction iterations are added to the iteration sequence according to $k \le 3i + 1$ to ensure prediction accuracy.
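A small Python sketch may help make the prediction rule and the index threshold concrete. It assumes a non-negative $Z$, assumes $\alpha_i = \arctan(2^{-i})$ (the standard CORDIC rotation angle, which this excerpt does not define), and uses the hypothetical helper names `predict_signs` and `min_direct_index`. It does not model the correction iterations needed for small indices.

```python
import math

def predict_signs(bits, j):
    """Predict rotation directions from the bits of a non-negative Z.

    bits[i] holds b_i (weight 2^-i); b_0..b_{j-1} are assumed to be 0.
    Returns a dict {index: sigma} for indices j..k+1, where k = len(bits) - 1.
    """
    k = len(bits) - 1
    sigma = {j: 1}                      # Z positive (b_{j-1} = 0) -> sigma_j = +1
    for i in range(j, k + 1):           # i > j - 1
        sigma[i + 1] = 1 if bits[i] == 1 else -1
    return sigma

def min_direct_index(n):
    """Smallest integer i with i >= (n - log2(3)) / 3."""
    return math.ceil((n - math.log2(3)) / 3)

bits = [0, 0, 0, 1, 0, 1, 1, 0, 1]      # j = 3, k = 8
z = sum(b * 2.0 ** -i for i, b in enumerate(bits))
sigma = predict_signs(bits, j=3)
z_hat = sum(s * 2.0 ** -i for i, s in sigma.items())
print(z, z_hat, z_hat - z)              # the two differ by one unit in the last place
print(min_direct_index(24))
```

When the weights are taken as $2^{-i}$, the predicted signs encode $Z$ to within one unit in the last place, with no need to wait for the sign of each intermediate residual. The threshold function shows from which index that substitution of $2^{-i}$ for $\alpha_i$ stays below $2^{-n}$.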

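3) Compressive Iterations Based on CSA: Based on sign prediction, the first half of the iterations are compressed by CSA in the X and Y paths. CSA eliminates the carry propagation delay of each compressive iteration and makes it independent of the bit-width of the operands. $X_i$ and $Y_i$ are each divided into a sum word and a carry word, written here with superscripts $S$ and $C$:

$$\begin{cases} X_i = X_i^{S} + X_i^{C} \\ Y_i = Y_i^{S} + Y_i^{C} \end{cases} \tag{3}$$

The iteration formulas are converted as (4), where the CLAs in the X and Y paths are replaced with 4:2 CSAs.

$$\begin{cases} X_{i+1}^{S} + X_{i+1}^{C} = X_i^{S} + X_i^{C} - \sigma_i 2^{-i} \left( Y_i^{S} + Y_i^{C} \right) \\ Y_{i+1}^{S} + Y_{i+1}^{C} = Y_i^{S} + Y_i^{C} + \sigma_i 2^{-i} \left( X_i^{S} + X_i^{C} \right) \end{cases} \tag{4}$$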
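Each line of Eq. (4) adds four operands and keeps the result as a sum/carry pair, which is the job of a 4:2 carry-save adder. The following Python sketch models that behavior on non-negative integers. Building the 4:2 stage from two 3:2 stages is one common construction and an assumption here, as are the function names; the excerpt does not describe the paper's actual cell.

```python
def csa_3to2(a, b, c):
    """3:2 carry-save adder: returns (s, cy) with a + b + c == s + cy.

    Every bit position is handled independently, so no carry ripples
    across the word.
    """
    s = a ^ b ^ c                                  # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1        # majority bits, shifted up
    return s, cy

def csa_4to2(a, b, c, d):
    """4:2 compressor built from two 3:2 stages (illustrative construction)."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)

s, cy = csa_4to2(13, 7, 5, 9)
print(s, cy, s + cy)
```

The final carry-propagating addition `s + cy` is needed only once, after the last compressive iteration, rather than in every iteration.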

4) Parallel Iterations Based on Multiplication: The last half of the iterations are calculated by parallel iterations, which eliminate the second kind of data dependence and reduce the number of iterations. The formulas of the $i$th iteration are as follows:

$$\begin{cases} X_{i+1} = X_i - \sigma_i 2^{-i} Y_i \\ Y_{i+1} = Y_i + \sigma_i 2^{-i} X_i \end{cases} \tag{5}$$

Plugging the $i$th iteration formulas into the $(i+1)$th gives:

$$\begin{cases} X_{i+2} = X_i \left( 1 - \sigma_i \sigma_{i+1} 2^{-2i-1} \right) - Y_i \left( \sigma_i 2^{-i} + \sigma_{i+1} 2^{-i-1} \right) \\ Y_{i+2} = Y_i \left( 1 - \sigma_i \sigma_{i+1} 2^{-2i-1} \right) + X_i \left( \sigma_i 2^{-i} + \sigma_{i+1} 2^{-i-1} \right) \end{cases} \tag{6}$$
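The substitution that leads from Eq. (5) to Eq. (6) can be checked numerically. The sketch below, with the hypothetical function names `step` and `two_steps_merged`, applies Eq. (5) twice and compares the result with the merged form for every sign combination.

```python
from itertools import product

def step(x, y, sigma, i):
    """One micro-rotation as in Eq. (5), without scale-factor compensation."""
    t = sigma * 2.0 ** -i
    return x - t * y, y + t * x

def two_steps_merged(x, y, s_i, s_i1, i):
    """Iterations i and i+1 merged into one expression, as in Eq. (6)."""
    a = 1.0 - s_i * s_i1 * 2.0 ** (-2 * i - 1)
    b = s_i * 2.0 ** -i + s_i1 * 2.0 ** (-i - 1)
    return x * a - y * b, y * a + x * b

x0, y0, i = 0.75, 0.5, 5
for s_i, s_i1 in product((1, -1), repeat=2):
    seq = step(*step(x0, y0, s_i, i), s_i1, i + 1)
    merged = two_steps_merged(x0, y0, s_i, s_i1, i)
    print(s_i, s_i1, abs(seq[0] - merged[0]), abs(seq[1] - merged[1]))
```

The merged form depends only on the starting values and the two signs, so once the signs are predicted there is no need to wait for the intermediate $X_{i+1}$ and $Y_{i+1}$.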

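In this way, the iteration formulas between the $m$th and the $(n-1)$th iterations are expanded:

$$\begin{cases}
X_n = X_m A_{m,n} - Y_m B_{m,n} \\
Y_n = Y_m A_{m,n} + X_m B_{m,n} \\
A_{m,n} = 1 - \sum \sigma_i \sigma_j 2^{-i-j} + \sum \sigma_i \sigma_j \sigma_k \sigma_l 2^{-i-j-k-l} - \cdots + (-1)^t \sum_{i(1)} \cdots \sum_{i(2t)} \sigma_{i(1)} \cdots \sigma_{i(2t)} 2^{-i(1) - \cdots - i(2t)} \\
B_{m,n} = \sum \sigma_i 2^{-i} - \sum \sigma_i \sigma_j \sigma_k 2^{-i-j-k} + \cdots + (-1)^t \sum_{i(1)} \cdots \sum_{i(2t+1)} \sigma_{i(1)} \cdots \sigma_{i(2t+1)} 2^{-i(1) - \cdots - i(2t+1)}
\end{cases} \tag{7}$$

where $i, j, k, i(1), \ldots, i(2t), i(2t+1)$ are all integers from $m$ to $n-1$ and satisfy $m - 1 < i < j < k < n$ and $m - 1 < i(1) < i(2) < \cdots < i(2t) < i(2t+1) < n$. When $m \ge n/2 + 1$, it follows that $i + j \ge 2m + 1 \ge n + 3$. Apart from the first term 1 in $A_{m,n}$, the sum of the other terms is less than $2^{-n-1}$. Apart from the first term $\sum \sigma_i 2^{-i}$ in $B_{m,n}$, the sum of the other terms is less than $2^{-n-2}$. Because $Y_m \le 1$ and $X_m \le 1$, the error of $X_n$ or $Y_n$ is less than $2^{-n}$, as analyzed in the Appendix.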
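The bounds on the higher-order terms of $A_{m,n}$ and $B_{m,n}$ can be checked by brute force for a small word length. The sketch below relies on an observation that is not stated in the excerpt: chaining the micro-rotations of Eq. (5) is the same as multiplying $X + jY$ by $\prod (1 + j\sigma_i 2^{-i})$, so $A_{m,n}$ and $B_{m,n}$ are the real and imaginary parts of that product. The function name `a_b` and the choice $n = 16$ are illustrative.

```python
from itertools import product

def a_b(sigmas, m):
    """A_{m,n} and B_{m,n} as the real and imaginary parts of
    prod_{i=m}^{n-1} (1 + j * sigma_i * 2^-i), with n = m + len(sigmas)."""
    p = complex(1.0, 0.0)
    for offset, s in enumerate(sigmas):
        p *= complex(1.0, s * 2.0 ** -(m + offset))
    return p.real, p.imag

n = 16
m = n // 2 + 1
worst_a = worst_b = 0.0
for sigmas in product((1, -1), repeat=n - m):
    a, b = a_b(sigmas, m)
    first_order = sum(s * 2.0 ** -(m + k) for k, s in enumerate(sigmas))
    worst_a = max(worst_a, abs(a - 1.0))           # all terms of A except the 1
    worst_b = max(worst_b, abs(b - first_order))   # all terms of B except the first sum
print(worst_a < 2.0 ** -(n + 1), worst_b < 2.0 ** -(n + 2))
```

For this word length every sign pattern stays inside the stated limits, which is what allows the higher-order terms to be dropped.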

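Thus, iterations from the $(n/2 + 1)$th to the $(n - 1)$th can be simplified as follows:

$$\begin{cases} X_n = X_{n/2+1} - Y_{n/2+1} \sum_{i=n/2+1}^{n-1} \sigma_i 2^{-i} \\ Y_n = Y_{n/2+1} + X_{n/2+1} \sum_{i=n/2+1}^{n-1} \sigma_i 2^{-i} \end{cases} \tag{8}$$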

The last half of the iterations can be regarded as a rotation by the angle $\sum_{i=n/2+1}^{n-1} \sigma_i 2^{-i}$, which is equal to $Z_{n/2+1}$. Therefore, the formulas are converted as follows:

$$\begin{cases} X_n = X_{n/2+1} - Y_{n/2+1} Z_{n/2+1} \\ Y_n = Y_{n/2+1} + X_{n/2+1} Z_{n/2+1} \end{cases} \tag{9}$$

Thus, the last half of the iterations can be completed with two multipliers and two adders.
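To illustrate why two multiplications can stand in for the last half of the iterations, the following Python sketch compares the sequential micro-rotations with the single multiply-add form of Eq. (9). It is a floating-point model under assumptions: the signs are drawn at random rather than produced by sign prediction, the residual angle is taken as $\sum \sigma_i 2^{-i}$, and the function names are hypothetical.

```python
import random

def last_half_sequential(x, y, sigmas, m):
    """Apply micro-rotations m, m+1, ... one after another, as in Eq. (5)."""
    for offset, s in enumerate(sigmas):
        t = s * 2.0 ** -(m + offset)
        x, y = x - t * y, y + t * x
    return x, y

def last_half_multiplied(x, y, z):
    """Eq. (9): the same rotation done with two multiplications and two additions."""
    return x - y * z, y + x * z

n = 16
m = n // 2 + 1
rng = random.Random(0)
sigmas = [rng.choice((1, -1)) for _ in range(m, n)]
z = sum(s * 2.0 ** -(m + k) for k, s in enumerate(sigmas))

xs, ys = last_half_sequential(0.8, 0.6, sigmas, m)
xm, ym = last_half_multiplied(0.8, 0.6, z)
print(abs(xs - xm) < 2.0 ** -n, abs(ys - ym) < 2.0 ** -n)
```

In this model the two results agree to within $2^{-n}$, so the remaining iterations collapse into one step whose cost is dominated by the multipliers.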

B. Error Analysis for Floating-Point Sine/Cosine

When the input is close to 0 or π/2, the relative error of the floating-point sine/cosine function is large, due to the following errors:

1) Angle Approximation Error: The angle approximation error comes from the finite number of iterations. The resolution of the result is $2^{-n}$ after $n$ iterations, so the angle approximation error approaches $2^{-n}$. The absolute error becomes smaller as the number of iterations grows. However, under the IEEE-754 floating-point standard, the magnitude of the error is relative to the exponent. To round correctly, the absolute error should be less than

