894 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS, VOL. 64, NO. 4, APRIL 2017
Fig. 1. Iterative structure based on traditional CORDIC.
T
LUT
and T
bshi f t(u,v)
refer to the delay for look-up table
and u-bit barrel shifter of v control signals, respectively.
The delay can be decreased significantly by reducing the
number of iterations, and the main delay of each iteration
deriving from CLA. Due to linear convergence of CORDIC,
the bit-width of operands and the number of iterations increase
linearly with precision. High precision requirement causes
great delay in carry propagation.
2) Sign Prediction Scheme: Low latency CORDIC algo-
rithm eliminates the first data dependence by using sign
prediction technique in Z path. The binary expression of Z
j
is
n
j=0
b
j
×2
−j
, b
j
{0, 1}.IfZ
j
= b
0
, b
1
,...,b
j−1
b
j
,...,b
n
and b
0
= b
1
= ··· = b
j−1
, transformation rule of sign
prediction between the j
th
and k
th
bits is defined as follows:
if Z
j
is positive or in other word b
j−1
= 0, σ
j
is equal to 1;
otherwise, σ
j
is equal to -1. Since i > j −1, σ
i+1
is equal to
-1 if b
i
= 0, and σ
i+1
is equal to 1 if b
i
= 1.
Angle approximation error of each iteration in this rule is
2
−i
−α
i
, and the cumulative error of k −i +1 iterations must
be less than 2
−n
to ensure convergence. So, k ≤ 3i + 1must
be satisfied. With index i ≥(n −log
2
3)/3, it can be known
that 2
−i
−α
i
< 2
−n
. Thus, α
i
can be replaced with 2
−i
;so,the
last 2/3 iterations adopt transformation rule of sign prediction
directly. With index i < (n −log
2
3)/3, correct iterations are
added to ensure prediction accuracy in the iteration sequence
according to k ≤ 3i + 1.
3) Compressive Iterations Based on CSA: Based on sign
prediction, the first half of the iterations are compressed by
CSA in X and Y paths. CSA eliminates carry propagation
delay of each compressive iteration and makes it irrelevant to
the bit-width of operands. X
i
and Y
i
are divided into sum and
carry, respectively.
˙
X
i
= X
i
C
+ X
i
S
Y
i
= Y
i
C
+ Y
i
S
.
(3)
The iteration formulas are converted as (4), where CLAs
in X and Y paths are replaced with 4:2 CSAs.
˙
X
C
i+1
+ X
S
i+1
= X
i
C
+ X
i
S
− σ
i
2
−i
(Y
i
C
+ Y
i
S
)
Y
C
i+1
+ Y
S
i+1
= Y
i
C
+ Y
i
S
+ σ
i
2
−i
(X
i
C
+ X
i
S
).
(4)
4) Parallel Iterations Based on Multiplication: The last half
of the iterations are calculated by parallel iterations, which
eliminate the second kind of data dependence and reduce the
number of iterations. The formulas of the i
th
iteration are as
follows:
˙
X
i+1
= X
i
− σ
i
2
−i
Y
i
Y
i+1
= Y
i
+ σ
i
2
−i
X
i
.
(5)
Plugging the i
th
iteration formulas into the (i + 1)
th
,the
following formulas are available:
˙
X
i+2
= X
i
(1 − σ
i
σ
i+1
2
−2i−1
) − Y
i
(σ
i
2
−i
+ σ
i+1
2
−i−1
)
Y
i+2
= Y
i
(1 − σ
i
σ
i+1
2
−2i−1
) + X
i
(σ
i
2
−i
+ σ
i+1
2
−i−1
).
(6)
In this way, the iteration formulas between the m
th
and the
(n − 1)
th
are expanded:
⎧
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎨
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎩
.
X
n
= X
m
A
m,n
− Y
m
B
m,n
Y
n
= Y
m
A
m,n
+ X
m
B
m,n
A
m,n
=
1 −
i
j
σ
i
σ
j
2
−i−j
+
i
j
k
l
σ
i
σ
j
σ
k
σ
l
2
−i−j −k−l
−.. + (−1)
t
i(1)
..
i(2t)
σ
i(1)
..σ
i(2t)
2
−i(1)−..i(2t)
B
m,n
=
i
σ
i
2
−i
−
i
j
k
σ
i
σ
j
σ
k
2
−i−j −k
+.. + (−1)
t
i(1)
..
i(2t+1)
σ
i(1)
..σ
i(2t+1)
2
−i(1)−..i(2t+1)
(7)
where i, j, k, i(1),...,i(2t), i(2t +1) are all integers from m
to n −1, and satisfied m −1 < i < j < k < n, m −1 < i(1)<
i(2)... < i(2t)<i(2t +1)<n.Whenm ≥ n/2 +1, it can be
known that i + j ≥ 2m + 1 ≥ n + 3. Except the first item 1
in A
m,n
, the maximum sum of other items is less than 2
−n−1
.
Except the first item
i
σ
i
2
(−i)
in B
m,n
, the maximum sum
of other items is less than 2
−n−2
. Because of Y
m
≤ 1and
X
m
≤ 1, the error of X
n
or Y
n
is less than 2
−n
, as analyzed
in the Appendix.
Thus, iterations from the (n/2 + 1)
th
to the (n − 1)
th
can
be simplified as follows:
˙
X
n
= X
n/2+1
− Y
n/2+1
n
i=n/2+1
σ
i
2
−i
Y
n
= Y
n/2+1
− X
n/2+1
n
i=n/2+1
σ
i
2
−i
.
(8)
The last half of the iterations can be regarded as the
rotation with angle
n
i=n/2+1
σ
i
2
−i
, which is equal to Z
n/2+1
.
Therefore, the formulas are converted as follows:
˙
X
n
= X
n/2+1
− Y
n/2+1
Z
n/2+1
Y
n
= Y
n/2+1
− X
n/2+1
Z
n/2+1
.
(9)
Thus, the last half of the iterations can be completed with
two multipliers and two adders.
B. Error Analysis for Floating-Point Sine/Cosine
When the input is close to 0 or π/2, the relative error
of floating-point sine/cosine function is large, due to the
following errors:
1) Angle Approximation Error: Angle approximation error
comes from finite number of iterations. The resolution of
results is 2
−n
after n iterations; so, the angle approximation
error approaches 2
−n
. Absolute error is smaller as the number
of iterations is larger. However, the magnitude of error is rel-
ative to the exponent under IEEE-754 floating-point standard.
To round off correctly, absolute error should be less than