A Parallel IEEE P754 Decimal Floating-Point Multiplier
Brian Hickmann, Andrew Krioukov, and Michael Schulte
University of Wisconsin - Madison
Dept. of Electrical and Computer Engineering
Madison, WI 53706
{bjhickmann, krioukov, schulte}@wisc.edu
Mark Erle
International Business Machines
6677 Sauterne Drive
Macungie, PA 18062
merle@us.ibm.com
Abstract
Decimal floating-point multiplication is important in
many commercial applications including banking, tax cal-
culation, currency conversion, and other financial areas.
This paper presents a fully parallel decimal floating-point
multiplier compliant with the recent draft of the IEEE P754
Standard for Floating-Point Arithmetic (IEEE P754). The
novelty of the design is that it is the first parallel deci-
mal floating-point multiplier offering low latency and high
throughput. This design is based on a previously published
parallel fixed-point decimal multiplier which uses alternate
decimal digit encodings to reduce area and delay. The
fixed-point design is extended to support floating-point mul-
tiplication by adding several components including expo-
nent generation, rounding, shifting, and exception handling.
Area and delay estimates show a significant improvement in
latency and throughput, at the cost of a substantial increase
in area, compared to the only published IEEE P754-compliant
sequential floating-point multiplier. To the
best of our knowledge, this is the first publication to present
a fully parallel decimal floating-point multiplier that com-
plies with IEEE P754.
1. Introduction
Decimal arithmetic is necessary in many financial and
commercial applications, which process decimal values and
perform decimal rounding. However, current software
implementations are prohibitively slow [6], prompting
hardware manufacturers such as IBM to add decimal
floating-point (DFP) arithmetic support to upcoming
microprocessors [18]. Furthermore, the IEEE 754 Working Group has
recognized the importance of decimal arithmetic by adding
it to the revised IEEE P754 Draft Standard for Floating-
Point Arithmetic (IEEE P754) [11].
Previous decimal multipliers have primarily focused on
fixed-point multiplication. Designs including [14, 8, 5, 12]
use a sequential approach of iterating over the digits of the
multiplier and selecting an appropriate multiple of the mul-
tiplicand. Generally, these designs have high latency and
low throughput due to their sequential approach.
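As an illustration of the sequential approach, the following Python sketch models a digit-serial multiplication on integers. It is a software model only, not any of the cited designs: the function name, the `digits` parameter, and the choice to precompute all ten multiples are assumptions made for clarity.

```python
def sequential_decimal_multiply(multiplicand: int, multiplier: int, digits: int):
    """Digit-serial model: one multiplier digit is consumed per iteration.

    Assumption: all ten multiples 0X..9X are available up front. Real
    hardware designs may generate or store a smaller set.
    """
    multiples = [d * multiplicand for d in range(10)]
    product = 0
    iterations = 0
    for position in range(digits):
        # Extract the decimal digit of the multiplier at this position.
        digit = (multiplier // 10**position) % 10
        # Select the matching multiple, align it, and accumulate.
        product += multiples[digit] * 10**position
        iterations += 1
    return product, iterations


if __name__ == "__main__":
    result, steps = sequential_decimal_multiply(1234, 5678, digits=4)
    print(result, steps)
```

The iteration count grows linearly with the number of multiplier digits, which is the source of the high latency noted above.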
A few parallel fixed-point multiplier designs have also
been proposed [1, 13]. The floating-point multiplier pre-
sented in this paper is based on the radix-10 fixed-point
multiplier in [1] due to its highly efficient structure. This
multiplier generates a sufficient subset of multiplicand
multiples and then selects all the partial products in parallel
based on the digits of the multiplier operand. Only a few
designs supporting DFP multiplication have been presented
[3, 2, 15]. However, to our knowledge, only the
iterative multiplier from [15] currently complies with the
IEEE P754 standard.
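To illustrate the parallel scheme described above for the multiplier in [1], the following Python sketch generates a small set of multiplicand multiples once and then selects every partial product independently of the others. The multiple set {1X, 2X, 4X, 5X} and the digit-to-pair table are assumptions chosen only for illustration; the actual multiple set and digit recoding used in [1] are not given in this section and may differ. All names are hypothetical.

```python
# Assumed decomposition: every digit 0..9 is the sum of two entries
# drawn from {0, 1, 2, 4, 5}. Illustrative only; [1] may recode differently.
DIGIT_TO_PAIR = {
    0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (4, 0),
    5: (5, 0), 6: (5, 1), 7: (5, 2), 8: (4, 4), 9: (5, 4),
}


def parallel_partial_products(multiplicand: int, multiplier: int, digits: int):
    """Return all partial products; each selection depends on one digit only."""
    # Generate the reduced multiple set once.
    multiples = {m: m * multiplicand for m in (0, 1, 2, 4, 5)}
    partial_products = []
    # The loop is a software stand-in: no iteration depends on another,
    # so in hardware every selection can happen at the same time.
    for position in range(digits):
        digit = (multiplier // 10**position) % 10
        for m in DIGIT_TO_PAIR[digit]:
            partial_products.append(multiples[m] * 10**position)
    return partial_products


def parallel_decimal_multiply(multiplicand: int, multiplier: int, digits: int) -> int:
    # Summing the list stands in for a partial-product reduction tree.
    return sum(parallel_partial_products(multiplicand, multiplier, digits))
```

Under this assumed recoding each multiplier digit contributes two partial products, so the reduction stage has more terms to add but no digit has to wait for another.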
This paper presents a parallel DFP multiplier based on
a parallel fixed-point multiplier [1] and a previous imple-
mentation of a DFP multiplier [15]. The novelty of the de-
sign is that it is the first parallel DFP multiplier, offering
low latency, high throughput, and IEEE P754 compliance.
In addition, novel early shift amount calculation and excep-
tion pass-through mechanisms are used to provide increased
performance. This design allows trade-offs between clock
frequency and overall latency by adding pipeline stages.
As compared to the sequential design in [15], an 11-stage
pipelined version of our design has similar clock speed, sig-
nificantly reduced latency (11 vs. 21 cycles), and one result
per cycle throughput, while incurring a substantial 371% in-
crease in area. To the best of our knowledge, this is the first
published design of a parallel decimal floating-point multi-
plier that is compliant with IEEE P754.
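The difference between latency and throughput in the comparison above can be made concrete with a small cycle-count model. The 11-cycle and 21-cycle latencies and the one-result-per-cycle throughput come from the text; the assumption that the sequential design accepts a new operation only after the previous one finishes is ours and is not stated in this section. Function names are hypothetical.

```python
def total_cycles_pipelined(n_ops: int, stages: int = 11) -> int:
    """Cycles to finish n_ops back-to-back multiplications in a full pipeline.

    The first result appears after `stages` cycles; each later result
    follows one cycle after the previous one.
    """
    if n_ops <= 0:
        return 0
    return stages + (n_ops - 1)


def total_cycles_sequential(n_ops: int, latency: int = 21) -> int:
    """Cycles for n_ops multiplications, ASSUMING no overlap between operations."""
    if n_ops <= 0:
        return 0
    return latency * n_ops


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, total_cycles_pipelined(n), total_cycles_sequential(n))
```

Under these assumptions, a single multiplication takes 11 versus 21 cycles, while a stream of 100 multiplications takes 110 versus 2100 cycles.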
The outline of the paper is as follows. In Section 2,
background information on decimal floating-point
multiplication and IEEE P754 formats is presented. Section 3
contains a detailed description of the design, starting with a
high-level overview, followed by descriptions of the fixed-
point multiplier, intermediate exponent and shift calcula-
tion, rounding, and special case handling. Results are pre-
sented in Section 4, followed by a summary in Section 5.