High-Speed, Low-Latency IEEE P754 Floating-Point Multiplier Design

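"This document is a detailed introduction to floating-point multipliers. It focuses on an FPGA-based implementation, in particular a parallel decimal floating-point multiplier that complies with the IEEE P754 standard. The authors are from the University of Wisconsin-Madison and International Business Machines. According to the paper, the design is novel in being the first parallel decimal floating-point multiplier to offer low latency and high throughput. It builds on a previously published parallel fixed-point decimal multiplier that uses alternate decimal digit encodings to reduce area and delay, and it adds components such as exponent generation to support floating-point multiplication."

Floating-point multipliers are a key part of computer hardware, especially in fields that need exact floating-point results, such as finance, scientific computing, and engineering. IEEE P754 is the draft revision of the IEEE 754 standard for floating-point arithmetic; it defines how floating-point numbers are represented, the rules for arithmetic operations, and exception handling. A multiplier that complies with it gives consistent, portable results across platforms.

The parallel organization is what makes the multiplier fast: several parts of the computation proceed at the same time, which shortens the overall operation. Implementing such a design on an FPGA (Field-Programmable Gate Array) allows the hardware structure to be tuned for performance, which is valuable where high computing throughput is needed under power or cost constraints.

The low latency and high throughput come from an improved fixed-point decimal multiplier structure that uses alternate decimal digit encodings to reduce both logic resources and delay. To support floating-point operation, the design also adds exponent handling; the exponent determines the magnitude and range of the value.

Floating-point multiplication involves several steps, including computing the result exponent from the operand exponents and multiplying the significands (mantissas), followed by any shifting and rounding of the product. The parallel design may overlap these steps to speed up the whole computation. The exponent generation component adds the two operand exponents and deals with possible overflow. Rounding and normalization logic is likely also included so that the result conforms to IEEE P754.

In summary, the document examines a parallel decimal floating-point multiplier, described as FPGA-based, that offers low latency and high throughput and is of practical value in areas that need efficient floating-point arithmetic, such as financial and high-performance computing.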
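
The multiplication steps summarized above can be illustrated with a minimal software sketch. It is not the paper's hardware design. The `Dfp` class, the `dfp_multiply` function, the 16-digit precision (the coefficient width of decimal64), and the fixed round-half-even mode are all assumptions chosen for illustration, and exponent range checks, infinities, and NaNs are left out.

```python
from dataclasses import dataclass

PRECISION = 16  # assumed coefficient width in digits (decimal64)


@dataclass(frozen=True)
class Dfp:
    """Hypothetical decimal value: (-1)**sign * coefficient * 10**exponent."""
    sign: int         # 0 for positive, 1 for negative
    coefficient: int  # non-negative integer
    exponent: int


def dfp_multiply(a: Dfp, b: Dfp, precision: int = PRECISION) -> Dfp:
    sign = a.sign ^ b.sign                    # signs combine with XOR
    exponent = a.exponent + b.exponent        # exponents add
    product = a.coefficient * b.coefficient   # integer coefficient multiply

    # If the product has too many digits, shift right and round half-even.
    excess = len(str(product)) - precision
    if excess > 0:
        divisor = 10 ** excess
        quotient, remainder = divmod(product, divisor)
        half = divisor // 2
        if remainder > half or (remainder == half and quotient % 2 == 1):
            quotient += 1
        if len(str(quotient)) > precision:
            # Rounding carried out of the top digit (99..9 became 100..0).
            quotient //= 10
            excess += 1
        product = quotient
        exponent += excess
    return Dfp(sign, product, exponent)
```

For example, `dfp_multiply(Dfp(0, 15, -1), Dfp(1, 2, 0))` multiplies 1.5 by -2 and returns `Dfp(1, 30, -1)`, that is, -3.0.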

A Parallel IEEE P754 Decimal Floating-Point Multiplier

Brian Hickmann, Andrew Krioukov, and Michael Schulte

University of Wisconsin - Madison

Dept. of Electrical and Computer Engineering

Madison, WI 53706

{bjhickmann, krioukov, and schulte}@wisc.edu

Mark Erle

International Business Machines

6677 Sauterne Drive

Macungie, PA 18062

merle@us.ibm.com

Abstract

Decimal floating-point multiplication is important in many commercial applications including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. This design is based on a previously published parallel fixed-point decimal multiplier which uses alternate decimal digit encodings to reduce area and delay. The fixed-point design is extended to support floating-point multiplication by adding several components including exponent generation, rounding, shifting, and exception handling. Area and delay estimates are presented that show a significant latency and throughput improvement with a substantial increase in area as compared to the only published IEEE P754 compliant sequential floating-point multiplier. To the best of our knowledge, this is the first publication to present a fully parallel decimal floating-point multiplier that complies with IEEE P754.

1. Introduction

Decimal arithmetic is necessary in many financial and commercial applications, which process decimal values and perform decimal rounding. However, current software implementations are prohibitively slow [6], prompting hardware manufacturers such as IBM to add decimal floating-point (DFP) arithmetic support to upcoming microprocessors [18]. Furthermore, the IEEE 754 Working Group has recognized the importance of decimal arithmetic by adding it to the revised IEEE P754 Draft Standard for Floating-Point Arithmetic (IEEE P754) [11].

Previous decimal multipliers have primarily focused on fixed-point multiplication. Designs including [14, 8, 5, 12] use a sequential approach of iterating over the digits of the multiplier and selecting an appropriate multiple of the multiplicand. Generally, these designs have high latency and low throughput due to their sequential approach.
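
The sequential approach can be modeled in a few lines of software. This is only an illustrative sketch: `sequential_multiply` and `to_digits` are hypothetical names, and the precomputed table of multiples stands in for whatever multiple-generation hardware a particular cited design uses. One loop iteration corresponds to one multiplier digit, so the iteration count grows with the operand length.

```python
def to_digits(value: int) -> list[int]:
    """Return the decimal digits of a non-negative integer, least significant first."""
    return [int(ch) for ch in reversed(str(value))]


def sequential_multiply(multiplicand: int, multiplier: int) -> tuple[int, int]:
    """Digit-serial decimal multiply; returns (product, iterations used)."""
    # Stand-in for hardware that makes the multiples 0X..9X available.
    multiples = [digit * multiplicand for digit in range(10)]

    accumulator = 0
    iterations = 0
    for position, digit in enumerate(to_digits(multiplier)):
        # Select the multiple for this digit and add it at the right weight.
        accumulator += multiples[digit] * 10 ** position
        iterations += 1
    return accumulator, iterations
```

With a 16-digit multiplier this loop runs 16 times, which is the source of the latency that parallel designs aim to remove.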

A few parallel fixed-point multiplier designs have also been proposed [1, 13]. The floating-point multiplier presented in this paper is based on the radix-10 fixed-point multiplier in [1] due to its highly efficient structure. This multiplier generates a sufficient subset of multiplicand multiples and then selects all the partial-products in parallel based on the digits of the multiplier operand. Only a few designs supporting DFP multiplication have been presented [3, 2, 15]. However, to our knowledge currently only the iterative multiplier from [15] complies with the IEEE P754 standard.
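
One way to picture "a sufficient subset of multiplicand multiples" is signed-digit recoding. The sketch below assumes each multiplier digit is recoded into the range -5..5, so that only the multiples 1X through 5X (and their negations) are needed. That recoding is an assumption made here for illustration; this text does not state which subset the multiplier in [1] uses. The function names are hypothetical, and the final `sum` stands in for a hardware partial-product reduction tree.

```python
def recode_signed_digits(multiplier: int) -> list[int]:
    """Recode decimal digits into signed digits in -5..5, least significant first.

    Each output digit depends only on its own input digit and the one below it,
    so all digits can be recoded at once with no carry chain.
    """
    digits = [int(ch) for ch in reversed(str(multiplier))]
    transfers = [1 if d >= 5 else 0 for d in digits]
    recoded = []
    for i, d in enumerate(digits):
        incoming = transfers[i - 1] if i > 0 else 0
        recoded.append(d - 10 * transfers[i] + incoming)
    recoded.append(transfers[-1])  # extra top digit, 0 or 1
    return recoded


def parallel_multiply(multiplicand: int, multiplier: int) -> tuple[int, list[int]]:
    """Select every partial product independently, then reduce them together."""
    multiples = {k: k * multiplicand for k in range(1, 6)}  # 1X..5X only
    partial_products = []
    for position, digit in enumerate(recode_signed_digits(multiplier)):
        magnitude = multiples[abs(digit)] if digit != 0 else 0
        selected = -magnitude if digit < 0 else magnitude
        partial_products.append(selected * 10 ** position)
    return sum(partial_products), partial_products
```

Unlike the digit-serial loop of a sequential design, no selection here depends on the result of a previous one, which is what lets hardware perform all of them at the same time.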

This paper presents a parallel DFP multiplier based on a parallel fixed-point multiplier [1] and a previous implementation of a DFP multiplier [15]. The novelty of the design is that it is the first parallel DFP multiplier, offering low latency, high throughput, and IEEE P754 compliance. In addition, novel early shift amount calculation and exception pass-through mechanisms are used to provide increased performance. This design allows trade-offs between clock frequency and overall latency by adding pipeline stages. As compared to the sequential design in [15], an 11-stage pipelined version of our design has similar clock speed, significantly reduced latency (11 vs. 21 cycles), and one result per cycle throughput, while incurring a substantial 371% increase in area. To the best of our knowledge, this is the first published design of a parallel decimal floating-point multiplier that is compliant with IEEE P754.
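
The effect of these figures on a stream of multiplications can be worked out with simple arithmetic. The sketch below uses the 11-cycle and 21-cycle latencies and the one-result-per-cycle throughput quoted above. It additionally assumes that the sequential unit cannot start a new multiplication until the previous one has finished; that is an assumption made here, since the text gives only the latency of the design in [15]. The function names are hypothetical.

```python
def pipelined_cycles(operations: int, stages: int = 11) -> int:
    """Cycles to finish a stream on a fully pipelined unit (one result per cycle)."""
    if operations <= 0:
        return 0
    # The first result appears after `stages` cycles, then one more per cycle.
    return stages + (operations - 1)


def sequential_cycles(operations: int, latency: int = 21) -> int:
    """Cycles on a unit that handles one multiplication at a time (assumed)."""
    return latency * max(operations, 0)


def area_ratio(percent_increase: float = 371.0) -> float:
    """Convert a percentage increase in area into a multiplicative factor."""
    return 1.0 + percent_increase / 100.0
```

Under these assumptions, 1000 back-to-back multiplications take 1010 cycles on the pipelined unit and 21000 cycles on the sequential one, while a 371% increase corresponds to about 4.71 times the area.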

The outline of the paper is as follows. In Section 2, background information on decimal floating-point multiplication and IEEE P754 formats is presented. Section 3 contains a detailed description of the design, starting with a high-level overview, followed by descriptions of the fixed-point multiplier, intermediate exponent and shift calculation, rounding, and special case handling. Results are presented in Section 4, followed by a summary in Section 5.
