Improving Performance: Skew-Tolerant Domino Circuit Techniques

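"Skew-Tolerant Domino Circuits"

Domino circuits are widely used in the design of high-performance CMOS microprocessors. Conventional domino pipelines, however, suffer significant timing overhead from clock skew, latch delay, and the inability to borrow time. To overcome these limits, some designers use overlapping clock phases, so that each domino gate is ready to evaluate by the time its critical inputs arrive and does not precharge until the next gate has consumed its result. The paper presents a systematic framework, called skew-tolerant domino circuits, for understanding and analyzing domino circuits with overlapping clocks. Simulations show that in high-speed systems the approach is 25% or more faster than conventional domino circuits. Keywords: adders, clock skew, clocks, CMOS digital integrated circuits, dynamic logic, VLSI circuit design.

1. Introduction

As the gains from microarchitectural improvements diminish, microprocessor designers rely more and more on circuit-level innovation to raise performance. Domino logic is a dynamic logic technique that performs well at high speed, but its sensitivity to clock skew and the latch delays of conventional pipelines limit its performance potential. Under clock skew, a conventional domino circuit may evaluate data at the wrong time, which lowers the efficiency of the whole system.

2. Principles of Skew-Tolerant Domino Circuits

Skew-tolerant domino circuits address these problems by providing multiple overlapping clock phases. Each domino gate can then begin evaluating as soon as its critical inputs arrive, without being held up waiting for precharge. Starting evaluation early and delaying precharge lets the circuit use the time within a clock cycle more effectively, which raises overall speed.

3. Analysis and Modeling

The paper's method gives a detailed account of how overlapping clocks behave in domino circuits, covering the effect of clock skew, gate-level delay optimization, and how to adjust the clock phases to maximize performance. The framework helps designers predict and reduce potential timing problems at the design stage.

4. Simulation Results and Comparison

Simulation confirms the performance advantage of skew-tolerant domino circuits over conventional domino circuits. In high-speed systems the speedup is 25% or more, which indicates a substantial benefit for microprocessor speed and efficiency.

5. Applications and Future Work

The work bears directly on microprocessor design and may extend to other VLSI (very-large-scale integration) designs that need high-speed computation. Future research may refine the framework to handle more complex clock networks and tighter power constraints.

Skew-tolerant domino circuits are an effective way to meet the timing challenges of domino logic. Through their clocking scheme and circuit design they keep operating speed high while reducing timing overhead, which improves the performance of the whole system.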

1702 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

Skew-Tolerant Domino Circuits

David Harris, Student Member, IEEE, and Mark A. Horowitz, Senior Member, IEEE

Abstract—Domino circuits are widely used in high-performance CMOS microprocessors. However, textbook domino pipelines suffer significant timing overhead from clock skew, latch delay, and the inability to borrow time. To eliminate this overhead, some designers provide multiple overlapping clock phases such that domino gates are always ready for evaluation by the time critical inputs arrive and do not precharge until the next gate consumes the result. This paper describes a systematic framework, called skew-tolerant domino circuits, for understanding and analyzing domino circuits with overlapping clocks. Simulations confirm that a speedup of 25% or more can be achieved over textbook domino circuits in high-speed systems.

Index Terms—Adders, clock skew, clocks, CMOS digital integrated circuits, dynamic logic, VLSI circuit design.

I. INTRODUCTION

Since microarchitectural improvements have been yielding diminishing returns, microprocessor designers seeking high performance have been forced to aggressively reduce cycle times beyond that which simple process scaling would permit. We can normalize cycle time improvement due to faster processes by expressing cycle time in terms of the delay of a fanout-of-four (FO4) inverter, i.e., an inverter driving a load that is four times its input capacitance. Today’s fastest microprocessors are operating at cycle times below 18 fanout-of-four inverter delays [1]. Domino circuits [2] are an important enabler for this cycle time improvement [3]–[5]. At such short cycle times, however, clocking overhead which was once negligible becomes a significant fraction of the clock period.
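
The normalization can be checked against the figures the paper gives in its footnote to [1]: a reported Alpha 21164 cycle time of 14 "gate delays", with one such gate delay simulated at about 1.24 FO4 inverter delays. The sketch below is only that arithmetic; the constant and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the paper's footnote to reference [1].
ALPHA_21164_GATE_DELAYS = 14   # DEC's reported cycle time, in "gate delays"
FO4_PER_GATE_DELAY = 1.24      # average FO3 two-input NAND/NOR delay, in FO4 units


def cycle_time_in_fo4(gate_delays: float, fo4_per_gate_delay: float) -> float:
    """Convert a cycle time in vendor 'gate delays' to FO4 inverter delays."""
    return gate_delays * fo4_per_gate_delay


alpha_fo4 = cycle_time_in_fo4(ALPHA_21164_GATE_DELAYS, FO4_PER_GATE_DELAY)
print(f"Alpha 21164 cycle time: {alpha_fo4:.2f} FO4 delays")  # 17.36
print("Below 18 FO4 delays:", alpha_fo4 < 18)                 # True
```

The result, about 17.4 FO4 delays, is consistent with the statement that the fastest microprocessors of the time ran below 18 FO4 delays per cycle.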

As we will see in Section II, when domino circuits are pipelined in the same way that two-phase static circuits have traditionally been pipelined, they are highly sensitive to clock skew, include latch delays on the critical path, and are incapable of borrowing time across clock phases to balance the pipeline. Some designers have discovered that by overlapping the clocks controlling domino gates, these sources of overhead can be hidden, as we illustrate in Section III. We proceed to analyze domino gates using overlapping clocks in a systematic framework which we call skew-tolerant domino. Section IV presents the analysis under a single clock skew budget. Even more global clock skew can be hidden if we take advantage of tighter bounds on local clock skew, as described in Section V. For many reasonable designs, this global skew tolerance greatly exceeds the actual system skews, so Section VI explains how to take advantage of the extra overlap to allow time borrowing across phases. Section VII then addresses the critical issue of clock generation and shows how a single global clock and relatively simple local clock generators can produce the needed clock phases, while Section VIII looks at the interfaces of skew-tolerant domino with static and self-timed logic. Section IX presents simulation results of skew-tolerant domino applied to an adder self-bypass path. Finally, Section X summarizes the skew-tolerant domino techniques and the performance benefits which they offer.

Manuscript received April 10, 1997; revised August 5, 1997. This work was supported in part by a National Science Foundation fellowship, by Stanford’s Center for Integrated Systems, and by DARPA Contract DABT63-94-C-0054. The authors are with Stanford University, Stanford, CA 94305 USA. Publisher Item Identifier S 0018-9200(97)08035-9.

DEC reports an Alpha 21164 cycle time of 14 “gate delays” where a “gate delay” is roughly an average fanout-of-three two-input gate. Simulation found that the average of a two-input fanout-of-three NAND and NOR delay is about 1.24 fanout-of-four inverter delays.

II. TEXTBOOK DOMINO CIRCUITS

We begin with a review of a simple form of domino circuits, including a motivation of why domino is beneficial, how pipelines can be constructed, and why such textbook pipelines have serious overhead.

Static CMOS gates are slow because an input must drive both NMOS and PMOS transistors. In any transition, either the pull-up or pull-down network is activated, meaning the input capacitance of the inactive network loads down the path. Moreover, PMOS transistors have poor mobility and must be sized larger to achieve comparable rising and falling delays, further increasing input capacitance. Dynamic gates overcome this weakness by eliminating the PMOS transistors and replacing them with a single precharge transistor. The dynamic gate is precharged high, then may evaluate low through an NMOS stack. Unfortunately, if one dynamic inverter directly drives another, a race can corrupt the result. When clk rises, both outputs have been precharged high. The high input to the first gate causes its output to fall, but the second gate’s output also falls in response to its initial high input. The circuit therefore produces an incorrect result because the second output will never rise during evaluation. Domino circuits solve this problem by using inverting static gates between dynamic gates so that the input to each dynamic gate is initially low. The falling dynamic output and rising static output ripple through a chain of gates like a stream of toppling dominos. In summary, domino logic runs 1.5–2× faster than static CMOS logic [6] because dynamic gates present a much lower input capacitance for the same output current and have a lower switching threshold, and because the inverting static gate can be skewed to favor the critical monotonically rising evaluation edges.
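
The race between directly cascaded dynamic gates, and the way the inverting static gate removes it, can be reproduced with a toy unit-delay model. This is a minimal sketch under stated assumptions, not a circuit simulation: every dynamic gate is a single-input inverter, all gates evaluate in lockstep with one unit of delay, the static inverter is lumped into its domino stage with zero delay, and the function names are hypothetical.

```python
from typing import List


def evaluate_chain(a: int, stages: int = 2, domino: bool = False) -> List[int]:
    """Evaluate a chain of single-input dynamic inverters after precharge.

    Returns the value each stage presents to the next stage once the
    chain has settled. Toy unit-delay model, not a circuit simulation.
    """
    dyn = [1] * stages  # every dynamic node starts precharged high
    for _ in range(stages + 1):  # enough steps for a change to ripple through
        # Value each stage currently drives onto the next stage's input:
        # the raw dynamic node, or its complement through the static inverter.
        drive = [1 - d for d in dyn] if domino else list(dyn)
        inputs = [a] + drive[:-1]
        # During evaluation a dynamic node can only discharge; once low it
        # stays low until the next precharge.
        dyn = [0 if x == 1 else d for d, x in zip(dyn, inputs)]
    return [1 - d for d in dyn] if domino else dyn


def ideal_chain(a: int, stages: int = 2, domino: bool = False) -> List[int]:
    """Logic values the chain should produce if there were no race."""
    values, x = [], a
    for _ in range(stages):
        # A domino stage (dynamic inverter + static inverter) is a buffer.
        x = x if domino else 1 - x
        values.append(x)
    return values


raced = evaluate_chain(1, domino=False)
fixed = evaluate_chain(1, domino=True)
print("dynamic-only:", raced, "intended", ideal_chain(1, domino=False))
print("domino:      ", fixed, "intended", ideal_chain(1, domino=True))
```

With a high input, the dynamic-only chain settles at [0, 0] where [0, 1] was intended: the second node discharged in response to its initially high input and cannot rise again. The domino chain settles at its intended [1, 1], because each dynamic gate sees a low input until the previous stage has actually evaluated.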

After domino gates evaluate, they must be precharged before they can be used in the next cycle. If all domino gates were to precharge simultaneously, the circuit would waste time during which no useful computation occurs. Therefore, domino logic is conventionally divided into two phases, ping-ponged such

