Princeton University's Open-Source Manycore Research Framework: Advancing Scale and Extensibility


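OpenPiton is an open-source manycore processor research framework led by Princeton University. It aims to address the difficulties academic researchers face when developing large, complex multicore processors. Contributors include Jonathan Balkind, Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Alexey Lavrov, Mohammad Shahrad, Adi Fuchs, Samuel Payne, Xiaohua Liang, Matthew Matl, and David Wentzlaff. Their joint work focuses on scalability, extensibility, and configurability.

OpenPiton is designed as an open architecture platform for simulation, synthesis, and software exploration, letting researchers study designs ranging from a single core up to hundreds of millions of cores. The framework is general-purpose, supports multithreading, and is intended as a common foundation for manycore research that eases exchange between academia and industry. Its key characteristics include:

1. Scalability: researchers can scale the core count to match their needs, whether they are trying out a new design or optimizing an existing architecture.
2. Flexibility: the framework offers many configuration options, so the core structure, memory system, and I/O interfaces can be adjusted to the target use case.
3. Verification tool support: the framework integrates established verification tools that help check the correctness and reliability of a design, reducing uncertainty on the path from concept to implementation.
4. Software ecosystem: OpenPiton supports a broad software development and debugging environment, including operating systems, compilers, and applications, which helps developers build and test multicore software quickly.
5. Open source: as an open-source project, OpenPiton encourages community participation and knowledge sharing, and lowers the barrier to entry for manycore research.
6. Academic collaboration: the Princeton research team works with companies such as NVIDIA, so the results are not confined to academia and have potential for practical application.

Through OpenPiton, academia and industry can share knowledge and resources for multicore processor development, which helps narrow the gap between theory and practice. For students, researchers, and engineers interested in multicore processors, OpenPiton is a useful tool for learning and experimentation.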

[Figure 4 block labels: Global control logic; NoC1 input buf; NoC3 input buf; NoC2 output buf; MSHR; Tag array; State array; Dir array; Data array; Way selection (two instances); Decode; Stall logic; Msg to send]

Figure 4: The architecture of the L2 cache.

coherence packet formats, and a write-back layer, caching stores from the write-through L1 data cache. It is an 8KB 4-way set associative write-back cache (the same size as the L1 data cache by default) with configurable associativity and size. The line size is the same as the L1 data cache at 16 bytes.
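
As a rough illustration of what these default parameters imply, the sketch below derives the number of sets and splits an address into tag, index, and offset. It assumes power-of-two sizes and conventional indexing with the bits directly above the line offset; `CacheGeometry` and `L15_DEFAULT` are hypothetical names, and the actual OpenPiton RTL may index differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper; assumes all three parameters are powers of two."""
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def num_sets(self) -> int:
        # Total capacity divided by the bytes held in one set.
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def offset_bits(self) -> int:
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        return (self.num_sets - 1).bit_length()

    def split(self, addr: int):
        """Return (tag, set index, byte offset) under conventional indexing."""
        offset = addr & (self.line_bytes - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset


# Default L1.5 parameters from the text: 8KB, 4-way, 16-byte lines.
L15_DEFAULT = CacheGeometry(size_bytes=8 * 1024, ways=4, line_bytes=16)
print(L15_DEFAULT.num_sets)        # 128
print(L15_DEFAULT.split(0x12345))  # (36, 52, 5)
```

With the defaults, 8192 / (4 x 16) gives 128 sets, so 4 offset bits and 7 index bits.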

The L1.5 communicates requests and responses to and from the core through CCX. The CCX bus is preserved as the primary interface to the OpenSPARC T1. The L1.5 CCX interface could relatively easily be replaced with other interfaces like AMBA or AXI to accommodate different cores.

When a memory request results in a miss, the L1.5 translates and forwards the request to the L2 through the network-on-chip (NoC) channels. Generally, the L1.5 issues requests on NoC1, receives data on NoC2, and writes back modified cache lines on NoC3, as shown in Figure 3.

While the L1.5 was named as such during the development of the Piton ASIC prototype, in traditional computer architecture contexts it would be appropriate to call it the "private L2" and to call the next level cache the "shared/distributed L3". The L1.5 is inclusive of the L1 data cache; each can be independently sized with independent eviction policies. As a space- and performance-conscious optimization, the L1.5 does not cache instructions; these cache lines are bypassed directly between the L1 instruction cache and the L2. It is possible to modify the L1.5 to also cache instructions.

2.3.3 L2 Cache

The L2 cache is a distributed write-back cache shared by all tiles. The default cache configuration is 64KB per tile and 4-way set associativity, but both the cache size and associativity are configurable. The cache line size is 64 bytes, larger than that of the caches lower in the hierarchy. The integrated directory cache has 64 bits per entry, so it can precisely keep track of up to 64 sharers by default.
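
A 64-bit directory entry that tracks sharers precisely can be pictured as a bit vector with one bit per potential sharer. The sketch below is a minimal model under that assumption; the class `DirectoryEntry` and its methods are hypothetical and do not reflect the actual entry layout in the OpenPiton RTL.

```python
class DirectoryEntry:
    """Hypothetical full-map sharer vector: bit i set means sharer i holds the line."""

    WIDTH = 64  # bits per directory entry, per the text

    def __init__(self) -> None:
        self.bits = 0

    def _check(self, sharer_id: int) -> None:
        if not 0 <= sharer_id < self.WIDTH:
            raise ValueError(f"sharer id {sharer_id} does not fit in {self.WIDTH} bits")

    def add_sharer(self, sharer_id: int) -> None:
        self._check(sharer_id)
        self.bits |= 1 << sharer_id

    def remove_sharer(self, sharer_id: int) -> None:
        self._check(sharer_id)
        self.bits &= ~(1 << sharer_id)

    def sharers(self) -> list:
        # Sharer ids whose bit is set, in ascending order.
        return [i for i in range(self.WIDTH) if (self.bits >> i) & 1]


entry = DirectoryEntry()
for sharer in (0, 5, 63):
    entry.add_sharer(sharer)
print(entry.sharers())  # [0, 5, 63]
```

One bit per sharer is what makes the tracking precise: a 65th sharer has no bit to occupy, which is why the default limit is 64.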

The L2 cache is inclusive of the private caches (L1 and L1.5). Cache line way mapping between the L1.5 and the L2 is independent and is entirely subject to the replacement policy of each cache. In fact, since the L2 is distributed, cache lines consecutively mapped in the L1.5 are likely to be strewn across multiple L2 tiles (L2 tile referring to a portion of the distributed L2 cache in a single tile). By default, OpenPiton maps cache lines using constant strides with the lower address bits across all L2 tiles, but Coherence Domain Restriction (CDR) [30], an experimental research feature integrated into OpenPiton, can be used to interleave cache lines belonging to a single application or page across a software-specified set of L2 tiles.
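
The default striding can be sketched as follows. This assumes interleaving at 64-byte line granularity, using the address bits immediately above the line offset and a simple modulo over the tile count; the text only says "constant strides with the lower address bits", so the exact bit selection is an assumption, and `home_l2_tile` is a hypothetical name.

```python
LINE_BYTES = 64  # L2 cache line size from the text


def home_l2_tile(paddr: int, num_tiles: int) -> int:
    """Assumed mapping: consecutive 64-byte lines go to consecutive L2 tiles."""
    line_number = paddr // LINE_BYTES
    return line_number % num_tiles


# Five consecutive L2 lines on a hypothetical 4-tile configuration.
addresses = (0x000, 0x040, 0x080, 0x0C0, 0x100)
print([home_l2_tile(a, 4) for a in addresses])  # [0, 1, 2, 3, 0]
```

Under this assumption every byte of a given 64-byte line has the same home tile, while neighbouring lines land on different tiles.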

As shown in Figure 4, the L2 cache is designed with dual parallel pipelines. The first pipeline (top) receives cache miss request packets from lower in the cache hierarchy on NoC1 and sends memory request packets to off-chip DRAM and cache fill response packets to lower in the cache hierarchy on NoC2. The second pipeline (bottom) receives memory response packets from off-chip DRAM and modified cache line writeback packets from lower in the cache hierarchy on NoC3. The first L2 pipeline contains 4 stages and the second pipeline contains only 3 stages since it does not transmit output packets. The interaction between the L2 and the three NoCs is also depicted in Figure 3.
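
The channel assignments described above can be summarized as a lookup table. This is a sketch of the prose only: the message class names in `Msg` are hypothetical labels, not OpenPiton packet types, and `l2_input_pipeline` simply encodes which pipeline receives each input.

```python
from enum import Enum


class Noc(Enum):
    NOC1 = 1
    NOC2 = 2
    NOC3 = 3


class Msg(Enum):
    # Hypothetical message classes named after the prose.
    L15_MISS_REQUEST = "L1.5 -> L2 cache miss request"
    L2_MEMORY_REQUEST = "L2 -> off-chip DRAM memory request"
    L2_FILL_RESPONSE = "L2 -> L1.5 cache fill response"
    MEMORY_RESPONSE = "off-chip DRAM -> L2 memory response"
    L15_WRITEBACK = "L1.5 -> L2 modified-line writeback"


# Which physical NoC carries each message class, per the text.
CHANNEL = {
    Msg.L15_MISS_REQUEST: Noc.NOC1,
    Msg.L2_MEMORY_REQUEST: Noc.NOC2,
    Msg.L2_FILL_RESPONSE: Noc.NOC2,
    Msg.MEMORY_RESPONSE: Noc.NOC3,
    Msg.L15_WRITEBACK: Noc.NOC3,
}


def l2_input_pipeline(msg: Msg) -> int:
    """Return 1 or 2 for the L2 pipeline that receives this message."""
    noc = CHANNEL[msg]
    if noc is Noc.NOC1:
        return 1  # first (top) pipeline, 4 stages
    if noc is Noc.NOC3:
        return 2  # second (bottom) pipeline, 3 stages
    raise ValueError(f"{msg.name} is an L2 output on NoC2, not an input")


print(l2_input_pipeline(Msg.L15_MISS_REQUEST))  # 1
print(l2_input_pipeline(Msg.L15_WRITEBACK))     # 2
```

In this model NoC2 carries only what the first pipeline emits, which matches the statement that the second pipeline transmits no output packets.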

2.4 Cache Coherence and Memory Consistency Model

The memory subsystem maintains cache coherence with a directory-based MESI coherence protocol. It adheres to the TSO memory consistency model used by the OpenSPARC T1. Coherence messages between L1.5 caches and L2 caches communicate through three NoCs, carefully designed to ensure deadlock-free operation.

The L2 is the point of coherence for all memory requests, except for non-cacheable loads and stores which directly bypass the L2 cache. All other memory operations (including atomic operations such as compare-and-swap) are ordered and the L2 strictly follows this order when servicing requests.

The L2 also keeps the instruction and data caches coherent. Per the OpenSPARC T1's original design, coherence between the two L1 caches is maintained at the L2. When a line is present in a core's L1 instruction cache and is loaded as data, the L2 will send invalidations to the relevant instruction caches before servicing the load.

High-level features of the coherence protocol include:

• 4-step message communication

• Silent eviction in Exclusive and Shared states

• No acknowledgments for dirty write-backs

• Three 64-bit physical NoCs with point-to-point ordering

• Co-location of L2 cache and coherence directory
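
Two of the features listed above, silent eviction and unacknowledged dirty write-backs, can be sketched from the point of view of a private cache evicting a line. This is a simplified model rather than the OpenPiton implementation; `evict` and the message name `"dirty_writeback"` are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def evict(state: State):
    """Return (message_to_send, must_wait_for_ack) when evicting a line.

    Simplified model of the listed features, not the actual RTL behavior.
    """
    if state is State.MODIFIED:
        # Dirty data must reach the L2, but no acknowledgment is awaited.
        return ("dirty_writeback", False)
    # Exclusive and Shared lines are dropped silently; Invalid holds nothing.
    return (None, False)


for s in State:
    print(s.name, evict(s))
```

A consequence of silent eviction is that the directory may still list a cache that has already dropped a clean line, so an invalidation can arrive for a line the cache no longer holds.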

2.5 Interconnect

There are two major interconnection types used in OpenPiton: the NoCs and the chip bridge.
