Efficient Fault-Tolerant Minimal Routing for Mesh Architectures: The Path-Counter Method


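General fault-tolerant minimal routing for mesh architectures is a fault-tolerant routing approach designed for mesh networks. Its goal is to find a Manhattan (minimal-length) path between a source node and a destination node that avoids all faulty nodes, whenever such a path exists. Besides the faulty nodes themselves, some non-faulty nodes (helpless nodes) must also be avoided, because faults block every minimal path through them. Identifying these helpless nodes efficiently is the key challenge in designing such algorithms, and existing fault-tolerant minimal routing algorithms do not solve it efficiently. To address this, the authors propose the Allowed-Path-Counter Method. Its core idea is to count fault-tolerant minimal paths and use the counts to label the non-faulty nodes that cannot contribute to a fault-tolerant minimal path. The method offers the following advantages:

1. **Low time complexity**: Node labeling and path counting complete quickly. This matters for large networks and avoids bottlenecks caused by expensive computation.
2. **Flexibility**: The algorithm supports arbitrary fault distributions in the mesh and finds a fault-tolerant minimal path whenever one exists.
3. **Path checking and preservation**: Path counting lets the algorithm check whether at least one fault-tolerant minimal path exists, without sacrificing any available fault-tolerant minimal path.
4. **Scope**: The algorithm applies to network-on-chip (NoC) designs and, more broadly, to mesh architectures. Meshes are widely used in modern chip design, where fault tolerance and performance are critical.
5. **Handling helpless nodes**: The method offers a new way to label helpless nodes. This makes fault-tolerant minimal routing in mesh architectures more complete and better suited to complex network environments.

In summary, fault-tolerant minimal routing for mesh architectures is an important research area, particularly for fault management and the handling of helpless nodes. The Allowed-Path-Counter Method addresses the shortcomings of earlier approaches to this problem and improves the reliability of mesh networks.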

A General Fault-Tolerant Minimal Routing for Mesh Architectures

Hongzhi Zhao, Member, IEEE, Nader Bagherzadeh, Fellow, IEEE, and Jie Wu, Member, IEEE

Abstract: Fault-tolerant minimal routing algorithms aim to find a Manhattan path between the source and destination nodes that routes around all faulty nodes. Some non-faulty nodes that cannot help form a fault-tolerant minimal path should also be routed around. Labeling such non-faulty nodes efficiently is a major challenge, and state-of-the-art solutions do not address it well. We propose a path-counter method that labels, with low time complexity, every node that cannot help form a fault-tolerant minimal path. By counting the number of fault-tolerant minimal paths, it can support arbitrary fault distributions and check the existence of fault-tolerant minimal paths, without sacrificing any available fault-tolerant minimal path.

Index Terms: Allowed-path-counter method, fault-tolerant minimal routing, network-on-chip, mesh architectures

1 INTRODUCTION

The mesh-connected topology is one of the most thoroughly investigated network topologies for massively parallel computer networks [1] and Networks-on-Chip (NoC) [2]. Two- and three-dimensional (2D/3D) meshes are low-dimensional meshes that are commonly studied. Their structural regularity makes them easy to construct and gives them high potential for handling various algorithms [3]. In mesh architectures with a large number of computing nodes (i.e., processors, cores, or processing elements), network performance depends heavily on the efficiency of the routing algorithm.

As mesh architectures scale up, the chance of failure also increases [2]. The complexity of these networks also makes them vulnerable to disturbances. The ability of routing algorithms to tolerate failures is therefore becoming increasingly important [3]. Fault-tolerant routing finds a fault-free path for reliable communication and has been studied extensively [3], [4].

Fault-tolerant routing algorithms fall into two types according to the length of the paths they produce. Fault-tolerant minimal routing algorithms always route a message along a path whose length equals the Manhattan distance between source and destination in the mesh [3], [4]. The other type is non-minimal fault-tolerant routing. Because minimal (Manhattan) routing always delivers a packet along a shortest path, it is worthwhile to study minimal (Manhattan) fault-tolerant algorithms for meshes with faulty nodes [3], [4], [5]. This paper focuses on fault tolerance, so link weights are not taken into account.
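The sketch below makes "minimal" concrete. It assumes a 2D mesh whose nodes are addressed by integer (x, y) coordinates, which is not notation from the paper. A minimal path takes exactly |Δx| + |Δy| hops. In a fault-free mesh, every ordering of the |Δx| x-hops and |Δy| y-hops is a distinct minimal path, so there are C(|Δx| + |Δy|, |Δx|) of them. The function names are hypothetical.

```python
from math import comb


def manhattan_distance(src, dst):
    """Hop count of any minimal (Manhattan) path between two 2D mesh nodes."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


def count_minimal_paths_fault_free(src, dst):
    """Number of distinct minimal paths between src and dst in a fault-free 2D mesh.

    A minimal path is an ordering of |dx| hops along x and |dy| hops along y,
    so there are C(|dx| + |dy|, |dx|) of them.
    """
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return comb(dx + dy, dx)


print(manhattan_distance((0, 0), (3, 2)))              # 5
print(count_minimal_paths_fault_free((0, 0), (3, 2)))  # 10
```

For example, a source at (0, 0) and a destination at (3, 2) are 5 hops apart and are connected by 10 minimal paths when no node is faulty. Faulty nodes remove some of these paths, and a fault-tolerant minimal routing algorithm has to account for that.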

The major challenges for fault-tolerant minimal routing algorithms are:

- tolerating arbitrary fault distributions [2],
- checking the existence of fault-tolerant minimal paths [4],
- not sacrificing any available fault-tolerant minimal path, and
- achieving low time complexity [5].

This paper addresses these challenges. Section 2 reviews related work. Section 3 presents the allowed-path-counter method. Section 4 describes how to apply this method to design fault-tolerant minimal routing algorithms. Section 5 concludes the paper.

2 RELATED WORK

Many algorithms support fault-tolerant minimal routing in mesh architectures. Early algorithms tolerated only a limited number of faulty nodes. In 1996, Glass and Ni modified the negative-first routing algorithm to make it (n-1)-fault tolerant for n-dimensional meshes [1]. It can tolerate only one faulty node in a 2D mesh, or two faulty nodes in a 3D mesh. The routing algorithm presented by Shih tolerates any pattern of faulty nodes, as long as the number of faulty nodes does not exceed a certain bound [6]. Although it also supports minimal routing, this bound limits its application. To tolerate more faulty nodes, Wu proposed a fault-tolerant extended XY-routing protocol [7]. It tolerates more faulty nodes than the aforementioned work, but it does not support failures of edge nodes.
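The sketch below shows the baseline that Glass and Ni modified: the output-direction rule of plain minimal negative-first routing in a 2D mesh. All hops in negative directions (-x, -y) are taken first, adaptively. Positive hops (+x, +y) come only after that, so a positive-to-negative turn never occurs. This is the unmodified turn-model rule, not their (n-1)-fault-tolerant variant. The (x, y) addressing and the function name are assumptions for illustration.

```python
def negative_first_directions(cur, dst):
    """Output directions allowed by minimal negative-first routing in a 2D mesh.

    Directions are (dx, dy) unit steps. While any negative hop remains, only
    negative directions are offered (adaptively among them); positive hops are
    offered only afterwards, so no positive-to-negative turn can occur.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    negative = []
    if dx < 0:
        negative.append((-1, 0))  # west (-x)
    if dy < 0:
        negative.append((0, -1))  # south (-y)
    if negative:
        return negative
    positive = []
    if dx > 0:
        positive.append((1, 0))   # east (+x)
    if dy > 0:
        positive.append((0, 1))   # north (+y)
    return positive  # an empty list means cur == dst


print(negative_first_directions((2, 3), (4, 1)))  # [(0, -1)]
```

For instance, from (2, 3) to (4, 1), only the -y hop is offered until y reaches 1. After that, only +x is offered.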

The fault block model was subsequently introduced to address these problems. A fault block model must label the nodes that cannot help form a fault-tolerant Manhattan path because faulty nodes block them. In 2003, Wang introduced the concepts of "useless node" and "unreachable node" to describe such helpless non-faulty nodes [4].

For example, Fu et al. used a coarse-grained rectangular block model for cost-sensitive NoCs that can include all faulty nodes regardless of the fault distribution [2]. However, their routing algorithm must deactivate healthy nodes. A healthy destination node may even be deactivated and included in a fault block or fault ring. In that case, all available fault-tolerant minimal paths from a source node to that destination are sacrificed. Fukushima et al. proposed a routing algorithm named Overlapped-Ring-Chain-Route for NoCs that deactivates fewer nodes in fine-grained rectangular regions, but it still sacrifices healthy nodes [8]. Moreover, these works cannot decide whether fault-tolerant paths exist. Packets are therefore injected into the network even when no fault-tolerant path is available, congesting the network pointlessly.
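The labeling rules behind useless and unreachable nodes can be sketched as a fixed-point iteration. The sketch assumes that the destination lies in the +x/+y quadrant of the source. It uses a common formulation of the rules, stated here as an assumption because this excerpt does not define them:

- A healthy node is useless if both its +x and +y neighbors are faulty or useless, so every minimal step out of it is blocked.
- A healthy node is unreachable if both its -x and -y neighbors are faulty or unreachable, so no minimal step can enter it.

Treating neighbors outside the mesh as healthy is also an assumption, as is the function name. This is not the paper's method.

```python
def label_useless_unreachable(width, height, faulty):
    """Iteratively label useless and unreachable nodes for +x/+y routing.

    Hypothetical sketch. A healthy node becomes:
      - useless if its +x and +y neighbours are both faulty or useless;
      - unreachable if its -x and -y neighbours are both faulty or unreachable.
    Neighbours outside the width x height mesh are treated as healthy.
    """
    faulty = set(faulty)
    useless, unreachable = set(), set()

    def blocked(n, labelled):
        # A neighbour blocks a minimal step if it lies inside the mesh and is
        # faulty or already carries the relevant label.
        x, y = n
        return 0 <= x < width and 0 <= y < height and (n in faulty or n in labelled)

    changed = True
    while changed:  # repeat until no new label is added (fixed point)
        changed = False
        for x in range(width):
            for y in range(height):
                n = (x, y)
                if n in faulty:
                    continue
                if n not in useless and blocked((x + 1, y), useless) and blocked((x, y + 1), useless):
                    useless.add(n)
                    changed = True
                if n not in unreachable and blocked((x - 1, y), unreachable) and blocked((x, y - 1), unreachable):
                    unreachable.add(n)
                    changed = True
    return useless, unreachable


useless, unreachable = label_useless_unreachable(4, 4, {(2, 1), (1, 2)})
print(sorted(useless), sorted(unreachable))  # [(1, 1)] [(2, 2)]
```

With faulty nodes at (2, 1) and (1, 2) in a 4×4 mesh, (1, 1) becomes useless and (2, 2) becomes unreachable, even though neither is faulty. When faults form a longer staircase, labels propagate: a node whose forward neighbors are both useless becomes useless itself.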

To reduce the number of sacrificed healthy nodes, Tse et al. surrounded faulty nodes with minimal rectangular fault blocks and then designed a centralized fault-tolerant routing algorithm [9]. These rectangular regions do not include healthy nodes. However, many such regions must be preprocessed from global information about the mesh, at high time complexity. The XY routing algorithm it uses avoids deadlock, but XY routing itself sacrifices many available fault-tolerant minimal paths because of its low adaptivity. Moreover, the approach cannot check the existence of fault-tolerant paths.
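The sketch below shows why low adaptivity sacrifices minimal paths. It implements plain dimension-order XY routing (x first, then y), which yields exactly one of the possibly many minimal paths, and checks whether a fault lies on that path. The (x, y) addressing and the helper names are assumptions for illustration.

```python
def xy_route(src, dst):
    """Deterministic XY routing: correct the x offset first, then y.

    Returns the list of nodes visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


def xy_blocked(src, dst, faulty):
    """True if the single XY path from src to dst passes through a faulty node."""
    return any(node in faulty for node in xy_route(src, dst))


print(xy_route((0, 0), (2, 2)))            # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(xy_blocked((0, 0), (2, 2), {(1, 0)}))  # True
```

With a single fault at (1, 0), XY routing from (0, 0) to (2, 2) is blocked. Yet the minimal path (0, 0) → (0, 1) → (0, 2) → (1, 2) → (2, 2) is fault-free.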

To our knowledge, the Minimal-Connected-Component (MCC) fault block model proposed by Wang [4] and its variants [3] can identify the existence of fault-tolerant minimal paths online. Traffic that cannot reach its destination along available minimal paths is therefore not injected into the network. However, constructing all MCC fault blocks requires collecting and distributing MCC information through information exchanges among neighbors. Its time complexity is up to O(n^…), where n is the number of faulty nodes.

To overcome these drawbacks of fault block models, we propose an Allowed-Path-Counter method for fault-tolerant minimal routing algorithms. It counts, for every node, the fault-tolerant Manhattan paths to the source or destination node. From these counts, it labels all useless, unreachable, and prohibited nodes with low time complexity. Based on this labeling, fault-tolerant minimal paths are easy to find.
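The counting idea can be sketched as two dynamic-programming passes over the rectangle spanned by the source and destination, assuming (x, y) node coordinates:

- The forward pass counts fault-free minimal paths from the source to each node. A healthy node with a zero forward count cannot be reached from the source minimally, which is the role of an unreachable node.
- The backward pass counts fault-free minimal paths from each node to the destination. A healthy node with a zero backward count cannot reach the destination minimally, which is the role of a useless node.
- The forward count at the destination tells whether any fault-tolerant minimal path exists.

This is a hypothetical illustration of path counting, not the authors' algorithm. The function names are assumptions, as is the mapping to their node categories. Prohibited nodes are not modeled.

```python
def allowed_path_counts(src, dst, faulty):
    """Count fault-free minimal paths within the src-dst rectangle of a 2D mesh.

    Returns (fwd, bwd) dicts keyed by (x, y):
      fwd[n] = number of fault-free minimal paths from src to n
      bwd[n] = number of fault-free minimal paths from n to dst
    Only the bounding rectangle is scanned, since a minimal path never leaves it.
    """
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    dx, dy = abs(dst[0] - src[0]), abs(dst[1] - src[1])

    def node(i, j):
        # Local offset (i hops along x, j hops along y) -> mesh coordinates.
        return (src[0] + sx * i, src[1] + sy * j)

    fwd = {}
    for i in range(dx + 1):
        for j in range(dy + 1):
            n = node(i, j)
            if n in faulty:
                fwd[n] = 0
            elif i == 0 and j == 0:
                fwd[n] = 1
            else:
                from_x = fwd[node(i - 1, j)] if i > 0 else 0
                from_y = fwd[node(i, j - 1)] if j > 0 else 0
                fwd[n] = from_x + from_y

    bwd = {}
    for i in range(dx, -1, -1):
        for j in range(dy, -1, -1):
            n = node(i, j)
            if n in faulty:
                bwd[n] = 0
            elif i == dx and j == dy:
                bwd[n] = 1
            else:
                to_x = bwd[node(i + 1, j)] if i < dx else 0
                to_y = bwd[node(i, j + 1)] if j < dy else 0
                bwd[n] = to_x + to_y
    return fwd, bwd


def label_helpless(src, dst, faulty):
    """Return (path_count, useless, unreachable) for one source-destination pair."""
    fwd, bwd = allowed_path_counts(src, dst, faulty)
    useless = {n for n in bwd if n not in faulty and bwd[n] == 0}
    unreachable = {n for n in fwd if n not in faulty and fwd[n] == 0}
    return fwd[dst], useless, unreachable


count, useless, unreachable = label_helpless((0, 0), (3, 3), {(2, 1), (1, 2)})
print(count, sorted(useless), sorted(unreachable))  # 2 [(1, 1)] [(2, 2)]
```

Consider a source at (0, 0), a destination at (3, 3), and faulty nodes at (2, 1) and (1, 2). The sketch finds 2 fault-tolerant minimal paths, labels (1, 1) useless, and labels (2, 2) unreachable. A path count of 0 would signal that no fault-tolerant minimal path exists, so the packet need not be injected. The sketch does O(|Δx|·|Δy|) work per source-destination pair. This says nothing about the complexity of the paper's method.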

• H. Zhao is with the School of Computer and Information Technology, Beijing Jiao Tong University, Beijing 100044, China. E-mail: hzzhao@m.bjtu.edu.cn.
• N. Bagherzadeh and J. Wu are with the Department of EECS, University of California, Irvine, CA 92697. E-mail: {nader, wuj8}@uci.edu.

Manuscript received 19 Oct. 2016; revised 3 Jan. 2017; accepted 8 Jan. 2017. Date of publication 10 Jan. 2017; date of current version 15 June 2017.
Recommended for acceptance by J.D. Bruguera.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2017.2651828

1240 IEEE TRANSACTIONS ON COMPUTERS, VOL. 66, NO. 7, JULY 2017

0018-9340 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
