FU et al.: FAULT-TOLERANT ROUTING FOR 2-D MESHES 115
Fig. 4. Faulty blocks without shared boundary channels. Dark nodes represent
faults and gray nodes indicate unsafe nodes.
According to Definition 4, node (4, 3) changes to unsafe in
the second iteration because it has a danger neighbor (3, 3)
in x-dimension and a semi-safe-y neighbor (5, 3). Meanwhile,
nodes (5, 3), (3, 4), and (4, 4) also change to unsafe according
to Definition 4. Faulty blocks are formed in two iterations.
It is worthy to note that allowing faulty blocks to share
boundaries could further reduce the number of sacrificed
fault-free nodes. However, shared boundaries will significantly
increase the routing complexity. The discussion about the
tradeoffs between the number of sacrificed fault-free nodes
and the routing complexity is left as the future work.
III. R
ELATED WORK
Since the NoCs this paper concerns do not have vir-
tual channels, the fault-tolerant routing algorithms rely-
ing on virtual channels, such as [12], [15]–[17], are not
reviewed in this section. Furthermore, there is another kind
of fault-tolerant routing algorithms, stochastic algorithms.
Stochastic routing algorithms enhance NoC reliability by
sending multiple replicated packets through redundant routes,
such as the probabilistic gossip flooding algorithm [18] and
N-Random walk algorithm [19], or by deflection, such as [20],
[21]. Although stochastic routing algorithms can be highly
resilient, they also face some design challenges, such as
high energy and bandwidth consumption. Thus, this paper
mainly focuses on nonstochastic routing algorithms. There
are some flow control techniques also can be used to avoid
deadlock, such as the bubble flow control [22] and the one
proposed in [23]. Since this paper focuses on wormhole-
switched networks, those flow control techniques will not be
reviewed in this section.
Fault-tolerant routing algorithms designed for networks
without virtual channels can be categorized into two classes,
turn model-based and segment-based. For example, Glass
and Ni [9] proposed a nonminimal version of negative-first
routing [6].Wu proposed a fault-tolerant routing based on
odd–even turn model [10]. Zhang et al. [11] proposed a
reconfigurable router to tolerate one faulty block.
Fick et al. [24], [25] proposed a distributed algorithm
to reconfigure the routing table. Fu et al. [26] proposed a
multiple-round dimension-order routing.
Segment-based routing classifies networks into subnets, and
subnets into segments [27]. By placing a bidirectional turn
restriction in each segment, the network can be guaranteed
deadlock free. Cooperating with the logic-based distributed
routing [28] or universal LBDR [29], segment-based routing
provides a way to improve the reliability of NoCs.
We should note that fault-tolerant routing algorithms are
expected to be high resilience, high performance, high scala-
bility, and low cost. However, these objectives are somewhat
conflicting. Therefore, there is a tradeoff in designing fault-
tolerant routing.
For example, algorithms relying on off-line analysis with
global fault information, such as those segment-based routing
algorithms [26], [27], [29], can tolerate more faults. However,
for NoCs which cannot afford virtual channels, collecting and
dumping global fault information is usually too expensive.
Routing table provides the flexibility to reconfigure the
network in the presence of faults. However, algorithms relying
on a routing table, such as [24], [30], are not suitable for large-
scale NoCs, especially for those without virtual channels, due
to the cost problem [28].
Logic-based fault-tolerant routing algorithms, such as
in [9]–[11], is low cost. However, the main problem
in [9] and [11] is that only one fault can be tolerated.
Zhang et al. [11] claimed that their algorithm can be extended
to tolerate multiple faults by including them into one convex
faulty block. However, this usually leads to a large number
of disabled fault-free nodes. The main problem in [10] is the
way that is used to handle the faults locating on four network
edges as well as the two columns that are adjacent to the left
and right network edges. For example, if a fault appears at
these places, nodes of the corresponding edge or column are
all disabled. Unfortunately, the number of disabled nodes is
large.
This paper concerns the NoCs without virtual channels. We
believe that this kind of NoC is quite cost sensitive. Therefore,
we select the logic-based fault-tolerant routing algorithms,
such as [10] and [11], as the baseline algorithms. The main
difference between the proposed ZoneDefense routing and
previous work [10], [11] is the use of defense zones. The main
benefit of ZoneDefense routing is the significantly reduced
number of disabled fault-free nodes.
IV. D
EFENSE ZONES
According to [7], a network is deadlock free if all rightmost
columns are removed. As shown in Fig. 3, ES, SW, EN,
and NW turns are necessary to form rightmost columns. To
distinguish them from others, they are called unexpected turns.
Unfortunately, unexpected turns may be introduced if a packet
hits the boundary of a faulty block. To avoid unexpected turns,
we introduce the defense zones, so that packets could find the
faulty block and route around it in advance.
The formation of defense zones is triggered by the detection
of faults using such as build-in self-test techniques [31]. In this
paper, we utilize the dynamic fault model, but assume that no
new fault occurs during a routing process like [10]. However,
in practice, faults may occur at any time. To support dynamic
faults, one can exploit more reliable flow control techniques,
such as APCS [32] and the one proposed in [33]. These