International Journal of Hybrid Information Technology
Vol. 6, No. 3, May, 2013
65
A Study on the Impact of Multiple Failures on OSPF Convergence
Dan Zhao, Xiaofeng Hu and Chunqing Wu
School of computer, National University of Defense Technology
Changsha, Hunan, China
danzhao.nudt@gmail.com,{xfhu,wuchungqing}@nudt.edu.cn
Abstract
Open Shortest Path First (OSPF) is a popular link state routing protocol widely used in
Internet infrastructure. OSPF implements several timers to limit the protocol overhead. With
these timers, it usually takes several tens of seconds for OSPF network to recover from a
failure. The convergence time is delayed mainly by the timers of failure detection and routing
calculation scheduling. In this paper we analyze OSPF convergence behavior in presence of
multiple failures, where the interactions between failure detection and routing calculation
scheduling could generate complicated dynamics during convergence process. We also
present experimental study to understand the impact of multiple failures on convergence. The
results demonstrate that multiple failures have a greater chance to delay the convergence.
This suggests that operators should take it into account while configuring OSPF network.
Keywords: OSPF, link state protocol, multiple failures, convergence
1. Introduction
Open Shortest Path First (OSPF) [1] is a successful link state protocol that is widely used
in intra-domain ISP networks. In OSPF network, every router establishes adjacency with its
connected counterparts and describes the connection status using Link State Advertisement
(LSA). Once the topology changes, routers employ specific mechanism, typically Hello
protocol prescribed in OSPF standard, to detect the failure and generate new LSAs. After the
synchronization of LSAs throughout the network by flooding, routers are capable of
calculating the correct routing table for packet forwarding.
The primary philosophy of protocol design is to limit the processing/bandwidth
requirements of the protocol, while the time required to recover from a failure in the network
topology (speed of convergence) was of secondary importance [5]. The tradeoff between
protocol overhead and efficiency is conventionally regulated by protocol timers. For instance,
hello packet is periodically exchanged between neighboring routers with the frequency
determined by HelloInterval, which limits the number of hello packets. When routers are
about to calculate the routing table using SPF algorithm, the calculation is delayed by
spfDelay and spfHold, expecting to acquire the most up-to-date RIB with fewer calculation.
With these timers taking effect, OSPF network normally converges in several tens of
seconds because the timers are often configured in the granularity of second. Given the advent
of real-time applications, significant attentions have been drawn to achieve fast convergence
to accommodate uninterrupted traffic delivery. There have been proposals that reducing
timers can achieve sub-second convergence [2], but it increases the processing overhead and
convergence dynamics which might impact the stability. Hence the controversy of fast
convergence and protocol stability requires continuous investigation.
In this paper we aim to understand OSPF convergence behavior, especially in presence of
multiple failures. In recent years, there has initiated a research trend in measuring and