Live Updating Operating Systems Using Virtualization∗
Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang
Parallel Processing Institute, Fudan University
{hb chen,chenrong,fzzhang,byzang}@fudan.edu.cn
Pen-Chung Yew
Department of Computer Science and Engineering,
University of Minnesota at Twin-Cities
yew@cs.umn.edu
Abstract
Many critical IT infrastructures require non-disruptive operation.
However, the operating systems they run on are far from perfect, so
patches and upgrades are frequently applied to close vulnerabilities,
add new features and enhance performance. To mitigate the loss of
availability, such operating systems need to provide features such as
live update, through which patches and upgrades can be applied without
stopping and rebooting the operating system. Unfortunately, most current
live updating approaches cannot be easily applied to existing operating
systems: some are tightly bound to specific design approaches (e.g.,
object-oriented design); others can only be used under particular
circumstances (e.g., quiescent states).
In this paper, we propose using virtualization to provide the live
update capability. The proposed approach allows a broad range of
patches and upgrades to be applied at any time without requiring a
quiescent state. Moreover, because it is transparent to the operating
system, the approach is highly portable and suitable for inclusion in
general virtualization systems. We present a working prototype,
LUCOS, which supports live update capability on Linux running
on the Xen virtual machine monitor. To demonstrate the applicability
of our approach, we use real-life kernel patches from Linux kernel
2.6.10 to Linux kernel 2.6.11, and apply some of those kernel
patches on the fly. Performance measurements show that our
implementation incurs negligible performance overhead: less than
1% performance degradation compared to Xen-Linux. The time
to apply a patch is also minimal.
Categories and Subject Descriptors
K.6.3 [Management of Computing and Information Systems]: Software
Management—Software maintenance; D.4.5 [Operating
Systems]: Reliability
General Terms
Reliability, Management, Design, Experimentation
∗ This research was funded by the China National 973 Plan under grant
number 2005CB321905.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
VEE’06, June 14–16, 2006, Ottawa, Ontario, Canada.
Copyright © 2006 ACM 1-59593-332-6/06/0006...$5.00.
Keywords
Live Update, Virtualization, Operating System, Availability
1. Introduction
Patches and upgrades are a part of everyday life for a contemporary
operating system. They are frequently applied to plug security holes,
add new features and enhance performance. Unfortunately, this process
usually requires stopping and restarting a running operating system,
which can be a major source of lost availability. For long-running,
mission-critical systems, any such disruption can be expensive and
intolerable [1]: they must keep all tasks running at all times or risk
dire consequences. Therefore, features such as the live update
capability [2] have become increasingly important, because they can
minimize planned and unplanned downtime and thus reduce the loss
of availability.
Most modern operating systems are large and complex. Several
requirements for safely live updating such operating systems are
identified in [3, 4, 5]. First, updatable units in an operating system
need to be easily defined. For an operating system built with an
object-oriented approach, such as K42, an object is a natural updatable
unit. Second, a quiescent state or a safe point [6] needs to be
detected or enforced before a dynamic patch can be applied;
otherwise, the operating system may end up in an inconsistent state.
This necessitates an efficient way to track the states of the operating
system, for example, using a reference counter to track the number
of threads executing in an updatable unit. Finally, an effective
approach is required to redirect invocations from the original unit
to the newly updated unit after a dynamic patch is applied.
However, most existing operating systems are not designed with
a live update capability in mind. First, they are usually implemented
using non-object-oriented approaches. Hence, function calls are often
made directly rather than through an indirection table, making it
difficult to redirect them. Moreover, such systems often lack
well-defined boundaries among their components, preventing
component-level live updates. Second, they usually lack mechanisms
that support safe-point detection (e.g., reference counts), which makes
quiescent-state detection either very time-consuming or simply
impractical. Furthermore, hot spots in an operating system rarely enter
a quiescent state in which live updates can be safely applied. Examples
include the network module in a web server and the root file system
module. A network module is always busy receiving and sending packets,
and a root file system module cannot be unmounted while the operating
system is still running. Under such circumstances, emergency patches and
updates must be postponed indefinitely, exposing the whole system to
possible attacks or corruption. Finally, even if such a safe state
could be reached and detected, because the update process executes
inside the operating system, it may trigger execution of the code in
the patch program and result in a deadlock