PREPARE: Predictive Performance Anomaly
Prevention for Virtualized Cloud Systems
Yongmin Tan, Hiep Nguyen, Zhiming Shen, Xiaohui Gu
North Carolina State University
Raleigh, NC, USA
Email: {ytan2,hcnguye3,zshen5}@ncsu.edu, gu@csc.ncsu.edu
Chitra Venkatramani, Deepak Rajan
IBM T. J. Watson Research
Hawthorne, NY, USA
Email: {chitrav,drajan}@us.ibm.com
Abstract—Virtualized cloud systems are prone to perfor-
mance anomalies due to various reasons such as resource
contentions, software bugs, and hardware failures. In this
paper, we present a novel PREdictive Performance Anomaly
pREvention (PREPARE) system that provides automatic per-
formance anomaly prevention for virtualized cloud computing
infrastructures. PREPARE integrates online anomaly predic-
tion, learning-based cause inference, and predictive prevention
actuation to minimize the performance anomaly penalty
without human intervention. We have implemented PREPARE
on top of the Xen platform and tested it on NCSU’s Virtual
Computing Lab using a commercial data stream processing
system (IBM System S) and an online auction benchmark
(RUBiS). The experimental results show that PREPARE can
effectively prevent performance anomalies while imposing low
overhead on the cloud infrastructure.
Index Terms—performance anomaly prevention, online
anomaly prediction, cloud computing
I. INTRODUCTION
Infrastructure-as-a-Service (IaaS) cloud systems [1] al-
low users to lease resources in a pay-as-you-go fashion.
Cloud systems provide application service providers (ASPs)
with a more cost-effective solution than in-house computing
by obviating the need for ASPs to own and maintain a
complicated physical computing infrastructure. Since cloud
systems are often shared by multiple users, virtualization
technologies [2], [3] are used to achieve isolation among
different users. However, applications running inside the
cloud are prone to performance anomalies due to various
reasons such as resource contentions, software bugs, and
hardware failures. Although application developers often
perform rigorous debugging offline, many tough bugs only
manifest during large-scale runs. It is a daunting
task for system administrators to manually keep track of
the execution status of many virtual machines (VMs) all
the time. Moreover, manual diagnosis can cause prolonged
service level objective (SLO) violation time, which is often
associated with a large financial penalty.
It is challenging to diagnose and prevent performance
anomalies in virtualized cloud computing environments.
First, the application running inside the IaaS cloud often
appears as a black-box to the cloud service provider,
which makes it infeasible to obtain detailed measurements
about the application and apply previous intrusive diagnosis
techniques. Second, the cloud management system wishes
to automatically prevent any performance anomaly in order
to minimize the financial penalty. As a result, traditional
reactive anomaly management is often insufficient.
In this paper, we present a novel PREdictive Performance
Anomaly pREvention (PREPARE) system for virtualized
cloud systems. PREPARE integrates online anomaly pre-
diction and virtualization-based prevention techniques (e.g.,
elastic resource scaling [4], [5] and live VM migration [6])
to automatically prevent performance anomalies in cloud
systems. PREPARE applies statistical learning algorithms
over system-level metrics (e.g., CPU, memory, network
I/O statistics) to achieve two objectives: 1) early anomaly
detection that can raise advance anomaly alerts before
a performance anomaly happens; and 2) coarse-grained
anomaly cause inference that can pinpoint faulty VMs
and infer the system metrics that are related to the per-
formance anomaly. Based on the informative prediction
results, PREPARE leverages virtualization technologies to
perform VM perturbations for automatically preventing per-
formance anomalies. PREPARE also performs false alarm
filtering and prevention effectiveness validation to cope
with online anomaly prediction errors. Specifically, this
paper makes the following contributions:
• We present PREPARE, a prediction-driven perfor-
mance anomaly prevention system for virtualized
cloud computing infrastructures. PREPARE is
non-intrusive and application-agnostic, so it can be
readily applied to any application running inside the
IaaS cloud.
• We show how to achieve accurate and informative
online anomaly prediction using only system-level
metrics by integrating the 2-dependent Markov chain
model with the tree-augmented Bayesian networks
(TAN) model.
• We introduce several prevention validation schemes to
cope with online anomaly prediction errors.
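To make the second contribution more concrete, the following is a minimal sketch of a 2-dependent (second-order) Markov chain over a single discretized system-level metric, where the next state depends on the two most recent states. It is an illustration under stated assumptions, not the PREPARE implementation: the function names, the equal-width binning, and the most-likely-path lookahead are all hypothetical choices, and the TAN classifier that would label the predicted values as normal or anomalous is omitted.

```python
from collections import Counter, defaultdict


def discretize(value, lo, hi, bins):
    """Map a raw metric value to an equal-width bin index in [0, bins - 1].

    Equal-width binning is an assumption made for this sketch.
    """
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)


def train(states):
    """Count transitions (s[t-2], s[t-1]) -> s[t] over a state sequence."""
    counts = defaultdict(Counter)
    for a, b, c in zip(states, states[1:], states[2:]):
        counts[(a, b)][c] += 1
    return counts


def predict(counts, prev, cur, steps):
    """Follow the most frequently observed transition for up to `steps` steps.

    Stops early if the current state pair was never seen in training.
    Taking only the most likely successor is a simplification; a fuller
    version would propagate the whole probability distribution.
    """
    path = []
    for _ in range(steps):
        successors = counts.get((prev, cur))
        if not successors:
            break
        nxt = successors.most_common(1)[0][0]
        path.append(nxt)
        prev, cur = cur, nxt
    return path


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (percent), binned into 4 states.
    samples = [10, 30, 60, 90, 10, 30, 60, 90, 10, 30, 60, 90]
    states = [discretize(v, 0, 100, 4) for v in samples]
    model = train(states)
    print(predict(model, states[-2], states[-1], steps=3))
```

In a fuller pipeline, one such predictor per metric would supply look-ahead values to a classifier, which would then decide whether the predicted system state should raise an anomaly alert.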
We have implemented a prototype of PREPARE on top
of the Xen platform [2]. We have deployed and tested
PREPARE on NCSU’s virtual computing lab (VCL) [7]
that operates similarly to Amazon EC2 [1]. We
conducted extensive experiments by running real distributed