Predicting and Preventing Performance Anomalies in Virtualized Cloud Systems: The PREPARE System

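"PREPARE: Predictive Performance Anomaly Prevention for Virtualized Cloud Systems" (ICDCS 2012 Best Paper). This paper won the best paper award at the 2012 IEEE International Conference on Distributed Computing Systems (ICDCS). It addresses the prediction and prevention of performance anomalies in virtualized cloud systems. The authors are Yongmin Tan, Hiep Nguyen, Zhiming Shen, and Xiaohui Gu (North Carolina State University), together with Chitra Venkatramani and Deepak Rajan (IBM T. J. Watson Research). Their work targets cloud performance problems caused by resource contention, software bugs, and hardware failures.

The paper proposes a system called PREPARE (Predictive Performance Anomaly Prevention) that provides automatic performance anomaly prevention for virtualized cloud infrastructures, with the goal of minimizing the impact of anomalies without human intervention. PREPARE combines three components: online anomaly prediction, learning-based cause inference, and predictive prevention actuation. Online anomaly prediction monitors and analyzes system state in real time to forecast upcoming performance degradation. Learning-based cause inference uses machine learning to identify the likely causes of an anomaly. Predictive prevention actuation then acts on the prediction results to head off the anomaly automatically.

PREPARE is implemented on top of the Xen virtualization platform and was tested on North Carolina State University's Virtual Computing Lab. The experiments used a commercial data stream processing system (IBM System S) and an online auction benchmark (RUBiS) as representative workloads. The results show that PREPARE effectively reduces the impact of performance anomalies. By predicting performance problems and preventing them automatically, the approach improves the stability and reliability of virtualized cloud systems, and it is relevant to operating large virtualized environments and avoiding service disruptions caused by performance problems.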

PREPARE: Predictive Performance Anomaly Prevention for Virtualized Cloud Systems

Yongmin Tan, Hiep Nguyen, Zhiming Shen, Xiaohui Gu

North Carolina State University

Raleigh, NC, USA

Email: {ytan2,hcnguye3,zshen5}@ncsu.edu, gu@csc.ncsu.edu

Chitra Venkatramani, Deepak Rajan

IBM T. J. Watson Research

Hawthorne, NY, USA

Email: {chitrav,drajan}@us.ibm.com

Abstract—Virtualized cloud systems are prone to performance anomalies for reasons such as resource contention, software bugs, and hardware failures. In this paper, we present a novel PREdictive Performance Anomaly pREvention (PREPARE) system that provides automatic performance anomaly prevention for virtualized cloud computing infrastructures. PREPARE integrates online anomaly prediction, learning-based cause inference, and predictive prevention actuation to minimize the performance anomaly penalty without human intervention. We have implemented PREPARE on top of the Xen platform and tested it on NCSU's Virtual Computing Lab using a commercial data stream processing system (IBM System S) and an online auction benchmark (RUBiS). The experimental results show that PREPARE can effectively prevent performance anomalies while imposing low overhead on the cloud infrastructure.

Index Terms—performance anomaly prevention, online anomaly prediction, cloud computing

I. INTRODUCTION

Infrastructure-as-a-Service (IaaS) cloud systems [1] allow users to lease resources in a pay-as-you-go fashion. Cloud systems provide application service providers (ASPs) with a more cost-effective solution than in-house computing by obviating the need for ASPs to own and maintain a complicated physical computing infrastructure. Since cloud systems are often shared by multiple users, virtualization technologies [2], [3] are used to achieve isolation among different users. However, applications running inside the cloud are prone to performance anomalies for reasons such as resource contention, software bugs, and hardware failures. Although application developers often perform rigorous debugging offline, many tough bugs only manifest during large-scale runs. It is a daunting task for system administrators to manually keep track of the execution status of many virtual machines (VMs) all the time. Moreover, manual diagnosis can prolong service level objective (SLO) violations, which often carry large financial penalties.

It is challenging to diagnose and prevent performance anomalies in virtualized cloud computing environments. First, the application running inside the IaaS cloud often appears as a black box to the cloud service provider, which makes it infeasible to obtain detailed measurements about the application and apply previous intrusive diagnosis techniques. Second, the cloud management system wishes to automatically prevent any performance anomaly in order to minimize the financial penalty. As a result, traditional reactive anomaly management is often insufficient.

In this paper, we present a novel PREdictive Performance Anomaly pREvention (PREPARE) system for virtualized cloud systems. PREPARE integrates online anomaly prediction and virtualization-based prevention techniques (e.g., elastic resource scaling [4], [5] and live VM migration [6]) to automatically prevent performance anomalies in cloud systems. PREPARE applies statistical learning algorithms over system-level metrics (e.g., CPU, memory, network I/O statistics) to achieve two objectives: 1) early anomaly detection that can raise advance anomaly alerts before a performance anomaly happens; and 2) coarse-grained anomaly cause inference that can pinpoint faulty VMs and infer the system metrics that are related to the performance anomaly. Based on the informative prediction results, PREPARE leverages virtualization technologies to perform VM perturbations for automatically preventing performance anomalies. PREPARE also performs false alarm filtering and prevention effectiveness validation to cope with online anomaly prediction errors. Specifically, this paper makes the following contributions:

• We present PREPARE, a prediction-driven performance anomaly prevention system for virtualized cloud computing infrastructures. PREPARE is non-intrusive and application-agnostic, so it can be readily applied to any application running inside the IaaS cloud.

• We show how to achieve accurate and informative online anomaly prediction using only system-level metrics by integrating the 2-dependent Markov chain model with the tree-augmented Bayesian networks (TAN) model.

• We introduce several prevention validation schemes to cope with online anomaly prediction errors.
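
The 2-dependent Markov chain mentioned in the contributions above can be illustrated with a minimal Python sketch. In such a chain, the next state of a discretized metric depends on the previous two states. The binning scheme, the class and function names, the greedy most-likely rollout, and the sample CPU readings below are all illustrative assumptions rather than details taken from the paper, and the TAN classification step is omitted entirely.

```python
from collections import Counter, defaultdict


def discretize(values, num_bins, lo, hi):
    """Map raw metric samples to integer bin indices in [0, num_bins - 1]."""
    width = (hi - lo) / num_bins
    return [min(num_bins - 1, max(0, int((v - lo) / width))) for v in values]


class TwoDependentMarkovChain:
    """Hypothetical helper: next state depends on the two preceding states."""

    def __init__(self):
        # counts[(s_prev, s_curr)][s_next] = number of observed transitions
        self.counts = defaultdict(Counter)

    def fit(self, states):
        # Slide a window of three consecutive states over the training series.
        for a, b, c in zip(states, states[1:], states[2:]):
            self.counts[(a, b)][c] += 1
        return self

    def predict(self, prev, curr, steps):
        """Greedy rollout: follow the most frequent transition at each step."""
        path = []
        for _ in range(steps):
            seen = self.counts.get((prev, curr))
            # Assumption: for a state pair never seen in training,
            # predict that the metric stays in its current bin.
            nxt = seen.most_common(1)[0][0] if seen else curr
            path.append(nxt)
            prev, curr = curr, nxt
        return path


# Made-up CPU utilization samples (percent) that ramp up and repeat.
cpu_samples = [10, 20, 30, 40, 50, 60, 70, 80, 90,
               10, 20, 30, 40, 50, 60, 70, 80, 90]
states = discretize(cpu_samples, num_bins=10, lo=0, hi=100)
model = TwoDependentMarkovChain().fit(states)

# Forecast three steps ahead from the pair of bins (3, 4).
forecast = model.predict(prev=states[2], curr=states[3], steps=3)
print(forecast)
```

In a prediction-driven design of this kind, the forecast bins for each metric would then be passed to a classifier that labels the predicted future state as normal or anomalous, which is the role the text assigns to the TAN model.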

We have implemented a prototype of PREPARE on top of the Xen platform [2]. We have deployed and tested PREPARE on NCSU's virtual computing lab (VCL) [7], which operates in a similar way to Amazon EC2 [1]. We conducted extensive experiments by running real distributed
