B. Liu et al. / Future Generation Computer Systems 83 (2018) 1–13 3
variety of pure performance models for cloud services were proposed in the last few years; see [21] and references therein. These models complement ours in capturing IaaS cloud service behaviors. In the following we focus on the literature on cloud availability analysis. In [22], cloud service availability was evaluated from a user-centric point of view, unlike our work, which takes a cloud service provider's point of view.
2.2. Sensitivity analysis
Sensitivity analysis exposes system QoS bottlenecks and provides guidelines for system optimization. It can be divided into nonparametric and parametric sensitivity analysis [23]. The first kind studies output variations caused by modifications in the structure of a model (e.g., the addition or removal of a given component). The second studies output variations due to changes in system parameter values. There are several approaches for performing sensitivity analysis [11]. The following three approaches are used in this paper:
(i) Vary one parameter at a time within its considered range while keeping the others constant, and observe the system measures of interest as the parameter varies. To determine the parameters that cause the greatest impact on system QoS, simulations or numerical analyses must be performed for all parameters over their defined ranges.
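The one-at-a-time approach can be sketched in a few lines of code. This is a minimal illustration, not the paper's procedure: the QoS measure `qos` (a single-PM availability formula) and the parameter ranges are hypothetical stand-ins.

```python
# One-at-a-time sensitivity: vary each parameter across its range
# while holding all other parameters at their baseline values.

def qos(params):
    # Hypothetical QoS measure: steady-state availability of a single
    # PM with failure rate lam and repair rate mu.
    return params["mu"] / (params["mu"] + params["lam"])

baseline = {"lam": 0.001, "mu": 0.1}
ranges = {"lam": [0.0005, 0.001, 0.002], "mu": [0.05, 0.1, 0.3]}

impact = {}
for name, values in ranges.items():
    outputs = []
    for v in values:
        p = dict(baseline)
        p[name] = v                      # vary only this parameter
        outputs.append(qos(p))
    impact[name] = max(outputs) - min(outputs)  # observed output variation

# Parameters ordered by their impact on the QoS measure.
ranked = sorted(impact, key=impact.get, reverse=True)
print(ranked)
```

The cost of this approach grows with the number of parameters and range sizes, which is why the differential and index-based methods below are attractive alternatives.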
(ii) Differential sensitivity analysis (also called the direct method). It computes the sensitivity of a given measure Y, which depends on a specific parameter θ, as S_θ(Y) = ∂Y/∂θ, or SS_θ(Y) = (∂Y/∂θ) · (θ/Y) for a scaled sensitivity. The sign of SS_θ(Y) denotes whether an increase of θ causes a corresponding increase or instead a decrease of the measure Y. Its absolute value indicates the magnitude of the variation of Y for small variations of θ. This method is only suitable for continuous parameters.
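When Y(θ) is only available numerically, the scaled sensitivity SS_θ(Y) can be approximated with a central finite difference. The sketch below uses a hypothetical availability measure (single PM, failure rate lam, repair rate mu) purely for illustration.

```python
def scaled_sensitivity(Y, theta, h=1e-6):
    # SS_theta(Y) = (dY/d theta) * theta / Y, with the derivative
    # approximated by a central finite difference.
    dY = (Y(theta + h) - Y(theta - h)) / (2 * h)
    return dY * theta / Y(theta)

# Hypothetical measure: steady-state availability of one PM with
# failure rate lam and fixed repair rate mu = 0.1.
mu = 0.1
availability = lambda lam: mu / (mu + lam)

ss = scaled_sensitivity(availability, 0.001)
print(ss)  # negative: increasing the failure rate decreases availability
```

For this closed-form measure the exact value is SS_λ(A) = −λ/(μ + λ), which the finite-difference estimate matches closely.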
(iii) Sensitivity index. This technique is designed for integer-valued parameters, which are not properly evaluated by the differential sensitivity analysis approach. The sensitivity formula is S_θ(Y) = 1 − min{Y(θ)}/max{Y(θ)}, where θ ∈ [θ_1, θ_n], min{Y(θ)} = min{Y(θ_1), Y(θ_2), ..., Y(θ_n)} and max{Y(θ)} = max{Y(θ_1), Y(θ_2), ..., Y(θ_n)}.
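The sensitivity-index formula translates directly into code. In this sketch the measure function (availability as a function of an integer number of spares) is a hypothetical stand-in for a model output.

```python
def sensitivity_index(Y, thetas):
    # S_theta(Y) = 1 - min{Y(theta)} / max{Y(theta)} over the
    # integer parameter values theta_1 .. theta_n.
    values = [Y(t) for t in thetas]
    return 1 - min(values) / max(values)

# Hypothetical measure: availability improves with the number of spares n.
avail = lambda n: 1 - 0.01 ** (n + 1)

s = sensitivity_index(avail, range(0, 4))
print(s)
```

The index lies in [0, 1]: a value near 0 means the integer parameter barely affects Y over its range, while a value near 1 flags a strong impact.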
Sensitivity analysis has been conducted in cloud systems.
In [23], the last two methods mentioned above were used for
sensitivity analysis of the availability of a virtualized system, which
was modeled as a continuous-time Markov chain (CTMC). The
authors in [24] studied a hierarchical model, which consisted of
several independent sub-models, each of which was modeled as
a CTMC. Thus, the overall system measure is the product of the measures of the sub-models, and the sensitivity of the overall system availability with respect to a continuous system parameter can be obtained by combining the sensitivity of the overall availability with respect to each component and the sensitivity of that component's availability with respect to the parameter. In our hierarchical models, however, there exist complex interactions among sub-models, and it is hard, if not impossible, to compute the derivative of the whole system measure with respect to a system parameter. In Section 6, we show that although S_θ(Y) cannot be calculated for each parameter, we can still identify the parameters with the most significant impact on the system by applying the differential sensitivity analysis method to each sub-model and then discarding parameters with less impact on system QoS.
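For the independent-sub-model case of [24], the chain/product rule makes the computation concrete: if the overall availability is A = A1 · A2 · A3 and parameter θ appears only in sub-model k, then ∂A/∂θ = (∂A_k/∂θ) · ∏_{i≠k} A_i. The sketch below illustrates this with hypothetical sub-model availability formulas; it is not the paper's model, whose sub-models interact and do not decompose this way.

```python
# Chain-rule sensitivity for a hierarchical model whose overall
# availability is the product of independent sub-model availabilities.

def sub_avail(lam, mu):
    # Availability of one sub-model with failure rate lam, repair rate mu.
    return mu / (mu + lam)

def d_sub_avail_dlam(lam, mu):
    # Derivative of the sub-model availability w.r.t. its failure rate.
    return -mu / (mu + lam) ** 2

params = [(0.001, 0.1), (0.002, 0.2), (0.0005, 0.05)]  # hypothetical values
A = [sub_avail(l, m) for l, m in params]
A_sys = A[0] * A[1] * A[2]

# Sensitivity of A_sys w.r.t. lam of sub-model 0 (product rule):
# dA_sys/dlam0 = (dA0/dlam0) * A1 * A2.
S = d_sub_avail_dlam(*params[0]) * A[1] * A[2]
print(A_sys, S)
```

This decomposition breaks down as soon as the sub-models exchange parameters or interact, which is precisely the situation in our hierarchical models.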
3. System description
In this paper, we assume that there are three PM pools (namely
hot, warm and cold) in a CDC. It is known that there exist several
types of failures in a cloud system such as software failures, hard-
ware failures and network failures [8]. This paper considers the
overall effect of these possible failures with an aggregated mean
time to failure (MTTF) [25,26]. Failure detection is assumed to be an instantaneous event. PMs in the same pool have independent and identically distributed TTFs, and the TTFs of the hot, warm and cold PM pools are exponentially distributed. As in [8], the failure rates are assumed to satisfy λ_h > λ_w ≫ λ_c in this paper. Three possible reasons for this assumption are as follows. First, software execution can accelerate the failure of hardware components such as fans and hard disks. Second, software aging is unavoidable, and a computer is eventually forced to shut down if no proactive action is taken. Third, a computer could generate corrupted files, which can damage the computer hardware in the long term.
Upon failure of a hot PM, the failed PM is moved from the hot pool to the pre-determined repair station for repair. Meanwhile, an available PM in the warm pool is moved to the hot pool. When the warm pool is empty but a PM is available in the cold pool, that PM is moved to the hot pool instead. Similarly, when a warm PM fails, it is moved from the warm pool for repair and a PM is moved from the cold pool to take over its role. For each pool, if a PM has moved from another pool to play the role of a failed PM, the moved PM returns to its original pool once the failed PM completes its repair. The time to move a PM from one pool to another follows an exponential distribution. PM repair activities are work-conserving and repaired PMs are as good as new. We consider the following two repair policies:
(1) Independent repair station (IRS). Each pool has its own repair station, with at least one repair facility. Each facility repairs a failed PM independently. A PM of a pool can be repaired only by a repair facility of that pool's repair station. If the number of PMs awaiting repair in a pool exceeds the number of corresponding repair facilities/servers, the failed PMs are placed in the corresponding waiting queue. Hot, warm and cold PM repair times are exponentially distributed.
(2) Sharing repair station (SRS). The hot, warm and cold pools share a single repair station. Failed hot PMs have repair priority over the failed PMs of the other pools, and failed warm PMs have priority over failed cold PMs. The priority is non-preemptive. As in the previous policy, PM repair times are exponentially distributed.
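The non-preemptive priority of the SRS policy amounts to a simple selection rule: whenever a repair facility becomes free, it serves the highest-priority non-empty queue and never interrupts a repair in progress. A minimal sketch, where the queue representation and PM identifiers are illustrative assumptions:

```python
def next_to_repair(queues):
    # Non-preemptive priority under SRS: a free repair facility always
    # takes the head of the highest-priority non-empty queue; PMs already
    # under repair are never preempted.
    for pool in ("hot", "warm", "cold"):   # priority order: hot > warm > cold
        if queues[pool]:
            return queues[pool].pop(0)
    return None

queues = {"hot": [], "warm": ["w1"], "cold": ["c1", "c2"]}
print(next_to_repair(queues))  # the warm PM is served before any cold PM
```

Under IRS, by contrast, each pool's queue is served only by that pool's own facilities, so no such cross-pool selection rule is needed.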
Table 1 summarizes the definitions of the system input parameters used in the following sections. n_h, n_w, n_c, n_rh, n_rw and n_rc are design parameters, whereas MTTF, MTTR and MTTM values can be measured experimentally. Note that we use notations similar to those in [8] in order to highlight the differences between our models and those in [8], and thereby indicate the modeling challenges addressed in this paper.
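The pool-replacement rule described earlier in this section (a failed hot PM's role is taken by a warm PM if one is available, otherwise by a cold PM) can be sketched as follows; the dictionary representation and function names are illustrative assumptions, not part of the SRN models:

```python
def replacement_pool(pools):
    # On a hot-PM failure, prefer a warm PM; fall back to a cold PM;
    # if neither pool has a PM, no replacement is possible.
    if pools["warm"] > 0:
        return "warm"
    if pools["cold"] > 0:
        return "cold"
    return None

def hot_pm_fails(pools):
    pools["hot"] -= 1            # failed PM goes to the repair station
    src = replacement_pool(pools)
    if src is not None:
        pools[src] -= 1          # borrowed PM leaves its pool ...
        pools["hot"] += 1        # ... and takes over the hot role
    return src

pools = {"hot": 3, "warm": 1, "cold": 2}
print(hot_pm_fails(pools), pools)
```

The three branches of `replacement_pool` correspond to the three hot-failure cases modeled by the transitions of the SRN in Section 4.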
4. System models under SRS policy
This section first presents the monolithic SRN model under the SRS repair policy. Then the corresponding scalable interacting SRN sub-models are given.
4.1. Monolithic SRN model
Fig. 1 shows the monolithic SRN model for the availability analysis of an IaaS cloud under the SRS repair policy. The numbers of tokens in places P_h, P_w and P_c represent the numbers of non-failed PMs in the hot, warm and cold pools, respectively. The firing of each of the transitions T_bwhf, T_bchf and T_hf represents the failure event of a hot PM. That is, one of three cases occurs when a hot PM fails:
Case (F1) A non-failed warm PM is available for moving to the hot pool, represented by firing T_bwhf;