2.1 Scope of Studied Software Changes
In this paper, we focus on two types of software changes on servers in large Web-based services, software upgrades and configuration changes, for three reasons: (1) the operations team typically cares about the unexpected consequences potentially caused by these planned changes; (2) these changes are controllable by the operations team via command-line interfaces and observable in logs; and (3) we have observed that these two types constitute more than 90% of the tens of thousands of software changes in our data.
Software Upgrades. With the current rapid evolution of
the Internet, new features are continuously being deployed with
software upgrades. The operations team also conducts software
upgrades to fix bugs or improve service performance. In a large
service, it is often the case that one software upgrade implements
multiple features or bug fixes, and FUNNEL considers such a
software upgrade as a whole. FUNNEL decides whether the
whole software upgrade introduces any KPI change but does
not attempt to distinguish which individual feature or bug fix
introduces KPI changes. In addition, in the Web services we study, operators usually apply patches by triggering a software upgrade, and thus we classify patches as software upgrades.
Configuration Changes. The operations team changes configurations by issuing specific commands through command-line interfaces. A configuration change can be in the operating system (OS) or infrastructure software (e.g., a configuration change in Apache), the service configuration (e.g., an increase in the number of threads in a service process), the deployment scale (e.g., an increase in the number of servers on which a service is deployed), or the data source (e.g., an update to the strategy that calculates the valid page view counts).
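To make these categories concrete, the following minimal sketch (ours, not part of FUNNEL; the keyword lists and command format are hypothetical) buckets a logged command line into one of the four configuration-change categories:

```python
# Hypothetical sketch: bucketing a logged command line into the four
# configuration-change categories described above. The keyword lists
# are illustrative assumptions, not part of FUNNEL.
CATEGORY_KEYWORDS = {
    "os_or_infrastructure": ["sysctl", "httpd.conf", "apachectl"],
    "service_configuration": ["thread_num", "worker_count"],
    "deployment_scale": ["add_server", "remove_server"],
    "data_source": ["pv_strategy", "data_feed"],
}

def classify_config_change(command_line: str) -> str:
    """Return the configuration-change category of a logged command."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in command_line for kw in keywords):
            return category
    return "unknown"

print(classify_config_change("set thread_num=64"))  # -> service_configuration
```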
Given this focus, the following aspects are out of the scope of this paper.
1) We do not consider software changes on network devices such as routers and switches, which have already been studied in depth in [6], [7], [14].
2) We do not consider software changes that were external to the company, e.g., a change in a peer company, since these changes might be invisible to the studied company's operations team.
3) We do not explicitly consider the interactions across multiple concurrent or consecutive software changes on the same server; as a straw-man approach, such changes can be treated as one combined change (see the sketch below).
More detailed studies along these three directions are left as future work.
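As a concrete illustration of the straw-man approach in item 3, the following sketch (our illustration, not FUNNEL's implementation; timestamps and the gap parameter are assumptions) merges overlapping or back-to-back change windows on the same server into one combined change:

```python
# Straw-man sketch: merge overlapping or adjacent (start, end) change
# windows on the same server into combined changes. Times are in
# minutes; max_gap is an illustrative assumption.
def merge_changes(windows, max_gap=0):
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1] + max_gap:
            # Overlaps or touches the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two concurrent changes and one later change on the same server:
print(merge_changes([(0, 10), (5, 12), (30, 35)]))  # -> [(0, 12), (30, 35)]
```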
2.2 KPI
In the studied Web-based services, there are hundreds of thousands
of servers providing various types of services. Each service (e.g.,
search, web mail, social networking) runs on one or more servers
with a specific process on each server. An instance denotes a
process of a specific service on a specific server. A KPI is a
performance metric of a given server/service/process. There are
three types of KPIs that need to be monitored for software change assessment: server KPIs, instance KPIs, and service KPIs. The
operations team deploys an agent on each server to monitor the
status of each instance and continuously collect the KPIs of all instances.

Fig. 1. The relationship among service, server, instance, and KPI (PVC = page view count; PVRD = page view response delay; AFC = access failure count).

For example, immediately after the process serves
a customer with some Web page view, the page view count is
incremented and a new page view response delay is recorded.
In addition, by analyzing server log files that record the system
status, the agent is able to periodically collect server KPIs, such
as CPU utilization, memory utilization, and NIC throughput. A
service KPI is an aggregation of all instance KPIs in the service.
Fig. 1 shows an example of the relationship among service (search
engine service), server (server 1, server 2, ..., server n), instance
(instance 1, instance 2, ..., instance n) and KPI (page view count,
page view response delay, access failure count, CPU utilization,
memory utilization, and NIC throughput).
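The hierarchy in Fig. 1 can be summarized as a simple data structure. The sketch below is our illustration (the class layout and the sum aggregation are assumptions, not prescribed by the paper); it also shows a service KPI computed by aggregating the corresponding instance KPIs:

```python
# Illustrative sketch of the service/server/instance/KPI hierarchy in
# Fig. 1. The layout and aggregation rule are our assumptions.
from dataclasses import dataclass, field

@dataclass
class Instance:
    server: str                               # server hosting this process
    kpis: dict = field(default_factory=dict)  # e.g. {"PVC": 120, "PVRD": 35.0}

@dataclass
class Service:
    name: str
    instances: list = field(default_factory=list)

    def service_kpi(self, kpi_name: str) -> float:
        # Sum over instances; natural for counters such as PVC or AFC.
        return sum(inst.kpis.get(kpi_name, 0) for inst in self.instances)

search = Service("search", [
    Instance("server1", {"PVC": 120, "AFC": 2}),
    Instance("server2", {"PVC": 95, "AFC": 0}),
])
print(search.service_kpi("PVC"))  # -> 215
```

For a counter KPI such as PVC, summation is a natural aggregation; for a delay KPI such as PVRD, an average over instances would be more appropriate.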
After collecting the measurements of KPIs of servers and
instances, the agent on each server delivers the measurements
via datacenter networks to a centralized Hadoop-based database,
which also stores the service KPIs aggregated based on the KPIs
of the instances. The database also provides a subscription tool for
other systems, such as FUNNEL, to periodically receive the subscribed measurements based on the server, instance, and service.
The data collection interval at the servers is typically 1 minute, and thus the time granularity of the input KPI measurements of FUNNEL is 1 minute. A coarser time granularity would increase the detection delay of FUNNEL. Although the accuracy of FUNNEL may be affected by the time granularity of the measurements [18], it remains relatively good (see Section 4.2) at a granularity of 1 minute, so we set the time granularity of the KPI measurements to 1 minute. The measurements subscribed by FUNNEL are pushed to it via the datacenter networks within one second.
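A minimal sketch of such a periodic subscriber is given below, assuming a hypothetical fetch_measurements() interface, since the paper does not specify the subscription API:

```python
# Hypothetical subscriber: receive the subscribed KPI measurements once
# per minute, matching the 1-minute collection interval. The
# fetch_measurements() endpoint and its payload are assumptions.
import time

def fetch_measurements(subscription_id: str, minute_ts: int) -> list:
    """Placeholder for the subscription tool's delivery of measurements."""
    return []  # e.g. [{"service": "search", "kpi": "PVC", "value": 215}, ...]

def run_subscriber(subscription_id: str):
    while True:
        minute_ts = int(time.time()) // 60 * 60  # align to 1-minute bins
        for m in fetch_measurements(subscription_id, minute_ts):
            pass  # hand each measurement to the change-assessment logic
        time.sleep(60)
```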
In large services, there might exist some KPIs of dubious quality. To the best of our knowledge, there is no previous work on eliminating low-quality KPIs in Web-based services, and we do not focus on eliminating them in this paper either. FUNNEL detects all KPI changes in the impact set regardless of KPI quality and delivers the results to the operations team, which then determines whether the performance changes in the low-quality KPIs are induced by the software change.
As described in [19], even the operations team does not know exactly what a "good" threshold is for a specific KPI, e.g., due to seasonal variation. Therefore, it is difficult to obtain expected KPI values from the operations team for FUNNEL, and we do not specify a threshold for any KPI in this paper.
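As a toy illustration of this difficulty (synthetic numbers, our assumption), consider a KPI with daily seasonality: any fixed threshold either fires on normal seasonal peaks or misses off-peak anomalies of the same magnitude:

```python
# Synthetic KPI with daily seasonality at 1-minute granularity (2 days).
import math

kpi = [100 + 50 * math.sin(2 * math.pi * t / 1440) for t in range(2880)]

threshold = 130  # any fixed value; 130 is arbitrary
alarms = sum(v > threshold for v in kpi)
print(f"{alarms} of {len(kpi)} normal minutes exceed {threshold}")
# Meanwhile, an off-peak anomaly of +30 (e.g., 50 -> 80) stays below 130.
```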