2 Methodology
In this section, we describe our methodology, specifically our
choice of target systems, the base issue repositories, our issue
classifications and the resulting database.
• Target Systems: To perform an interesting cloud bug study, we select six popular and important cloud systems that represent a diverse set of system architectures: Hadoop MapReduce [3] representing distributed computing frameworks, Hadoop File System (HDFS) [6] representing scalable storage systems, HBase [4] and Cassandra [1] representing distributed key-value stores (also known as NoSQL systems), ZooKeeper [5] representing synchronization services, and finally Flume [2] representing streaming systems. These systems are referred to by different names (e.g., data-center operating systems, IaaS/SaaS). For simplicity, we refer to them as cloud systems.
• Issue Repositories: The development projects of our target systems are all hosted under Apache Software Foundation Projects [8], wherein each of them maintains a highly organized issue repository.¹ Each repository contains development and deployment issues submitted mostly by the developers or sometimes by a larger user community. The term "issue" is used here to represent both bugs and new features. For every issue, the repository stores many "raw" labels, among which we find the following useful: issue date, time to resolve (in days), bug priority level, patch availability, and number of developer responses. We download raw labels automatically using the provided web API (a sketch of such a crawler follows this list item). Regarding the #responses label, each issue contains developer responses that provide a wealth of information for understanding the issue; a complex issue or hard-to-find bug typically has a long discussion. Regarding the bug priority label, there are five priorities: trivial, minor, major, critical, and blocker. For simplicity, we label the first two as "minor" and the last three as "major". Although we analyze all issues in our work, we only focus on major issues in this paper.
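A minimal sketch of such a crawler is shown below. It assumes the public Apache JIRA v2 REST search endpoint, and the requested fields are inferred from the raw labels listed above (patch availability is omitted for brevity); the details may differ from our actual scripts.

# Minimal sketch of a raw-label crawler for the Apache JIRA repositories.
# Assumptions: the public JIRA v2 REST "search" endpoint; the requested
# fields are inferred from the raw labels described in the text.
import requests

JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"
MAJOR = {"Major", "Critical", "Blocker"}  # coarsened to "major"; trivial/minor to "minor"

def fetch_raw_labels(project, page_size=100):
    """Yield (key, created, resolved, coarse priority, #responses) per issue."""
    start = 0
    while True:
        resp = requests.get(JIRA_SEARCH, params={
            "jql": f"project = {project} ORDER BY created ASC",
            "fields": "created,resolutiondate,priority,comment",
            "startAt": start,
            "maxResults": page_size,
        })
        resp.raise_for_status()
        data = resp.json()
        for issue in data["issues"]:
            f = issue["fields"]
            name = (f.get("priority") or {}).get("name")
            coarse = "major" if name in MAJOR else "minor"
            # time to resolve (in days) can be derived from created/resolutiondate
            yield (issue["key"], f["created"], f.get("resolutiondate"),
                   coarse, f["comment"]["total"])
        start += page_size
        if start >= data["total"]:
            break

For example, fetch_raw_labels("HDFS") walks the HDFS repository page by page, so the same loop serves all six projects.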
• Issue Classifications: To perform a meaningful study of cloud issues, we introduce several issue classifications as displayed in Table 1. The first classification that we perform is based on issue type ("miscellaneous" vs. "vital"). Vital issues pertain to system development and deployment problems that are marked with a major priority. Miscellaneous issues represent non-vital issues (e.g., code maintenance, refactoring, unit tests, documentation). Real bugs that are easy to fix (e.g., a few-line fix) tend to be labeled as minor issues and hence are also marked as miscellaneous by us. We had to manually add our own issue-type classification because the major/minor raw labels do not suffice; many miscellaneous issues are also marked as "major" by the developers.
¹ Hadoop MapReduce in particular has two repositories (Hadoop and MapReduce). The first one contains mostly development infrastructure (e.g., UI, library) while the second one contains system issues. We use the latter.
Classification   Labels
Issue Type       Vital, miscellaneous.
Aspect           Reliability, performance, availability, security, consistency, scalability, topology, QoS.
Bug scope        Single machine, multiple machines, entire cluster.
Hardware         Core/processor, disk, memory, network, node.
HW Failure       Corrupt, limp, stop.
Software         Logic, error handling, optimization, config, race, hang, space, load.
Implication      Failed operation, performance, component downtime, data loss, data staleness, data corruption.

Per-component Labels
Cassandra: Anti-entropy, boot, client, commit log, compaction, cross system, get, gossiper, hinted handoff, IO, memtable, migration, mutate, partitioner, snitch, sstable, streaming, tombstone.
Flume: Channel, collector, config provider, cross system, master/supervisor, sink, source.
HBase: Boot, client, commit log, compaction, coprocessor, cross system, fsck, IPC, master, memstore flush, namespace, read, region splitting, log splitting, region server, snapshot, write.
HDFS: Boot, client, datanode, fsck, HA, journaling, namenode, gateway, read, replication, ipc, snapshot, write.
MapReduce: AM, client, commit, history server, ipc, job tracker, log, map, NM, reduce, RM, security, shuffle, scheduler, speculative execution, task tracker.
ZooKeeper: Atomic broadcast, client, leader election, snapshot.

Table 1: Issue Classifications and Component Labels.
We carefully read each issue (the discussion, patches, etc.) to decide whether the issue is vital. If an issue is vital we proceed with further classifications; otherwise it is labeled as miscellaneous and skipped in our study. This paper only presents findings from vital issues.

For every vital issue, we introduce aspect labels (more in §3). If the issue involves hardware problems, then we add information about the hardware type and failure mode (§5). Next, we pinpoint the software bug types (§6). Finally, we add implication labels (§7). As a note, an issue can have multiple aspect, hardware, software, and implication labels. Interestingly, we find that a bug can simultaneously affect multiple machines or even the entire cluster. For this purpose, we use bug scope labels (§4). In addition to these generic classifications, we also add per-component labels to mark where the bugs live; this enables more interesting analysis (§8). A hypothetical record combining these labels is sketched after this list item.
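For illustration, the hypothetical record below shows how the labels of one fully classified vital issue could be organized. The field names and the issue id are placeholders of ours, while the label values follow the vocabulary of Table 1.

# Hypothetical CBSDB record for one vital issue, illustrating that each
# classification dimension (Table 1) can carry multiple labels.
issue = {
    "key":          "HDFS-NNNN",                     # placeholder issue id
    "type":         "vital",                         # vs. "miscellaneous"
    "aspects":      ["reliability", "availability"], # Section 3
    "scope":        "entire cluster",                # Section 4
    "hardware":     [("disk", "corrupt")],           # (type, failure mode), Section 5
    "software":     ["error handling", "race"],      # Section 6
    "implications": ["data loss", "component downtime"],  # Section 7
    "components":   ["namenode", "replication"],     # per-component labels, Section 8
}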
• Cloud Bug Study DB (CBSDB): The product of our classifications is stored in CBSDB, a set of raw text files, data mining scripts and graph utilities [9], which enables us (and future CBSDB users) to perform both quantitative and qualitative analysis of cloud issues (an example query is sketched below).

As shown in Figure 1a, CBSDB contains a total of 21,399 issues submitted over a period of three years (1/1/2011-1/1/2014), which we analyze one by one. The majority of the issues are miscellaneous issues (83% on average across the six systems).
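For example, a short script of the following kind can compute per-system statistics such as the vital/miscellaneous split. The tab-separated layout and column names here are illustrative assumptions of ours rather than CBSDB's actual schema [9].

# Sketch of a quantitative query over CBSDB-style records.
# Assumes one tab-separated line per classified issue with at least
# "system" and "type" columns; CBSDB's actual file layout may differ.
import csv
from collections import Counter

def vital_vs_misc(path):
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            counts[(row["system"], row["type"])] += 1
    for system in sorted({s for s, _ in counts}):
        vital = counts[(system, "vital")]
        misc = counts[(system, "miscellaneous")]
        total = vital + misc
        print("%s: %d issues, %d vital (%.0f%%)"
              % (system, total, vital, 100.0 * vital / total))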