BigOP: A Benchmarking Framework for Building Comprehensive Big Data Workloads


"BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework" 是一篇针对大数据系统性能评估的重要研究论文。随着大数据被广泛视为公司、组织乃至国家的宝贵资产，确保数据转化为实际财富依赖于强大的大数据存储和处理系统。市场上涌现了众多商业和开源产品，为满足不同用户的需求提供了选择。然而，对于大数据系统开发者来说，一个关键挑战是如何根据广泛的big data处理需求来评估他们的系统。现有的大数据基准测试存在局限性，要么无法全面反映各类大数据处理需求的多样性，要么过于侧重于特定的"热点"场景。BigOP框架应运而生，旨在解决这个问题。它作为一种新的基准测试框架，旨在生成全面且具有代表性的大数据工作负载，以便更准确地衡量和比较不同大数据系统在处理复杂任务如数据挖掘、实时分析、机器学习和数据流处理等方面的能力。 BigOP框架的设计涵盖了多种关键特性，包括但不限于： 1. **工作负载多样性**：框架通过创建一系列涵盖不同类型的数据集（结构化、半结构化和非结构化）、处理任务复杂度以及数据量级的组合，确保基准测试能够覆盖现实生活中的各种大数据场景。 2. **灵活性与可扩展性**：BigOP允许定制化的测试配置，以便模拟不同规模的数据处理需求，同时适应未来技术发展带来的变化。 3. **真实场景模拟**：该框架不仅关注基础操作，还着重于模拟实际应用中的数据处理流程，如数据清洗、整合、查询优化等，以评估系统的整体效能。 4. **性能指标**：BigOP定义了一套全面的性能指标，包括但不限于吞吐量、延迟、资源利用率、并发能力等，帮助开发者和用户深入了解系统的性能表现。 5. **开放性和可复现性**：作为开源项目，BigOP鼓励社区参与和贡献，以促进业界对大数据基准测试标准的共识，确保结果的可靠性和一致性。 BigOP论文提出了一种创新的方法，通过生成全面的大数据工作负载，为大数据系统的开发和评估提供了一个更为客观、公正和实用的基准测试工具。这对于驱动大数据技术的发展，提高系统设计质量，以及帮助企业做出最佳技术决策具有重要意义。

BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework

Yuqing Zhu⋆, Jianfeng Zhan, Chuliang Weng♯, Raghunath Nambiar⋄, Jinchao Zhang, Xingzhen Chen, and Lei Wang

State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), ♯Huawei, ⋄Cisco

{zhuyuqing, zhanjianfeng, zhangjinchao, chenxingzhen, wanglei 2011}@ict.ac.cn, ♯chuliang.weng@huawei.com, ⋄RNambiar@cisco.com

Abstract. Big Data is considered a proprietary asset of companies, organizations, and even nations. Turning big data into real treasure requires the support of big data systems. A variety of commercial and open source products have been unleashed for big data storage and processing. While big data users are facing the choice of which system best suits their needs, big data system developers are facing the question of how to evaluate their systems with regard to general big data processing needs. System benchmarking is the classic way of meeting the above demands. However, existent big data benchmarks either fail to represent the variety of big data processing requirements, or target only one specific platform, e.g. Hadoop.

In this paper, with our industrial partners, we present BigOP, an end-to-end system benchmarking framework, featuring the abstraction of representative Operation sets, workload Patterns, and prescribed tests. BigOP is part of an open-source big data benchmarking project, BigDataBench. BigOP's abstraction model not only guides the development of BigDataBench, but also enables automatic generation of tests with comprehensive workloads.

We illustrate the feasibility of BigOP by implementing an automatic test generation tool and benchmarking against three widely used big data processing systems, i.e. Hadoop, Spark and MySQL Cluster. Three tests targeting three different application scenarios are prescribed. The tests involve relational data, text data and graph data, as well as all operations and workload patterns. We report results following test specifications.
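As a rough illustration of what generating tests from an abstraction of operations, patterns, and data types could look like, the following Python sketch enumerates every combination of the three. It is only a sketch under stated assumptions: the operation and pattern names are hypothetical placeholders (this excerpt does not list BigOP's actual operation sets or workload patterns), `WorkloadSpec` and `generate_tests` are invented for illustration, and only the three data types come from the abstract. It does not reproduce the authors' test generation tool.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholder names: the excerpt does not enumerate BigOP's
# actual operation sets or workload patterns.
OPERATIONS = ("filter", "aggregate", "join")
PATTERNS = ("single-operation", "multi-operation", "iterative")
# The three data types are the ones named in the abstract.
DATA_TYPES = ("relational", "text", "graph")


@dataclass(frozen=True)
class WorkloadSpec:
    """One generated test: an operation applied to a data type under a pattern."""
    operation: str
    pattern: str
    data_type: str

    def name(self) -> str:
        # Readable identifier, e.g. "relational/single-operation/filter".
        return f"{self.data_type}/{self.pattern}/{self.operation}"


def generate_tests(operations=OPERATIONS, patterns=PATTERNS, data_types=DATA_TYPES):
    """Enumerate every (data type, pattern, operation) combination."""
    return [
        WorkloadSpec(operation=op, pattern=pat, data_type=dt)
        for dt, pat, op in product(data_types, patterns, operations)
    ]


if __name__ == "__main__":
    specs = generate_tests()
    print(f"{len(specs)} test specifications generated")
    for spec in specs[:3]:
        print(spec.name())
```

A full cross product is the simplest way to cover all operations and patterns across the data types; a real tool would presumably prune combinations that make no sense for a given system or data type.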

1 Introduction

Companies, organizations and countries are taking big data as their important assets, as the era of big data has inevitably arrived. But drawing insights from big data and turning big data into real treasure demand an in-depth extraction of its values, which heavily relies upon and hence boosts the deployment of massive big data systems.

Big data owners are facing the problem of how to choose the right system for their big data processing requirements, while a variety of commercial and open

⋆ The corresponding author.

BigDataBench is available at http://prof.ict.ac.cn/BigDataBench
