Big Data Analytics: Exploring the Latest Tools and Techniques


Updated 2024-07-22 · PDF, 1.89 MB

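"Big Data Analytics"

Big data analytics is a critically important concept in information technology. In today's digital era, generating and processing massive volumes of data has become a major challenge for enterprises, governments, and other organizations. This research report, written by Philip Russom of The Data Warehousing Institute (TDWI), examines in detail the state of big data analytics tools and techniques as of the fourth quarter of 2011.

The report first explains why big data analytics matters, framing it as a mission of discovery: mining data for patterns, trends, and relationships that give the business insight for its decisions. The key characteristics of big data are usually summarized as three Vs (Volume, Velocity, and Variety), which together account for its complexity.

According to the report, big data and analytics are being combined because data volumes are growing explosively and traditional data processing can no longer keep up. Big data analytics can handle structured, semistructured, and unstructured data, and it helps users extract value from massive amounts of information.

The report then describes the current state of big data analytics: its adoption, its benefits, and its barriers. On adoption, many organizations have already started using big data analytics to improve business efficiency, optimize operations, and drive innovation. The benefits include, but are not limited to, better decisions, newly discovered market opportunities, improved customer experience, and stronger competitive advantage. The challenges include data quality problems, technical complexity, security and privacy concerns, and limited awareness and acceptance of big data analytics inside the organization.

The report also covers organizational issues such as the ownership and control of big data. Because big data analytics often requires collaboration across departments, clear leadership and coordination mechanisms are key to a successful implementation. It also addresses technology and staffing needs, stressing that professionals with data analysis skills are essential to the success of big data projects.

Overall, the report gives a comprehensive view of big data analytics, covering its definition, value, implementation challenges, and organizational strategy, and it is a useful reference for understanding and applying the discipline. For IT professionals, it offers a window into how big data analytics affects and changes business operations, which helps them plan and execute a big data strategy.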


TDWI RESEARCH: BIG DATA ANALYTICS

natural language processing, text analytics, artificial intelligence, and so on. It’s quite an arsenal of tool types, and savvy users get to know their analytic requirements before deciding which tool type is appropriate to their needs.

All these techniques have been around for years, many of them appearing in the 1990s. The difference today is that far more user organizations are actually using them. That’s because most of these techniques adapt well to very large, multi-terabyte data sets with minimal data preparation. That brings us to big data.

Dening Big Data Via the Three Vs

Most definitions of big data focus on the size of data in storage. Size matters, but there are other important attributes of big data, namely data variety and data velocity. The three Vs of big data (volume, variety, and velocity) constitute a comprehensive definition, and they bust the myth that big data is only about data volume. In addition, each of the three Vs has its own ramifications for analytics. (See Figure 1.)

Figure1. e three Vs of big data

Data volume as a defining attribute of big data.

It’s obvious that data volume is the primary attribute of big data. With that in mind, most people define big data in terabytes, sometimes petabytes. For example, a number of users interviewed by TDWI are managing 3 to 10 terabytes (TB) of data for analytics. Yet, big data can also be quantified by counting records, transactions, tables, or files. Some organizations find it more useful to quantify big data in terms of time. For example, due to the seven-year statute of limitations in the U.S., many firms prefer to keep seven years of data available for risk, compliance, and legal analysis.

The scope of big data affects its quantification, too. For example, in many organizations, the data collected for general data warehousing differs from data collected specifically for analytics. Different forms of analytics may have different data sets. Some analytic practices lead a business analyst or similar user to create ad hoc analytic data sets per analytic project. Then, there’s the entire enterprise, which in toto has its own, even larger scope of big data. Furthermore, each of these quantifications of big data grows continuously. All this makes big data for analytics a moving target that’s tough to quantify.

Big data isn’t just about data volume.

The scope of big data varies widely.

Figure 1 contents (3 Vs of Big Data):

• Volume: terabytes; records; transactions; tables, files
• Variety: structured; unstructured; semistructured; all the above
• Velocity: batch; near time; real time; streams

These definitions of big data were originally developed in TDWI blog posts, available at tdwi.org/blogs/philip-russom.

USER STORY: THERE ARE VARIOUS WAYS TO QUANTIFY BIG DATA.

TDWI asked a user how many terabytes he’s managing for analytics, and he said: “I don’t know, because I don’t have to worry about storage. IT provides it generously, and I tap it like crazy.” Another user said: “We don’t count terabytes. We count records. My analytic database for quality assurance alone has 3 billion records. There’s another 3 billion in other analytic databases.”
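The yardsticks discussed above (terabytes, record counts, and time span) can be put side by side in a few lines of Python. This is a minimal sketch and not something taken from the TDWI report: the `DataSet` class, the `quantify` and `retention_cutoff` helpers, the 4 TB size, and the date range are all hypothetical illustrations. Only the 3 billion record count and the seven-year window echo figures mentioned in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    """Hypothetical descriptor for one analytic data set."""
    name: str
    size_bytes: int
    record_count: int
    oldest: date
    newest: date


def quantify(ds: DataSet) -> dict:
    """Describe the same data set by volume, by count, and by time span."""
    return {
        "terabytes": ds.size_bytes / 10**12,  # decimal terabytes
        "records": ds.record_count,
        "years_covered": (ds.newest - ds.oldest).days / 365.25,
    }


def retention_cutoff(today: date, years: int = 7) -> date:
    """Earliest date still inside a retention window of `years` years."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


# Made-up example values; only the record count comes from the user story.
qa = DataSet("quality_assurance", 4 * 10**12, 3_000_000_000,
             date(2005, 1, 1), date(2011, 12, 31))
summary = quantify(qa)
print(summary)
print(retention_cutoff(date(2011, 12, 31)))
```

The point of the sketch is that one data set yields three different "sizes" depending on the unit chosen, which is why two teams can describe the same data very differently.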

Data type variety as a defining attribute of big data.

One of the things that makes big data really big is that it’s coming from a greater variety of sources than ever before. Many of the newer ones are Web sources, including logs, clickstreams, and social media. Sure, user organizations have been collecting Web data for years. But, for most organizations, it’s been a kind of hoarding. We’ve seen similar untapped big data collected and hoarded, such as RFID data from supply chain applications, text data from call center applications, semistructured data from various business-to-business processes, and geospatial data in logistics. What’s changed is that far more users are now analyzing big data instead of merely hoarding it. The few organizations that have been analyzing this data now do so at a more complex and sophisticated level. Big data isn’t new, but the effective analytical leveraging of big data is.

The recent tapping of these sources for analytics means that so-called structured data (which previously held unchallenged hegemony in analytics) is now joined by unstructured data (text and human language) and semistructured data (XML, RSS feeds). There’s also data that’s hard to categorize, as it comes from audio, video, and other devices. Plus, multidimensional data can be drawn from a data warehouse to add historic context to big data. That’s a far more eclectic mix of data types than analytics has ever seen. So, with big data, variety is just as big as volume. In addition, variety and volume tend to fuel each other.

USER STORY: HADOOP IS ABOUT DATA VARIETY, NOT JUST DATA VOLUME.

TDWI found a couple of users who have employed Hadoop as an analytic platform. Both said the same thing: Hadoop’s scalability for big data volumes is impressive, but the real reason they’re working with Hadoop is its ability to manage a very broad range of data types in its file system, plus process analytic queries via MapReduce across numerous eccentric data types. It’s not just Hadoop; TDWI has heard users make similar comments about other analytic platforms.
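To make the idea of running one MapReduce-style query across mixed data types more concrete, here is a small single-process sketch in plain Python. It does not use Hadoop or any Hadoop API. The function names (`map_record`, `shuffle`, `reduce_counts`), the record layouts, and the `customer` field are assumptions invented for illustration. It only shows the shape of the pattern: a map step that copes with several formats, a grouping step, and a reduce step.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict


def map_record(raw: str):
    """Map step: emit (customer_id, 1) from one raw record of any supported type."""
    text = raw.strip()
    if text.startswith("{"):      # semistructured: JSON document
        yield json.loads(text)["customer"], 1
    elif text.startswith("<"):    # semistructured: XML element
        yield ET.fromstring(text).attrib["customer"], 1
    else:                         # structured: comma-separated line
        yield text.split(",")[0], 1


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: total the events seen for one key."""
    return key, sum(values)


# Hypothetical input mixing three record formats.
records = [
    "c1,2011-10-01,purchase",
    '{"customer": "c2", "event": "click"}',
    '<event customer="c1" type="call"/>',
    "c2,2011-10-02,return",
    '{"customer": "c1", "event": "click"}',
]

pairs = [pair for raw in records for pair in map_record(raw)]
counts = dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'c1': 3, 'c2': 2}
```

In a real cluster the map and reduce functions would run in parallel on many nodes. The sketch keeps everything in one process so the data flow stays visible.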

Data feed velocity as a defining attribute of big data.

Big data can be described by its velocity or speed. You may prefer to think of it as the frequency of data generation or the frequency of data delivery. For example, think of the stream of data coming off of any kind of device or sensor, say robotic manufacturing machines, thermometers sensing temperature, microphones listening for movement in a secure area, or video cameras scanning for a specific face in a crowd. The collection of big data in real time isn’t new; many firms have been collecting clickstream data from Web sites for years, using streaming data to make purchase recommendations to Web visitors. With sensor and Web data flying at you relentlessly in real time, data volumes get big in a hurry. Even more challenging, the analytics that go with streaming data have to make sense of the data and possibly take action, all in real time.
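As a rough illustration of analytics that must make sense of a stream and act on it as the data arrives, the sketch below keeps a small rolling window over temperature readings and flags a reading that jumps well above the recent average. Everything here is an assumption made for the example: the `RollingMonitor` class, the window size, the threshold, and the hard-coded readings standing in for a live sensor feed.

```python
from collections import deque


class RollingMonitor:
    """Keep the last `window` readings and flag values far above their mean."""

    def __init__(self, window: int = 5, threshold: float = 10.0):
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Process one reading on arrival; return True if it warrants an alert."""
        alert = False
        # Only judge a reading once the window holds a full history.
        if len(self.readings) == self.readings.maxlen:
            mean = sum(self.readings) / len(self.readings)
            alert = value - mean > self.threshold
        self.readings.append(value)
        return alert


def sensor_stream():
    """Stand-in for a live feed; a real source would be unbounded."""
    yield from [20.0, 20.5, 21.0, 20.8, 20.6, 20.9, 35.0, 21.1]


monitor = RollingMonitor(window=5, threshold=10.0)
alerts = [(i, t) for i, t in enumerate(sensor_stream()) if monitor.observe(t)]
print(alerts)  # [(6, 35.0)]
```

Each reading is handled once and then discarded from the window as newer ones arrive, so memory use stays constant no matter how long the stream runs.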

Big data is remarkably diverse in terms of sources, data types, and entities represented.

The leading edge of big data is streaming data.
