Tutorial: Big Data Analytics: Concepts, Technologies, and Applications
Article in Communications of the Association for Information Systems · January 2014
DOI: 10.17705/1CAIS.03465
Hugh Watson
University of Georgia

Communications of the Association for Information Systems
Volume 34 Article 65
5-2014
Tutorial: Big Data Analytics: Concepts,
Technologies, and Applications
Hugh J. Watson
University of Georgia, hwatson@uga.edu
Recommended Citation
Watson, Hugh J. (2014) "Tutorial: Big Data Analytics: Concepts, Technologies, and Applications," Communications of the Association
for Information Systems: Vol. 34, Article 65.
Available at: http://aisel.aisnet.org/cais/vol34/iss1/65

Volume 34
Article 65
Tutorial: Big Data Analytics: Concepts, Technologies, and Applications
Hugh J. Watson
Department of MIS, University of Georgia
hwatson@uga.edu
We have entered the big data era. Organizations are capturing, storing, and analyzing data that has high volume,
velocity, and variety and comes from many new sources, including social media, machines, log files, video,
text, image, RFID, and GPS. These sources have strained the capabilities of traditional relational database
management systems and spawned a host of new technologies, approaches, and platforms. The potential value of
big data analytics is great and is clearly established by a growing number of studies. The keys to success with big
data analytics include a clear business need, strong committed sponsorship, alignment between the business and
IT strategies, a fact-based decision-making culture, a strong data infrastructure, the right analytical tools, and people
skilled in the use of analytics. Because of the paradigm shift in the kinds of data being analyzed and how this data is
used, big data can be considered to be a new, fourth generation of decision support data management. Though the
business value from big data is great, especially for online companies like Google and Facebook, how it is being
used is raising significant privacy concerns.
Keywords: big data, analytics, benefits, architecture, platforms, privacy
Volume 34, Article 65, pp. 1247-1268, April 2014

I. INTRODUCTION
Big data and analytics are hot topics in both the popular and business press. Articles in publications like the New
York Times, the Wall Street Journal, and Financial Times, as well as books like Super Crunchers [Ayers, 2007],
Competing on Analytics [Davenport and Harris, 2007], and Analytics at Work [Davenport, Harris and Morison, 2010]
have spread the word about the potential value of big data and analytics.
Today, many organizations are collecting, storing, and analyzing massive amounts of data. This data is commonly
referred to as “big data” because of its volume, the velocity with which it arrives, and the variety of forms it takes. Big
data is creating a new generation of decision support data management. Businesses are recognizing the potential
value of this data and are putting the technologies, people, and processes in place to capitalize on the opportunities.
A key to deriving value from big data is the use of analytics. Collecting and storing big data creates little value; it is
only data infrastructure at this point. It must be analyzed and the results used by decision makers and organizational
processes in order to generate value.
Big data and analytics are intertwined, but analytics is not new. Many analytic techniques, such as regression
analysis, simulation, and machine learning, have been available for many years. Even the value in analyzing
unstructured data such as e-mail and documents has been well understood. What is new is the coming together of
advances in computer technology and software, new sources of data (e.g., social media), and business opportunity.
This confluence has created the current interest and opportunities in big data analytics. It is even spawning a new
area of practice and study called “data science” that encompasses the techniques, tools, technologies, and
processes for making sense out of big data.
Big data is creating new jobs and changing existing ones. Gartner [2012] predicts that by 2015 the need to support
big data will create 4.4 million IT jobs globally, with 1.9 million of them in the U.S. For every IT job created, an
additional three jobs will be generated outside of IT. Big data is also creating a high demand for people who can
analyze and use big data. A 2011 study by the McKinsey Global Institute predicts that by 2018 the U.S. alone will
face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and
analysts to analyze big data and make decisions [Manyika, Chui, Brown, Bughin, Dobbs, Roxburgh, and Byers,
2011]. Because companies are seeking people with big data skills, many universities are offering new courses,
certificates, and degree programs to provide students with the needed skills. Vendors such as IBM are helping
educate faculty and students through their university support programs.
At a high level, the requirements for organizational success with big data analytics are the same as those for
business intelligence (BI) in general [Williams, 2004]. At a deeper level, however, there are many nuances that are
important and need to be considered by organizations that are getting into big data analytics. For example,
organizational culture, data architecture, analytical tools, and personnel issues must be considered. Of particular
interest to information technology (IT) professionals are the new technologies, platforms, and approaches that are
being used to store and analyze big data. They aren’t your mother’s BI architecture [Watson, 2012].
Governments and companies are able to integrate personal data from numerous sources and learn much of what
you do, where you go, who your friends are, and what your preferences are. Although this leads to better service
(and profits for companies), it also raises privacy concerns [Clemons, Wilson, Barnett, Jin and Matt, 2014]. There
are few legal restrictions on what big data companies such as Facebook and Google can do with the data they
collect.
In this tutorial, we first consider the nature and sources of big data. Next, we look at the history of analytics, the
various kinds of analytics, and how they are used with big data. Starbucks, Chevron, U.S. Xpress, and Target are
used to illustrate various uses of big data analytics. Current research documents the benefits of big data and
provides a compelling argument for its use. The requirements for success with big data are discussed and
illustrated: a clear business need; strong, committed sponsorship; alignment between
the business and analytics strategies; a fact-based decision-making culture; a strong data infrastructure; the right
analytical tools; and users, analysts, and data scientists skilled in the use of big data analytics. Special attention is
given to the technologies, platforms, and approaches for storing and analyzing big data. Privacy concerns about the
use of big data are also explored.

II. WHAT IS BIG DATA
From an evolutionary perspective, big data is not new. A major reason for creating data warehouses in the 1990s
was to store large amounts of data. Back then, a terabyte was considered big data.¹ Teradata, a leading data
warehousing vendor, used to recognize customers when their data warehouses reached a terabyte. Today,
Teradata has more than 35 customers, such as Wal-Mart and Verizon, with data warehouses over a petabyte in
size. eBay captures a terabyte of data per minute and maintains over 40 petabytes, the most of any company in the
world.
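To put these sizes in perspective, the unit arithmetic can be worked through in a few lines. This is a minimal sketch that assumes decimal (SI) units, in which 1 petabyte is 1,000 terabytes (binary units would use 1,024); the function names are hypothetical, and the inputs are simply the figures quoted above.

```python
# Unit arithmetic for the storage figures quoted in the text.
# Assumption: decimal (SI) units, so 1 PB = 1,000 TB.

TB_PER_PB = 1_000
MINUTES_PER_DAY = 24 * 60


def growth_factor(old_tb: float, new_pb: float) -> float:
    """How many times larger new_pb petabytes is than old_tb terabytes."""
    return new_pb * TB_PER_PB / old_tb


def daily_capture_pb(tb_per_minute: float) -> float:
    """Petabytes captured per day at a steady rate of tb_per_minute terabytes/minute."""
    return tb_per_minute * MINUTES_PER_DAY / TB_PER_PB


# A 1990s "big" warehouse (1 TB) versus a petabyte-scale warehouse.
print(growth_factor(1, 1))    # 1000.0
# A 1990s "big" warehouse (1 TB) versus eBay's 40+ PB.
print(growth_factor(1, 40))   # 40000.0
# eBay's quoted capture rate of 1 TB per minute, expressed per day.
print(daily_capture_pb(1))    # 1.44
```

Under this assumption, a petabyte-scale warehouse is a thousand times the 1990s one-terabyte milestone, and a capture rate of a terabyte per minute amounts to roughly 1.44 petabytes a day.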
So what is big data? One perspective is that big data is more data, and more kinds of data, than traditional
relational database management systems (RDBMSs) can easily handle. Some people consider 10 terabytes to be big data,
but any numerical definition is likely to change over time as organizations collect, store, and analyze more data.
Another useful perspective is to characterize big data as having high volume, high velocity, and high variety—the
three Vs [Russom, 2011]:
High volume—the amount or quantity of data
High velocity—the rate at which data is created
High variety—the different types of data
In short, “big data” means there is more of it, it comes more quickly, and it comes in more forms.
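The three Vs can be read as a simple checklist applied to a data set. The sketch below is illustrative only: the 10-terabyte volume cut-off echoes the rule of thumb mentioned above, while the velocity and variety cut-offs, the `DataProfile` fields, and the `which_vs` function are hypothetical choices for the example rather than part of Russom's characterization.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    volume_tb: float           # total amount of data held, in terabytes
    ingest_gb_per_hour: float  # rate at which new data arrives
    source_types: set[str]     # kinds of data, e.g. {"relational", "log files", "video"}


# Hypothetical thresholds; any numerical cut-off will drift over time.
VOLUME_TB = 10.0
VELOCITY_GB_PER_HOUR = 100.0
VARIETY_TYPES = 3


def which_vs(p: DataProfile) -> list[str]:
    """Return the Vs (volume, velocity, variety) that this profile exhibits."""
    vs = []
    if p.volume_tb >= VOLUME_TB:
        vs.append("volume")
    if p.ingest_gb_per_hour >= VELOCITY_GB_PER_HOUR:
        vs.append("velocity")
    if len(p.source_types) >= VARIETY_TYPES:
        vs.append("variety")
    return vs


# Hypothetical store: modest volume, but fast-arriving and varied data.
profile = DataProfile(4.0, 250.0, {"relational", "log files", "social media", "text"})
print(which_vs(profile))  # ['velocity', 'variety']
```

Because the definition given next joins the Vs with "and/or", a data set showing any one of them could qualify; the fixed thresholds are the weakest part of such a test, since numerical definitions shift as organizations collect more data.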
Both of these perspectives are reflected in the following definition [Mills, Lucas, Irakliotis, Rappa, Carlson, and
Perlowitz, 2012; Sicular, 2013]:
Big data is a term that is used to describe data that is high volume, high velocity, and/or high variety;
requires new technologies and techniques to capture, store, and analyze it; and is used to enhance decision
making, provide insight and discovery, and support and optimize processes.
It is important to understand that what is thought to be big data today won’t seem so big in the future [Franks, 2012].
Many data sources are currently untapped—or at least underutilized. For example, every customer e-mail,
customer-service chat, and social media comment may be captured, stored, and analyzed to better understand
customers’ sentiments. Web browsing data may capture every mouse movement in order to better understand
customers’ shopping behaviors. Radio frequency identification (RFID) tags may be placed on every single piece of
merchandise in order to assess the condition and location of every item. Figure 1 shows the projected growth of big
data.
Figure 1: The Exponential Growth of Big Data (Source: Palfreyman, 2013)
¹ As a frame of reference, a terabyte can hold 1,000 copies of the Encyclopedia Britannica. Ten terabytes can hold the printed collection of the
Library of Congress. A petabyte can hold approximately 20 million four-door filing cabinets full of text. It would take about 500 million floppy disks
to store the same amount of data. See www.whatsabyte.com.