Introduction
quantifications of big data grows continuously. All this makes big data for analytics a moving target
that’s tough to quantify.
USER STORY THERE ARE VARIOUS WAYS TO QUANTIFY BIG DATA.
TDWI asked a user how many terabytes he’s managing for analytics, and he said: “I don’t know, because I don’t
have to worry about storage. IT provides it generously, and I tap it like crazy.” Another user said: “We don’t count
terabytes. We count records. My analytic database for quality assurance alone has 3 billion records. There’s
another 3 billion in other analytic databases.”
Data type variety as a defining attribute of big data.
One of the things that makes big data really big is that it’s coming from a greater variety of sources
than ever before. Many of the newer ones are Web sources, including logs, clickstreams, and social
media. Sure, user organizations have been collecting Web data for years. But, for most organizations,
it’s been a kind of hoarding. We’ve seen similar untapped big data collected and hoarded, such as
RFID data from supply chain applications, text data from call center applications, semistructured
data from various business-to-business processes, and geospatial data in logistics. What’s changed is
that far more users are now analyzing big data instead of merely hoarding it. The few organizations
that have been analyzing this data now do so at a more complex and sophisticated level. Big data isn’t
new, but the effective analytical leveraging of big data is.
The recent tapping of these sources for analytics means that so-called structured data (which
previously held unchallenged hegemony in analytics) is now joined by unstructured data (text
and human language) and semistructured data (XML, RSS feeds). There’s also data that’s hard to
categorize, as it comes from audio, video, and other devices. Plus, multidimensional data can be
drawn from a data warehouse to add historic context to big data. That’s a far more eclectic mix of
data types than analytics has ever seen. So, with big data, variety is just as big as volume. In addition,
variety and volume tend to fuel each other.
USER STORY HADOOP IS ABOUT DATA VARIETY, NOT JUST DATA VOLUME.
TDWI found a couple of users who have employed Hadoop as an analytic platform. Both said the same thing:
Hadoop’s scalability for big data volumes is impressive, but the real reason they’re working with Hadoop is its
ability to manage a very broad range of data types in its file system, plus process analytic queries via MapReduce
across numerous eccentric data types. It’s not just Hadoop; TDWI has heard users make similar comments about
other analytic platforms.
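To make the idea of one analytic query running across several data types concrete, here is a minimal sketch of the map and reduce pattern in plain Python. It does not use Hadoop or any Hadoop API. The record formats, the `customer` field, the sample values, and the function names `map_record` and `reduce_counts` are all hypothetical, chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict


def map_record(raw):
    """Map step: emit (customer_id, 1) from a record in one of three formats."""
    raw = raw.strip()
    if raw.startswith("{"):
        # Semistructured JSON, e.g. a clickstream event (hypothetical layout)
        yield json.loads(raw)["customer"], 1
    elif raw.startswith("<"):
        # XML, e.g. a business-to-business order message (hypothetical layout)
        yield ET.fromstring(raw).attrib["customer"], 1
    else:
        # Plain-text log line containing a "customer=<id>" token
        for token in raw.split():
            if token.startswith("customer="):
                yield token.split("=", 1)[1], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up sample records in three different formats
records = [
    '{"customer": "c1", "page": "/home"}',
    '<order customer="c2" total="19.99"/>',
    '2011-09-01 12:00:01 customer=c1 action=call',
]
pairs = [pair for record in records for pair in map_record(record)]
counts = reduce_counts(pairs)
print(counts)
```

The map step hides the format differences behind a common key and value pair, so the reduce step never needs to know which source a record came from.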
Data feed velocity as a defining attribute of big data.
Big data can be described by its velocity or speed. You may prefer to think of it as the frequency of
data generation or the frequency of data delivery. For example, think of the stream of data coming
off of any kind of device or sensor, say robotic manufacturing machines, thermometers sensing
temperature, microphones listening for movement in a secure area, or video cameras scanning
for a specific face in a crowd. The collection of big data in real time isn’t new; many firms have
been collecting clickstream data from Web sites for years, using streaming data to make purchase
recommendations to Web visitors. With sensor and Web data flying at you relentlessly in real time,
data volumes get big in a hurry. Even more challenging, the analytics that go with streaming data
have to make sense of the data and possibly take action—all in real time.
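As a rough illustration of analytics that make sense of streaming data and possibly take action as it arrives, the sketch below keeps a rolling mean over the most recent sensor readings and raises an alert whenever that mean crosses a threshold. The function name `rolling_alerts`, the window size, the threshold, and the thermometer values are assumptions made up for this example, not anything taken from the survey.

```python
from collections import deque


def rolling_alerts(readings, window=5, threshold=75.0):
    """Yield (index, rolling_mean) each time the mean of the last
    `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)
    total = 0.0
    for i, value in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # oldest reading is evicted by the append below
        recent.append(value)
        total += value
        if len(recent) == window:
            mean = total / window
            if mean > threshold:
                yield i, mean  # the "action" here is simply emitting an alert


# Hypothetical thermometer stream; in practice this would be an unbounded feed
stream = [70.0, 71.0, 72.0, 80.0, 85.0, 90.0, 60.0]
alerts = list(rolling_alerts(stream, window=3, threshold=75.0))
for index, mean in alerts:
    print(f"reading {index}: rolling mean {mean:.1f} exceeds threshold")
```

Because the generator holds only the last `window` readings, memory use stays constant no matter how long the stream runs, which is the property that matters when data volumes get big in a hurry.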
Big data is remarkably diverse in terms of sources, data types, and entities represented.
The leading edge of big data is streaming data.