want to design/implement top-notch data science solutions to business problems, we
all need to have a common understanding of this material.
Colleagues also tell us that the book has been quite useful in an unforeseen way: for
preparing to interview data science job candidates. The demand from business for hiring
data scientists is strong and increasing. In response, more and more job seekers are
presenting themselves as data scientists. Every data science job candidate should
understand the fundamentals presented in this book. (Our industry colleagues tell us that
they are surprised how many do not. We have half-seriously discussed a follow-up
pamphlet “Cliff’s Notes to Interviewing for Data Science Jobs.”)
Our Conceptual Approach to Data Science
In this book we introduce a collection of the most important fundamental concepts of
data science. Some of these concepts are “headliners” for chapters, and others are
introduced more naturally through the discussions (and thus they are not necessarily
labeled as fundamental concepts). The concepts span the process from envisioning the
problem, to applying data science techniques, to deploying the results to improve
decision-making. The concepts also undergird a large array of business analytics
methods and techniques.
The concepts fit into three general types:
1. Concepts about how data science fits in the organization and the competitive
landscape, including ways to attract, structure, and nurture data science teams; ways of
thinking about how data science leads to competitive advantage; and tactical
concepts for doing well with data science projects.
2. General ways of thinking data-analytically. These help in identifying appropriate
data and considering appropriate methods. The concepts include the data mining
process as well as the collection of different high-level data mining tasks.
3. General concepts for actually extracting knowledge from data, which undergird the
vast array of data science tasks and their algorithms.
For example, one fundamental concept is that of determining the similarity of two
entities described by data. This ability forms the basis for various specific tasks. It may
be used directly to find customers similar to a given customer. It forms the core of several
prediction algorithms that estimate a target value such as the expected resource usage of
a client or the probability that a customer will respond to an offer. It is also the basis for
clustering techniques, which group entities by their shared features without a focused
objective. Similarity forms the basis of information retrieval, in which documents or
webpages relevant to a search query are retrieved. Finally, it underlies several common
algorithms for recommendation. A traditional algorithm-oriented book might present
each of these tasks in a different chapter, under different names, with common aspects